Source: Canva Team

Garbage, Garbage Go Away

What, Why & How of Java’s Garbage Collection

Ananya
7 min read · Jan 26, 2021


Any application needs memory to run. However, computer memory has limited space. So, in order to run applications, it is important that the memory is cleaned once in a while to make space for new data and remove the old unused data.

Who does this cleaning? How and when is the memory cleaned? In fact, what does the memory look like? Let's dive into the details.

Java Memory Model

The Java Memory Model consists of the following components:

Java Memory Model

Native Memory: The available system memory is the Native Memory.

Heap Memory: The JVM allocates a part of the Native memory as Heap memory and uses it to store objects. The heap is shared across all threads of the application. Its size is configured using the -Xms (initial, and therefore minimum, heap size) and -Xmx (maximum heap size) JVM settings.
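As a small illustration of these settings, the sketch below reads the heap limits from inside a running application using the standard java.lang.Runtime API. The class name HeapSizeReport and the helper toMiB are made up for this example.

```java
class HeapSizeReport {
    /** Converts a byte count to whole mebibytes (rounding down). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most heap the JVM will try to use (governed by -Xmx).
        System.out.println("max heap (MiB):       " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms).
        System.out.println("committed heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap.
        System.out.println("free in heap (MiB):   " + toMiB(rt.freeMemory()));
    }
}
```

Running it with different values, for example -Xms256m -Xmx512m, should change the reported numbers accordingly. The exact figures depend on the JVM and the collector in use.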

Stack Memory: The stack memory stores local variables and the frames of method calls. Each thread has its own stack.

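Metaspace: This memory stores class metadata. (Static variables are often described as living here as well, but since Java 8 their values are kept on the heap together with the class object.) This space is also shared across threads. Since Metaspace is part of Native memory, its default size is platform-dependent. Its upper limit can be configured using the -XX:MaxMetaspaceSize flag.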

PermGen: PermGen (Permanent Generation) was part of the Java memory model up to Java 7. It was replaced by Metaspace starting with Java 8.

CodeCache: The JIT compiler compiles frequently executed code into native machine code and stores the result in the Code Cache for faster execution. This is also part of Native memory.
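To see these areas on a real JVM, the following sketch lists the memory pools the JVM reports through the standard java.lang.management API and labels each as heap or non-heap. The class name MemoryPoolReport is an assumption for this example, and the pool names printed (Eden, Survivor, Old Gen, Metaspace, code cache segments and so on) vary with the JVM version and the garbage collector selected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolReport {
    /** Returns one line per memory pool: kind, pool name and used size. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            long usedKiB = usage.getUsed() / 1024;
            lines.add(kind + " | " + pool.getName() + " | used " + usedKiB + " KiB");
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```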

Garbage Collection: Introduction

This section explains what garbage is and what garbage collection is.

What is Garbage? : An object that can no longer be reached through any chain of live references in the running application (for example, starting from a local variable on a thread's stack or from a static field) is considered garbage. Since such objects can no longer be used by the application, they can be removed from memory.

For example: In the diagram below, the “fruit2” object is eligible for garbage collection because no live reference to it exists.

Garbage
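In code, a situation like this can arise as sketched below. The Fruit class and its values are hypothetical and not taken from the diagram; the point is only that an object becomes garbage once the last reference to it is overwritten or goes out of scope.

```java
class ReachabilityExample {
    /** Hypothetical value class used only for this illustration. */
    static final class Fruit {
        final String name;

        Fruit(String name) {
            this.name = name;
        }
    }

    static String run() {
        Fruit fruit1 = new Fruit("apple");
        Fruit fruit2 = new Fruit("banana");

        // Overwriting fruit2 drops the only reference to the "banana" object.
        // Nothing in the program can reach it any more, so it is now garbage
        // and may be reclaimed whenever the collector next runs.
        fruit2 = fruit1;

        // Both variables now refer to the same "apple" object.
        return fruit2.name;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "apple"
    }
}
```

Note that becoming eligible does not mean the memory is freed immediately; the JVM decides when a collection actually runs.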

What is Garbage Collection?: Garbage collection is the process of automatic memory management. The task of deallocating (freeing up) memory by cleaning out garbage is carried out automatically by the garbage collector. As programmers, we don’t need to intervene in the garbage collection process.

Who does Garbage Collection? The garbage collector (GC) is a component of the JVM’s Execution Engine that looks after the garbage collection of Java objects. The garbage collector operates on the Heap area of the Java Memory Model.

Source: Oracle.com

Garbage Collection: Process

This section lays out how garbage collection works in Java.

Mark & Sweep Process: Garbage collection is carried out using the mark and sweep process. This process has the following 3 steps:

  1. Marking: In the first step, the GC follows references through the objects on the heap and marks the live objects (the objects which are still in use). Program execution is paused for this step, which is why it is called a “Stop the World” event.
  2. Sweeping: In this step, the memory occupied by objects that were not marked in the previous step is freed up.
  3. Compaction: The objects that survive sweeping are moved into a single contiguous block of memory. This prevents the heap from becoming fragmented over time and makes allocating new objects easier and faster.
Mark & Sweep Garbage Collection
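The three steps can be sketched on a toy heap. This is a simplified model for illustration, not how the JVM implements its collectors: the heap is a small array of slots, and the class MarkSweepCompactDemo, its Node type and the object names A to D are all made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class MarkSweepCompactDemo {
    /** A toy object: a name, outgoing references and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    /** Step 1: mark every object reachable from the roots. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) {
                continue; // already visited (also handles cycles)
            }
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Step 2: free the slots of all unmarked objects. */
    static void sweep(Node[] heap) {
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !heap[i].marked) {
                heap[i] = null;
            }
        }
    }

    /** Step 3: slide the survivors to the front and reset their mark bits. */
    static void compact(Node[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            Node n = heap[i];
            if (n == null) {
                continue;
            }
            heap[i] = null;
            n.marked = false;
            heap[next++] = n;
        }
    }

    /** Renders the heap as one character per slot, "_" for a free slot. */
    static String layout(Node[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Node n : heap) {
            sb.append(n == null ? "_" : n.name);
        }
        return sb.toString();
    }

    static String run() {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.refs.add(c); // A -> C, and A is a root
        b.refs.add(d); // B and D reference each other,
        d.refs.add(b); // but nothing reachable from a root points to them
        Node[] heap = {a, b, c, d, null, null};

        mark(Arrays.asList(a));
        sweep(heap);
        String afterSweep = layout(heap);
        compact(heap);
        return afterSweep + " -> " + layout(heap);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "A_C___ -> AC____"
    }
}
```

B and D are collected even though they reference each other, because reachability is decided from the roots, not by counting references. After compaction the free slots form one contiguous block.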

Generational Garbage Collection

This section describes the optimized version of the mark and sweep process: Generational Garbage Collection.

What are JVM Generations?

For the purpose of Generational Garbage Collection, the heap memory is further divided into the following 2 generations (4 sections in total, since the young generation has 3 sections of its own). Objects are placed in different generations based on their age (how long they have been in use in the application).

  1. Young Generation: New objects are created in the young generation. The young generation is divided into 3 sections: Eden, S0, and S1 (the Survivor Spaces).
  2. Old Generation: Older objects (those that have survived in the application for a long time) are stored in the old generation.
Generations of Heap

What is Stop the World Event?

When the marking phase runs, the application threads are paused and stop serving traffic. The application resumes serving traffic when marking is completed. In the process described here, every garbage collection involves a “stop the world” event.

What is Generational Collection?

As mentioned earlier, generational garbage collection is an optimization of the mark and sweep process. It is based on 3 primary hypotheses:

  1. Most objects don’t live for long.
  2. If an object survives, then it is likely to live for a long time.
  3. The mark and sweep process takes less time to run when there is a lot of garbage, i.e. marking is quicker if the area to be analyzed is small and consists mostly of dead objects.

Based on the above hypotheses, the following steps are followed in generational garbage collection:

Generational Garbage Collection
  1. New objects are created in Eden Space of the Young Generation. Survivor spaces are empty at this point.
  2. When Eden space fills up, Minor Garbage Collection takes place. Minor Garbage Collection is the process in which the mark and sweep process runs in the young generation.
  3. As a result of Minor GC, the live objects are moved to one of the survivor spaces (let’s say S0). Dead objects are permanently removed from the young generation.
  4. Once again, as the application runs, the Eden space starts filling up with new objects. In the next Minor GC cycle, dead objects are cleared out of Eden and S0. The surviving objects are moved to S1 this time and their age is incremented (to record that they have survived a garbage collection).
  5. At the next Minor GC, the same process repeats. However, this time the survivor spaces switch. Live objects are moved to S0 and they are aged. Eden and S1 are cleared.
  6. Objects are copied between survivor spaces in this way until they’ve been copied a certain number of times (i.e. they have survived a certain number of Minor GC cycles) or there isn’t enough space left in the survivor space. These objects are then copied into the Old generation. This process is called Aging (the final move into the old generation is also known as promotion).
  7. Major Garbage Collection: In this step, the mark and sweep process is executed in the old generation to clear out dead objects from there. Major GC is slower than Minor GC because the old generation mostly consists of live objects (hence marking takes time).
Generational Garbage Collection in Action
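The steps above can be sketched with a toy model. This is a simplification under stated assumptions, not the JVM's implementation: liveness is set by hand through a flag instead of being found by marking, an object's age counts every Minor GC it survives, and the threshold of 2 is hypothetical (HotSpot's real limit is controlled by -XX:MaxTenuringThreshold). The class GenerationalDemo and all names in it are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class GenerationalDemo {
    /** A toy object with an age and a hand-set liveness flag. */
    static final class Obj {
        final String name;
        int age;             // number of Minor GCs survived
        boolean live = true; // stands in for the result of marking

        Obj(String name) {
            this.name = name;
        }
    }

    /** Hypothetical threshold: objects older than this are promoted. */
    static final int TENURING_THRESHOLD = 2;

    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>(); // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();   // survivor space currently empty
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o); // new objects always start in Eden
        return o;
    }

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // dead objects are simply not copied
            }
            o.age++;
            if (o.age > TENURING_THRESHOLD) {
                old.add(o); // promotion to the old generation
            } else {
                to.add(o);  // copied to the empty survivor space
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; // the survivor spaces switch roles
        from = to;
        to = tmp;
    }

    static String names(List<Obj> space) {
        return space.stream()
                .map(o -> o.name + "(" + o.age + ")")
                .collect(Collectors.joining(",", "[", "]"));
    }

    static String run() {
        GenerationalDemo heap = new GenerationalDemo();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        Obj c = heap.allocate("c");
        b.live = false;
        heap.minorGc();     // a and c survive, b is dropped
        heap.allocate("d");
        c.live = false;
        heap.minorGc();     // d and a survive, c is dropped
        heap.minorGc();     // a exceeds the threshold and is promoted
        return "eden=" + names(heap.eden)
                + " survivor=" + names(heap.from)
                + " old=" + names(heap.old);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "eden=[] survivor=[d(2)] old=[a(3)]"
    }
}
```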

Advantages of Generational Collection:

The Minor GC takes place in a smaller area of the heap: only the young generation, which is about 1/3 of the total heap with HotSpot's default -XX:NewRatio=2. The marking step is efficient as the area is small and consists mostly of dead objects. (Hypotheses 1 & 3)

Disadvantages of Generational Collection:

At any point in time, one of the survivor spaces (S0 or S1) is empty, so that part of the young generation is never available for storing objects.

Garbage Collection: Flags

This section highlights some of the important flags that you can use to tune the garbage collection process.

JVM Flags

Type of Java Garbage Collectors

This section lays out the different types of garbage collectors available in Java.

Java Garbage Collectors

GC Monitoring Tools

What should we monitor?

  1. How frequently garbage collection runs: As garbage collection is a “stop the world” event, it is desirable to keep the time spent in GC as low as possible.
  2. How much time one garbage collection cycle takes to run.
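Both numbers can also be read from inside the application. The sketch below uses the standard GarbageCollectorMXBean API to print, for each collector, how many collections have run and the approximate accumulated time spent in them. The class name GcStats and its helper methods are assumptions for this example, and the collector names reported depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** Sums the collection counts of all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Sums the approximate accumulated collection time in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total collections: " + totalCollections());
        System.out.println("total time (ms):   " + totalCollectionMillis());
    }
}
```

Sampling these values periodically and comparing consecutive readings gives the GC frequency and the average time per cycle over that interval.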

How to monitor GC?

The following tools can be used to monitor garbage collection:

  1. VisualVM (https://visualvm.github.io/)
  2. GC logging: To enable GC logging in an application, add the JVM parameters below. These are the Java 8 style flags; Java 9 and later use unified logging (-Xlog:gc*) instead, and several of these flags are no longer accepted there.
-XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Xloggc:/tmp/[Application-Name]-[Application-port]-%t-gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=100M

A GC log entry looks like this:

[15,651s][info ][gc] GC(36) Pause Young (G1 Evacuation Pause) 239M->57M(307M) (15,646s, 15,651s) 5,048ms

Explanation:

  1. [15,651s]: Time since the application was started
  2. [info ]: Log level
  3. [gc]: Tag
  4. GC(36): GC identification number
  5. Pause Young: Type of GC
  6. (G1 Evacuation Pause): Cause for starting the GC
  7. 239M->57M(307M): Memory consumption information: 239M used before GC, 57M used after GC, with a heap size of 307M
  8. (15,646s, 15,651s): Start and end times of the GC
  9. 5,048ms: Total GC duration

The comma in these values is a decimal separator, so this collection took about 5 milliseconds.
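As a rough sketch of how such an entry could be processed in code, the example below extracts the memory figures from the log line with a regular expression. The class GcLogLine and its methods are hypothetical helpers, and the pattern only handles sizes reported in M, as in the sample line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLine {
    // Matches the "used before -> used after (heap size)" part, e.g. 239M->57M(307M).
    private static final Pattern HEAP = Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\)");

    /** Returns {usedBefore, usedAfter, heapSize} in M, or null if the line has no such part. */
    static long[] parseHeap(String line) {
        Matcher m = HEAP.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Returns how much memory the collection freed, in M, or -1 if the line cannot be parsed. */
    static long freed(String line) {
        long[] heap = parseHeap(line);
        return heap == null ? -1 : heap[0] - heap[1];
    }

    public static void main(String[] args) {
        String line = "[15,651s][info ][gc] GC(36) Pause Young (G1 Evacuation Pause) "
                + "239M->57M(307M) (15,646s, 15,651s) 5,048ms";
        System.out.println("freed: " + freed(line) + "M"); // prints "freed: 182M"
    }
}
```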

You can analyze the GC logs collected above using GCeasy (gceasy.io).

