You have probably heard the term garbage collection, and you would be far from alone if you never quite understood the concept behind it. Many modern programming languages, such as Java, C#, and JavaScript, use garbage collection. In fact, garbage collection does its job so unobtrusively that programmers using one of these languages rarely have to give it much conscious thought. But what exactly is garbage collection?
To understand garbage collection, it helps to first understand the two areas of memory your program works with: the stack and the heap. At a very high level, the stack is memory that is typically managed for you. That is, when you declare a local variable within a function, the memory for that variable is allocated from the stack. When the function returns, or the variable goes out of scope, that memory is automatically freed. Stack memory holds variables whose sizes are known at compile time, and it is usually very limited in size; programs built with Visual Studio, for example, reserve 1 MB of stack by default. A stack overflow (not the site we visit hundreds of times per week!) occurs when that limit is exceeded, typically through deep or runaway recursion. Most often, the operating system will terminate the offending program when this happens.
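As a rough illustration of that automatic cleanup, here is a minimal C++ sketch. The names `Tracker`, `liveTrackers`, and `useStackVariable` are made up for this example; the counter exists only so the cleanup can be observed.

```cpp
#include <cassert>

// Hypothetical counter of Tracker objects that currently exist.
static int liveTrackers = 0;

// Illustrative type: bumps the counter when created, drops it when destroyed.
struct Tracker {
    Tracker() { ++liveTrackers; }
    ~Tracker() { --liveTrackers; }
};

// Returns how many Trackers were alive while the local variable was in scope.
int useStackVariable() {
    Tracker local;             // lives in this function's stack frame
    assert(liveTrackers > 0);  // `local` exists for as long as we are in here
    int buffer[16] = {0};      // size known at compile time, also on the stack
    buffer[0] = liveTrackers;  // snapshot taken while `local` is still alive
    return buffer[0];
}   // `local` and `buffer` are released here automatically; no cleanup code is written
```

Assuming no other `Tracker` exists, calling `useStackVariable()` returns 1, and `liveTrackers` is back to 0 as soon as the call returns.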
The heap, on the other hand, is a much larger pool of memory that a program requests from the operating system while it runs. On mainstream operating systems each process has its own heap, but all of them ultimately draw on the same physical memory, so whatever one application holds is unavailable to the others. On modern computers, the heap can be gigabytes in size. Dynamically allocated memory comes from the heap, and it is typically required when dealing with user or file input, or when you need to allocate a very large object. This makes sense: we don’t know how many records a file will contain, so the memory required to hold them cannot be determined at compile time.
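The record-reading case can be sketched in C++ as follows. The helper names `readRecords`, `readRecordsFromText`, and `demoReadRecords` are assumptions made for this example, and an in-memory string stands in for a real file. `std::vector` keeps its elements in dynamically allocated memory, which with the default allocator comes from the heap.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one record per line from any input stream.
// The number of lines is unknown until run time, so the vector grows as
// needed, requesting more dynamically allocated memory behind the scenes.
std::vector<std::string> readRecords(std::istream& input) {
    std::vector<std::string> records;
    std::string line;
    while (std::getline(input, line)) {
        records.push_back(line);
    }
    return records;
}

// Convenience wrapper so the sketch can be tried without a real file.
std::vector<std::string> readRecordsFromText(const std::string& text) {
    std::istringstream input(text);
    return readRecords(input);
}

// Small self-check showing the wrapper in use.
void demoReadRecords() {
    std::vector<std::string> records = readRecordsFromText("alice\nbob\n");
    assert(records.size() == 2);
}
```

Nothing in `readRecords` fixes the number of records in advance; the same compiled function handles an empty input or a very long one.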
In languages like those mentioned previously, dynamic memory allocation is abstracted away, and the programmer rarely needs to think about it. In JavaScript, for example, you don’t need to know the size of an array when you declare it. However, when dynamically allocating memory in a language that does not use garbage collection, the programmer also becomes responsible for freeing that memory once it is no longer needed. Failing to free it causes another issue we are all familiar with: memory leaks. Typically, once the application is closed, the operating system reclaims whatever memory it leaked. However, consider the case of a video game, an application that may stay open for several hours. While it runs, its leaks not only affect the game itself, but also reduce the memory available to other programs and processes. After several hours of accumulating leaks, your computer may start running slowly or even crash. Why, then, would someone opt to use a language that requires manual memory management?
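To make the leak described above concrete, here is a minimal C++ sketch. `Texture`, `liveTextures`, `drawFrameLeaky`, `drawFrameCorrect`, and `leakedAfterFrames` are hypothetical names invented for this example, not part of any game engine; the counter is there only to make the leak visible.

```cpp
#include <cassert>

// Hypothetical counter of Texture objects allocated but not yet freed.
static int liveTextures = 0;

// Illustrative stand-in for a resource a game might load each frame.
struct Texture {
    Texture() { ++liveTextures; }
    ~Texture() { --liveTextures; }
    unsigned char pixels[1024];
};

// Leaky version: allocates on the heap and never releases it.
void drawFrameLeaky() {
    Texture* t = new Texture();
    t->pixels[0] = 255;
    // Missing `delete t;` -> once `t` goes out of scope, the address is lost
    // and the program can no longer free this memory.
}

// Correct version: the `new` is paired with a `delete`.
void drawFrameCorrect() {
    Texture* t = new Texture();
    t->pixels[0] = 255;
    delete t;
}

// Runs a hypothetical frame loop and reports how many textures were left behind.
int leakedAfterFrames(int frames, bool leaky) {
    assert(frames >= 0);
    int before = liveTextures;
    for (int i = 0; i < frames; ++i) {
        if (leaky) {
            drawFrameLeaky();
        } else {
            drawFrameCorrect();
        }
    }
    return liveTextures - before;
}
```

Under these assumptions, `leakedAfterFrames(3, true)` reports three textures left behind, while `leakedAfterFrames(3, false)` reports none. A game running at 60 frames per second would repeat the leaky path hundreds of thousands of times per hour, which is how a small mistake grows into the slowdown described above.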
The trade-off becomes clear once you see how a garbage collector works. It intermittently “scans” the graph of object references, starting from the variables your application can still reach. Any object that is no longer referenced, and is therefore not reachable by application code, is treated as garbage, and its memory is reclaimed. This intermittent scanning can pause your application while the unused memory is collected. For most applications, the pause is negligible. There are, however, several areas where avoiding it matters. Many video games are written in languages like C++, because a garbage collector kicking in at the wrong time can hurt frame rates. Real-time systems, such as those used in finance, benefit from predictable timing when creating and processing transactions, where even a slight delay could be problematic or detrimental.
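The scanning idea can be sketched as a toy mark-and-sweep collector in C++. This is an illustration under simplifying assumptions, not how any production collector is implemented; real collectors in Java, C#, or JavaScript runtimes are far more sophisticated. `ToyHeap`, `Object`, and all method names are hypothetical, and objects are identified by plain indices rather than real pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy object: holds references (indices into the heap) to other objects.
struct Object {
    std::vector<std::size_t> references;
    bool marked = false;
    bool alive = true;
};

// Hypothetical miniature heap with a mark-and-sweep collector.
class ToyHeap {
public:
    // Creates an object and returns its handle (its index in the heap).
    std::size_t allocate() {
        objects_.emplace_back();
        return objects_.size() - 1;
    }

    // Records that object `from` holds a reference to object `to`.
    void addReference(std::size_t from, std::size_t to) {
        assert(from < objects_.size() && to < objects_.size());
        objects_[from].references.push_back(to);
    }

    // Roots stand in for variables the application can still reach directly.
    void addRoot(std::size_t handle) { roots_.push_back(handle); }
    void clearRoots() { roots_.clear(); }

    // Marks everything reachable from the roots, then frees the rest.
    // Returns the number of objects reclaimed.
    std::size_t collect() {
        for (Object& o : objects_) o.marked = false;

        // Mark phase: iterative depth-first walk of the reference graph.
        std::vector<std::size_t> pending(roots_);
        while (!pending.empty()) {
            std::size_t current = pending.back();
            pending.pop_back();
            Object& o = objects_[current];
            if (!o.alive || o.marked) continue;
            o.marked = true;
            for (std::size_t next : o.references) pending.push_back(next);
        }

        // Sweep phase: anything alive but unmarked is unreachable garbage.
        std::size_t reclaimed = 0;
        for (Object& o : objects_) {
            if (o.alive && !o.marked) {
                o.alive = false;
                o.references.clear();
                ++reclaimed;
            }
        }
        return reclaimed;
    }

    bool isAlive(std::size_t handle) const { return objects_[handle].alive; }

private:
    std::vector<Object> objects_;
    std::vector<std::size_t> roots_;
};
```

Note that `collect()` visits every reachable object before it returns. In this toy the application simply waits while that happens, which is the pause discussed above. Two objects that reference only each other are still reclaimed, because neither can be reached from a root.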
“Modern” languages were created to give developers the freedom to iterate quickly, without worrying about low-level concepts such as memory management. As we’ve seen, though, the topic is still relevant today, and it is an interesting one to explore.