Memory Management Under the Hood C# Help

One of the advantages of C# programming is that the programmer does not need to worry about detailed memory management; in particular, the garbage collector deals with the problem’ of memory cleanup on your-behalf. The result is that you get something that approximates the efficiency of languages like C++ without the complexity of having to handle memory management yourself as you do in C++. However, although you do not have to manage memory manually, it still pays to understand what is going on behind the scenes. This section looks at what happens in the computer’s memory when you allocate variables .

The precise details of much of the content of this section are undocumented. You should interpret this section as a simplified guide to the general processes rather than as a statement of exact implementation.

Value Data Types

Windows uses a system known as virtual addressing, in which the mapping from the memory address seen by your program to the actual location in hardware memory is entirely managed by Windows, The result of this is that’ each process on a 32-bit processor sees 4GB of available memory, regardless of how much hardware memory you actually have in your computer (on 64-bit processors this number will be greater). This 4GB of memory contains everything that is part of the program, including the executable. code, any DLLs loaded by the code, and the contents of all variables used when the program runs. This 4GB of memory is known as the virtual address space or virtual memory. For convenience, in this chapter, we call it simply memory.

Each memory location in the available 4GB is numbered starting from zero. To access a value stored at a r’articular location in memory, you need to supply the number that represents that memory location. In any compiled high-level language, including C#, Visual Basic, C++, and Java, the compiler converts human-readable variable names into memory addresses that the processor understands. Somewhere inside a processor’s virtual memory is an area known as the stack. The stack stores value data types that are not members of objects. In addition, when you call a method, the stack is used to hold a copy of any parameters passed to the method. To understand how the stack works, you need to understand the importance of variable scope in C#. It is always the case that if a variable a goes into scope before variable b, then b will go out of scope first. Look at this code:

Capture

First, a gets declared. Then, inside the inner code block, b gets declared. Then the inner code block terminates and b goes out of scope, then a goes out of scope. So, the lifetime of b is entirely contained within the lifetime of a. The idea that you always deallocate variables in the reverse order to how you allocate them is crucial to the way the stack works.

You do not know exactly where in the address space the stack is – you don’t need to know for C# development. A stack pointer (a variable maintained by the operating system) identifies the next free
location on the stack. When your program first starts running, the stack pointer will point to just past the end of the block of memory that is reserved for the stack. The stack actually fills downward, from high memory addresses to low addresses. As data is put on the stack, the stack pointer is adjusted accordingly, so it always points to just past the next free location. This is stack pointer with a value of 800000 (OxC3500 in hex); the next free location is the address 799999.

Figure 12-1

The following code instructs the compiler that you need space in memory to store an integer and a double, and these memory locations are referred to as nRacingCars and engineSize. The line that declares each variable indicates the point at which you will start requiring access to this variable. The closing curly brace of the block in which the variables are declared identifies the point at which both variables go out of scope.

{

int nRacingCars 10;
double engineSize 3000.0;
// do calculations;

}

when the variable nRacingCars comes into scope and is assigned the value 10, the value 10 is placed in locations 799996 through 799999, the 4 bytes just below the location pointed to by the stack pointer. (Four bytes because that’s how much memory is needed to store an int.) To accommodate this, 4 is subtracted from the value of the stack pointer, so it now points to the location 799996, just after the new first free location (799995).

The next line of code declares the variable el.gineSize (a doubj e) and initializes it to the value 3000. O. A double occupies 8 bytes, so the value 3000.0 will be placed in locations 799988 through 799995
on the stack, and the stack pointer is decremented by 8, so that once again, it points to the location just after the next free location on the stack.

When engineSize goes out of scope, the computer knows that it is no longer needed. Because of the way variable lifetimes are always nested, you can guarantee that, whatever has happened while engineSize was in scope, the stack pointer is now pointing to the location where engineSize is stored. To remove engineSize from the stack, the stack pointer is incremented by 8, so that it now points to the location immediately after the end of engineSize. At this point in the code, you are at the closing curly brace, so nRacingCars also goes out of scope. The stack pointer is incremented by 4. When another variable comes into scope after engineSize and nRacingCars have been removed from the stack, it will overwrite the memory descending from location 799999, where nRacingCars used to be stored.

If the compiler hits a line like int i, j, then the order of variables coming into scope looks indeterminate. Both variables are declared at the same time and go out of scope at the same time. In this situation, it does not matter in what order the two variables are removed from memory. The compiler internally always ensures that the one that was put in memory first is removed last, thus preserving the rule about no crossover of variable lifetimes.

Reference Data Types

Although the stack gives very high performance, it is not flexible enough to be used for all variables. The requirement that the lifetimes of variables must be nested is too restrictive for.many purposes. Often, you will want to use a method to allocate memory to store some data and be able to keep that data available long after that method has exited. This possibility exists whenever storage space is requested with the new operator – as is the case for all reference types. That is where the managed heap comes in.

If you have done any C++ coding that required low-level memory management, you will be familiar with the heap. The managed heap is not quite the same as the heap C++ uses; the managed heap works . under the control of the garbage collector and provides significant benefits when compared to traditional heaps.

The managed heap (or heap for short) is just another area of memory from the processor’s available 4GB.The following code demonstrates how the heap works and how memory is allocated for reference data types:

Capture

This code assumes the existence of two classes, Customer an EnhancedCustomer. The EnhancedCustomer class extends the Customer class.

First, you declare a Customer reference called arabel. The space for this will be allocated on the stack, but remember that this is only a reference, not an actual Customer object. The arabel reference takes up 4 bytes, enough space. to hold the address at which a Customer object will be stored. (You need 4 bytes to represent a memory address as an integer value between 0 and 4GB.)

The next line,

arabel = new Customer();

does several things. First, it allocates memory on the heap to store a Customer: object (a real object, not just an address). Then it sets the value of the variable arabel to the..address of the memory it has
allocated to the new eus tomer object. (It also calls the appropriate Customer ( )constructor to initialize the fields In the class instance, but we won’t worry about that here.)

The customer instance is not placed on the stack – it is placed on the heap, In this example, you don’t know precisely how many bytes a Customer object occupies, but assume for the sake of argument that it is 32, These 32 bytes contain the instance fields of Customer as well as some information that .NET uses to identify and manage its class instances.

To find a storage location on the heap for the new Customer object, the .NET runtime will look through the he~p and grab the first adjacent, unused block of 32 bytes. Again for the sake of argument, assume that this happens to be at address 200000, and that the arabel reference occupied locations 799996 through 799999 on the stack, This means that before-instantiating the arabel object, the memory contents will look similar.

Figure 12-2

After allocating the new Customer object, the contents of memory . Note that unlike the stack, memory in the heap is allocated upward, so the free space can be found above the used space.

Figure 12-3

The next line of code both declares a Customer reference and instantiates a Customer object. In this instance, spaot on the stack for the other Customer2 reference is allocated and space for the mr.Jones object is allocated on the heap in a single line of code:”

Customer otherCustomer2 = new EnhancedCustomer();

This line allocates 4 bytes on the stack to hold the otherCustomer2 reference, stored at locations 799992 through 799995. The otherCustomer2 object is allocated space on the heap starting at location 200032.

It is clear from the example that the process of setting up a reference variable is more complex than that for setting up a value variable, and there is a performance overhead. In fact, the process is somewhat oversimplified here, because the .NET runtime needs to maintain information about the state of the heap, and this information needs to be updated whenever new data is added to the heap, Despite this overhead, you now have a mechanism for allocating variables that is not constrained by the limitations of the stack. By assigning the value of one reference variable to another of the same type, you have two variables-that reference the same object in memory. When a reference variable goes out of scope, it is
removed from the stack as described in the previous section, but the data for a referenced object is still sitting on the heap. The data will remain on the heap until either the program terminates or the garbage collector removes it, which will happen only when it is no longer referenced by any variables .

That is the power of reference data types, and you will see this feature used extensively in C# code. It means that you have a high degree of control over the lifetime of your data, because it is guaranteed to exist in the heap as long as you are maintaining some reference to it.

Garbage Collection

The previous discussion and diagrams show the managed heap working very much like the stack, to the extent that successive objects are placed next to each other in memory. This means that you can work out where to place the next object by using a heap pointer that indicates the next free memory location and that is adjusted as you add more objects to the heap. However, things are complicated because the lives of the heap-based objects are not coupled to the scope of the individual stack-based variables that. reference them.

When the garbage collector runs, it will remove all those objects from the heap that are no longer referenced. ‘Immediately after it has done this, the heap will have objects scattered on it, mixed up with memory that has just been freed.

.Figure 12.4

If the managed heap stayed like this, allocating space for new objects would be an awkward process, with the runtime having to search through the heap for a block of memory big enough to store each new object. However, the garbage collector does not leave the heap in this state. As soon as the garbage collector has freed up all the objects it can, it compacts the heap by moving all remaining objects to form one continuous block of memory. This means that the heap can continue working just like the stack as far as locating where to store new objects. Of course, when the objects are moved about, all the references to those objects need to be updated with the correct new addresses, but the garbage collector handles that too.

This action of compacting by the garbage collector is where the managed heap really works differently from old unmanaged heaps. With the managed heap, it is just a question of reading the value of the heap pointer, rather than iterating through a linked list of addresses to find somewhere to put the new data. For this reason, instantiating an object under .NET is much faster. Interestingly, accessing objects tends to be faster too, because the objects are compacted toward the same area of memory on the heap, resulting,
in less page swapping. Microsoft believes that these performance gains more than compensate for the performance penalty that you get whenever the garbage collector needs to do some work to compact the heap and change all those references to objects it has moved.

Generally, the garbage collector runs when the .NET runtime determines that garbage collection is required. You can force the garbage collector to run at a certain point in your code by calling System . GC. Collect ( ). The Sys tern. GCclass is a .NET class that represents the garbage collector, and the Collect () method initiates a garbage collection. The GCclass is intended for rare situations in which you know that it’s a good time to call the garbage collector; for example, if you have just de-referenced a large number of objects is your code. However, the logic of the garbage collector does not guarantee that all unreferenced objects will be removed from the heap in a single garbage collection pass.

Posted on October 30, 2015 in Memory Management and Pointers

Share the Story

Back to Top