Monday, July 18, 2011

Boxing and Unboxing in C#

C#, like all .NET languages, uses a unified type system (the Common Type System). The idea is that every type you can declare within the framework ultimately derives from System.Object.

However, if primitive data types such as int and bool were always stored as objects, it would be a huge performance hit. So in reality, at any given moment a value of one of these types can be represented in one of two ways:

  • As a value type - stored on the stack or the heap
  • As a reference type - stored on the heap

(Check out my article on value types, reference types, the stack and the heap.)

Boxing
Boxing is the name given to the process of converting a value-type representation to a reference-type representation. The process is implicit - just use an assignment operator and the compiler will infer from the types involved that boxing is necessary:

public static void BoxUsingObjectType()
{
    const int i = 0;
    object intObject = i; //Boxing occurs here
}

In the above example the integer i is boxed into a plain object. This is the type of usage that you will generally see - usually the point of boxing is so that you can treat the value type in a uniform way (alongside reference-type objects). More on that later.
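
To make the effect of boxing concrete, here is a minimal sketch (the class and method names are made up for illustration). It shows that the box holds its own copy of the value, so changing the original variable afterwards does not change what is inside the box:

```csharp
using System;

public static class BoxingCopyDemo
{
    // Boxes a value, then changes the original variable.
    // Returns the value still held inside the box.
    public static int BoxThenModify()
    {
        int i = 5;
        object boxed = i; // boxing: the value 5 is copied into a new object on the heap
        i = 10;           // changes only the local variable, not the box
        return (int)boxed; // unboxing: copies the value back out of the box
    }

    public static void Main()
    {
        Console.WriteLine(BoxThenModify()); // prints 5
    }
}
```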

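It's also useful to note that assigning to the matching framework type does not box. The C# keywords short, int and long are simply aliases for the structs System.Int16, System.Int32 and System.Int64, so each assignment below is an ordinary value copy:

public static void AssignUsingFrameworkTypeNames()
{
    const short s = 0;    //short is an alias for System.Int16
    System.Int16 shortValue = s; //No boxing, just a copy

    const int i = 0;      //int is an alias for System.Int32
    System.Int32 intValue = i;   //No boxing

    const long l = 0;     //long is an alias for System.Int64
    System.Int64 longValue = l;  //No boxing
}

Boxing only happens when the target is a reference type: object, System.ValueType, or an interface that the value type implements. When you box an int to an object, the runtime allocates a new object on the heap and copies the value into it. The variable is typed as object, but the box itself still knows that it holds a System.Int32.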
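
A quick way to check what is and is not a box: the sketch below (class and method names are illustrative only) confirms that int and System.Int32 are the same value type, and that a value boxed to object still reports System.Int32 as its runtime type.

```csharp
using System;

public static class AliasDemo
{
    // The keyword int and System.Int32 name the same type.
    public static bool IntIsInt32()
    {
        return typeof(int) == typeof(System.Int32);
    }

    // System.Int32 is a struct, i.e. a value type, not a reference type.
    public static bool Int32IsValueType()
    {
        return typeof(System.Int32).IsValueType;
    }

    // A boxed int is referenced through object but keeps its runtime type.
    public static string BoxedTypeName()
    {
        object boxed = 42; // boxing occurs here
        return boxed.GetType().FullName;
    }

    public static void Main()
    {
        Console.WriteLine(IntIsInt32());       // prints True
        Console.WriteLine(Int32IsValueType()); // prints True
        Console.WriteLine(BoxedTypeName());    // prints System.Int32
    }
}
```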

The example below is about numeric conversion between value types rather than boxing. You cannot implicitly assign a long to an int, for the obvious reason that data could be lost. However, the reverse is possible: an int is implicitly widened to a long:

public static void ConvertBetweenDifferentSizedTypes()
{
    const int intVal = 0;   //equivalent to System.Int32
    const long longVal = 0; //equivalent to System.Int64

    /* This is fine: implicit widening conversion from int to long */
    System.Int64 longResult = intVal;

    /* This causes a compiler error: no implicit conversion from long to int */
    System.Int32 intResult = longVal;
}

The second assignment operation above causes a compiler error. An explicit cast, such as (int) longVal, is needed to make it compile.
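
As a rough sketch of how the narrowing direction can be written (the names here are hypothetical), an explicit cast makes the conversion compile, and a checked expression turns silent overflow into an exception. No boxing is involved in either method:

```csharp
using System;

public static class NarrowingDemo
{
    // Explicit numeric conversion from long to int.
    public static int Narrow(long value)
    {
        return (int)value;
    }

    // Returns true if a checked conversion of an out-of-range value throws.
    public static bool OverflowThrows()
    {
        long tooBig = (long)int.MaxValue + 1;
        try
        {
            int n = checked((int)tooBig); // throws OverflowException at runtime
            Console.WriteLine(n);         // not reached
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Narrow(42L));      // prints 42
        Console.WriteLine(OverflowThrows()); // prints True
    }
}
```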

Unboxing
The process of unboxing is explicit, and uses the same syntax as casting. Of course, a cast can also mean a numeric conversion or a reference conversion, so it is the types involved that tell the compiler what to do. When the target is a value type and the operand is a reference type such as object, it emits an unboxing operation:

public static void UnBox(object intObject)
{
    int i = (int) intObject; //Unboxing occurs here
}
//compare with
public static void NoUnBox(System.Int32 intValue)
{
    int i = (int) intValue; //No unboxing: System.Int32 is int, so the cast does nothing
}
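
One detail worth knowing is that unboxing must target exactly the type that was boxed. The following sketch (with made-up names) shows that a boxed int cannot be unboxed straight to long, even though int converts to long implicitly. You unbox to int first, then convert:

```csharp
using System;

public static class UnboxDemo
{
    // Unbox to the exact boxed type, then widen as a separate step.
    public static long UnboxThenWiden(object boxedInt)
    {
        return (long)(int)boxedInt;
    }

    // Returns true if unboxing a boxed int directly to long throws.
    public static bool DirectUnboxToLongFails(object boxedInt)
    {
        try
        {
            long l = (long)boxedInt; // InvalidCastException: the box holds an int
            Console.WriteLine(l);    // not reached
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        object boxed = 7; // boxing
        Console.WriteLine(UnboxThenWiden(boxed));         // prints 7
        Console.WriteLine(DirectUnboxToLongFails(boxed)); // prints True
    }
}
```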

Why Would you Use Boxing?
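You generally wouldn't - and one good reason to know about it is to help avoid it. Boxing allocates a new object on the heap and copies the value into it, and unboxing adds a type check and another copy, so both are expensive compared with working on the value directly. Here's a common scenario where you might inadvertently box: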

int i = 10;
string s = string.Format("The value of i is {0}", i);
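
The box appears because string.Format takes its arguments as object. One possible way around it, sketched below with hypothetical method names, is to call ToString() yourself so that a string (already a reference type) is passed instead. Whether this is worth doing depends on how hot the code path is.

```csharp
using System;

public static class FormatDemo
{
    // The int argument is converted to object, so it is boxed.
    public static string WithBoxing(int i)
    {
        return string.Format("The value of i is {0}", i);
    }

    // The argument is already a string, so nothing is boxed.
    public static string WithoutBoxing(int i)
    {
        return string.Format("The value of i is {0}", i.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(WithBoxing(10));    // prints The value of i is 10
        Console.WriteLine(WithoutBoxing(10)); // prints The value of i is 10
    }
}
```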

But the most common place to meet boxing is in collections. In the .NET 1.1 days you might have used an ArrayList something like this:

ArrayList list = new ArrayList();
list.Add(10);                 // Boxing
int i = (int) list[0];        // Unboxing

Because ArrayList stores every element as object, each int is boxed when it is added and unboxed when it is read back, which makes this an expensive operation.

Now in the era of generics this is less of a problem, because when you use a value type as a generic type argument, such as the <int> below, the elements are not boxed, and the collection is type-safe. This is a very good reason to use generic collections.

List<int> list = new List<int>();
list.Add(10);                 // No Boxing
int i = list[0];              // No Unboxing  
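
The type-safety point can be illustrated with a small sketch (names are illustrative only). An ArrayList accepts anything and fails only when the value is read back with the wrong cast, whereas a List<int> rejects the wrong type at compile time:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TypeSafetyDemo
{
    // ArrayList accepts a string; the mistake surfaces only at runtime.
    public static bool ArrayListFailsAtRuntime()
    {
        ArrayList list = new ArrayList();
        list.Add("ten"); // compiles, because every element is just an object
        try
        {
            int i = (int)list[0]; // InvalidCastException: the element is a string
            Console.WriteLine(i); // not reached
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // List<int> stores ints directly: no boxing and no casts.
    public static int GenericSum()
    {
        List<int> list = new List<int>();
        list.Add(10);
        list.Add(20);
        // list.Add("ten"); // would not compile: cannot convert string to int
        int sum = 0;
        foreach (int n in list)
        {
            sum += n;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(ArrayListFailsAtRuntime()); // prints True
        Console.WriteLine(GenericSum());              // prints 30
    }
}
```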

Generically-Typed Collections
The problem comes when you want to use a mixed list. First, how it looks in .NET 1.1:

ArrayList list = new ArrayList();
list.Add(10);                       // Boxing
list.Add(new SomeClass());          // Upcasting to object (implicit)
int y = (int) list[0];              // Unboxing
SomeClass o = (SomeClass) list[1];  // Downcasting (explicit cast)

Even with a generic list, we have to use object, or some common ancestor type, to ensure that our list will accept all values:

List<object> list = new List<object>();
list.Add(10);                       // Boxing
list.Add(new SomeClass());          // Upcasting to object (implicit)
int y = (int) list[0];              // Unboxing
SomeClass o = (SomeClass) list[1];  // Downcasting (explicit cast)

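This is very similar to using ArrayList, and there are no efficiency gains. But with generics the element type is stated explicitly, so we control exactly where boxing and casting occur: a List<object> declares up front that its elements will need casting, whereas a List<int> never boxes.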
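
When reading from a mixed list, it is usually safer to test the runtime type before unboxing or downcasting. Here is a minimal sketch, assuming an empty placeholder SomeClass and a hypothetical Describe helper:

```csharp
using System;
using System.Collections.Generic;

public class SomeClass { } // placeholder type for illustration

public static class MixedListDemo
{
    // Checks the runtime type first, so the casts cannot throw.
    public static string Describe(object item)
    {
        if (item is int)
        {
            return "int " + ((int)item).ToString(); // unboxing
        }
        if (item is SomeClass)
        {
            return "SomeClass instance";
        }
        return "something else";
    }

    public static void Main()
    {
        List<object> list = new List<object>();
        list.Add(10);              // boxing
        list.Add(new SomeClass()); // upcasting to object
        foreach (object item in list)
        {
            Console.WriteLine(Describe(item));
        }
    }
}
```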

There are some discussions here and here on Stack Overflow about other possible use cases involving boxing and unboxing.

Further Reading
This page on the Microsoft website gives you some more ideas, including some diagrams demonstrating what's happening on the stack and the heap at runtime.

Also, this article on The Code Project has some nice examples and images, and a fuller explanation of value and reference types.
