Friday, July 22, 2011

Lambda Expressions in C#

In a previous article I introduced delegates and anonymous delegates. In this article I intend to explain the fundamentals of Lambda expressions by building on our knowledge of delegates. It's useful to get a proper handle on delegates before delving into Lambdas, because in most everyday code Lambdas are just anonymous delegates in different clothing.

A Quick Example
At the end of my post on delegates, I introduced a common scenario where you might want to use an anonymous delegate to find an item in a list:

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book(){ title = "Romancing the Stone" });
        bookDb.Add(new Book(){ title = "Wuthering Heights" });
        
        Book foundBook = bookDb.Find(delegate(Book candidate)
        {
            return candidate.title.Equals(
                            "Wuthering Heights");
        });

        Console.WriteLine(foundBook.title);
    }
}

However the same code can easily be refactored to use a Lambda expression (if you use ReSharper just hit Alt+Enter and it will refactor for you):

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book(){ title = "Romancing the Stone" });
        bookDb.Add(new Book(){ title = "Wuthering Heights" });
        
        Book foundBook = bookDb.Find(
            candidate => candidate.title.Equals(
                                "Wuthering Heights"));

        Console.WriteLine(foundBook.title);
    }
}

The argument passed to Find, candidate => candidate.title.Equals("Wuthering Heights"), is the Lambda expression, and it does the same job as the delegate in the first code-block. However, it does it in a slightly different way: it is more concise, and its declaration is less explicit.

Let's start from the top - the Lambda operator.

The Lambda Operator
You can tell you are looking at a lambda expression when you see the lambda operator (=>). This operator can be read as "goes to":

x => x * x

So this expression (above) is read as "x goes to x times x".

How do we Get from Delegate to Lambda?
Let's explain the above expression by showing how it would look in familiar delegate form. Then we can alter it step by step until it becomes a lambda expression again. Here's the fully written-out delegate equivalent of the above statement, written in a way which should be familiar from my previous post on delegates:

public delegate int Square(int i);

void Main()
{
    Square square = delegate(int i)
    {
        return i*i;
    };
}

In the code-block above, the first thing to notice is that we have a named delegate type - its name is Square. This is the first line in the code-block, the declaration that essentially says (in English) "There is a delegate type named Square, it has a single int input parameter, and an int return type."

Inside the body of the Main method, we have then created an instance of Square (named lower-case square) and assigned an anonymous delegate to it.

Step 1: Anonymise the Delegate Type
Now we are going to anonymise the delegate type as well as the delegate. Anonymising means declaring the same behaviour, but without specifying a name. So in English: "There is a delegate type, it has a single int input parameter, and an int return type".

Since we are no longer giving it a name, we must declare it at the same time as assigning a value to it, and therefore now the whole thing is enclosed within one statement. Remember, we are still naming the variable (lower-case square), but we are no longer naming the type (was init-capped Square):

public delegate int Square(int i);

void Main()
{
    Square square = delegate(int i)
    {
        return i*i;
    };
}

//becomes

void Main()
{
    Func<int, int> square = delegate(int i)
    {
        return i*i;
    };
}

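To do this we used a type named Func<T, TResult> (see the definition here), which was introduced in .NET Framework 3.5, alongside C# 3.0. Func<T, TResult> is a generic delegate type supplied by the framework: it encapsulates a method, and takes a couple of generic type parameters describing the argument and return type respectively: <int, int>. Strictly speaking it still has a name, but because it is general-purpose we no longer need to declare a delegate type of our own.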

You'll notice that there are several variations on Func, each with a different number of generic type parameters. In .NET 4 you can encapsulate a method with up to 16 arguments, and there is a similar family of delegate types, Action, to deal with methods with no return type. Any method you assign to an instance of one of these types has to have matching input parameters and return type, and you can see in the code-block above that this is the case.
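To see several members of the Func and Action families side by side, here is a minimal sketch that still uses anonymous delegates, matching the step we are at. The holder class FuncFamilyDemo and its members are hypothetical names chosen for illustration; the last type argument of each Func is the return type.

```csharp
using System;
using System.Collections.Generic;

public static class FuncFamilyDemo
{
    // Two int inputs, int return type: the last type argument is the return type.
    public static readonly Func<int, int, int> Add = delegate(int a, int b)
    {
        return a + b;
    };

    // No inputs, int return type.
    public static readonly Func<int> Answer = delegate()
    {
        return 42;
    };

    // Collects every message passed to Log, so the effect can be checked.
    public static readonly List<string> Messages = new List<string>();

    // One string input and no return type, so Action<string> rather than Func.
    public static readonly Action<string> Log = delegate(string message)
    {
        Messages.Add(message);
    };
}
```

Calling FuncFamilyDemo.Add(2, 3) returns 5, while FuncFamilyDemo.Log("hello") returns nothing and simply records the message.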

Step 2: Switch to Lambda Syntax
Now let's convert the delegate on the right side of that assignment operation into a lambda expression:

Func<int, int> square = delegate(int i)
{
    return i*i;
};

//becomes

Func<int, int> square = i => i * i;

And there we have the same statement we started with. It uses the same variables to do the same thing and return the same value - but with different syntax.

The first thing you notice is how concise it is, even though it does exactly the same job. On the left side of the operator, we always have the input parameters.

  • left-of-the-operator - input parameters,
  • right-of-the-operator - expression or statement block

Remember, the operator is read as "goes to". So a lambda is always read as "input (goes to) expression", or "input (goes to) statement block".
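As a quick sanity check that the three forms above really do the same job, the sketch below keeps them side by side. The Square delegate type is the one from the earlier code-block; the holder class SquareForms is a hypothetical name.

```csharp
using System;

// The named delegate type from the first step.
public delegate int Square(int i);

public static class SquareForms
{
    // Named delegate type, anonymous method.
    public static readonly Square NamedType = delegate(int i) { return i * i; };

    // Step 1: the general-purpose Func<int, int> replaces the named type.
    public static readonly Func<int, int> FuncType = delegate(int i) { return i * i; };

    // Step 2: lambda syntax; the compiler infers that i is an int from Func<int, int>.
    public static readonly Func<int, int> Lambda = i => i * i;
}
```

All three return 25 when called with 5; only the syntax differs.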

Inferred Typing
With the delegate version we had to specify the type of the input parameter, when we said: delegate(int i). With lambdas, however, the compiler infers the type of i from the context. It sees that we are assigning the expression to a strongly-typed Func delegate with generic type parameters <int, int>, so it already knows, and doesn't need to be told again, that the input parameter i is an int.

Don't make the mistake of thinking this is loose typing. This is inferred typing, which is still very much strong typing: the compiler fixes the type at compile time, and it cannot change afterwards. It is part of the suite of implicit-typing functionality introduced in C# 3.0, which also includes implicitly typed local variables (the var keyword) and implicitly typed arrays.

In fact, one of the main benefits of lambdas is that they are a simple way to make many previously un-type-safe operations type-safe. Take a quick look at Mauricio Scheffer's blog here, where he has used lambdas to replace string literals. You see this type of usage a lot in .NET frameworks and packages nowadays.

Take the first code-block of this post: without lambdas or delegates, a Find method like this would typically have had to take string literals naming the properties to match on. You can also see, for example from my post on StructureMap IoC, that statement lambdas are now the preferred initialisation approach (though the previous approach was also type-safe).
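To make the string-literal point concrete, here is a minimal sketch of a search helper that takes a lambda selecting the property to compare, instead of the property's name as a string. FindFirst and SelectorSearch are hypothetical names, not a framework API, and Book is a minimal stand-in for the class used above.

```csharp
using System;
using System.Collections.Generic;

public class Book
{
    public string title { get; set; }
}

public static class SelectorSearch
{
    // Hypothetical helper: the caller passes a lambda that reads the property
    // to compare, rather than the property's name as a string literal.
    public static T FindFirst<T>(IEnumerable<T> items, Func<T, string> selector, string value)
        where T : class
    {
        foreach (T item in items)
        {
            if (selector(item) == value)
            {
                return item;
            }
        }
        return null; // no match
    }
}
```

Called as FindFirst(books, b => b.title, "Wuthering Heights"), a typo such as b => b.titel is a compile-time error, whereas a misspelt string such as "titel" would only fail at runtime.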

Specifying Input Parameters
If you want to state the parameter types explicitly instead of relying on the compiler's inference, put the typed parameters in brackets on the left side of the lambda operator. The types you give must still match the delegate's parameter types:

Func<int, int> square = (int i) => i * i;

Or, if you have no parameters to pass in, just use empty brackets:

Func<int> noParams = () => 5 * 5;

I have to admit I'm not sure yet where or when this would be useful.
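Two places where these forms seem to earn their keep, offered as a sketch rather than a complete answer. First, a lambda whose delegate has a ref or out parameter must declare its parameter types explicitly. Second, an empty parameter list is handy for deferring work until it is actually needed. The Incrementer delegate type, ExplicitLambdaDemo and AnswerIfNeeded are hypothetical names.

```csharp
using System;

// Hypothetical delegate type with a ref parameter.
public delegate void Incrementer(ref int value);

public static class ExplicitLambdaDemo
{
    // A ref parameter needs an explicitly typed lambda:
    // "value => value++" would not compile against Incrementer.
    public static readonly Incrementer Increment = (ref int value) => { value++; };

    // Counts how many times the deferred work has actually run.
    public static int CallCount;

    // A parameterless lambda: nothing runs until someone invokes it.
    public static readonly Func<int> ExpensiveAnswer = () =>
    {
        CallCount++;
        return 6 * 7;
    };

    // Invokes the supplied function only if its result is needed.
    public static int AnswerIfNeeded(bool needed, Func<int> compute)
    {
        return needed ? compute() : 0;
    }
}
```

.NET 4's Lazy<T> class takes a Func<T> in its constructor for the same reason: the value is only computed the first time it is asked for.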

Expression Lambdas vs. Statement Lambdas
So far we've only been using Expression Lambdas, which means that on the right-side of the lambda operator there is one, single expression. This is generally useful when you are trying to apply a function, such as Square above, or trying to apply a predicate, such as the List.Find example at the top of this article.

However you can construct multi-line, procedural lambdas if you wish, by using curly braces:

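Action<string> speakToWorld = n =>
{
    string s = n + " " + "World";
    Console.WriteLine(s);
};
speakToWorld("Hello");    // prints "Hello World"

Action<string> is, of course, the cousin of Func, but never returns a value (see the reference page here). Note the generic type argument: the non-generic Action takes no parameters at all, so it could not accept n.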
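Statement lambdas work with Func too; the only extra requirement is an explicit return statement, as in this small sketch. StatementLambdaDemo and Describe are hypothetical names.

```csharp
using System;

public static class StatementLambdaDemo
{
    // A statement lambda assigned to a Func: the braces allow several
    // statements, and the return statement supplies the Func's result.
    public static readonly Func<int, string> Describe = n =>
    {
        string parity = (n % 2 == 0) ? "even" : "odd";
        return n + " is " + parity;
    };
}
```

Describe(4) returns "4 is even" and Describe(7) returns "7 is odd".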

Expression Trees
One final thing to note here is the broader context within which both lambdas and delegates exist. We have moved into an arena where functionality now has the same status as values. What I mean is that we are used to assigning and manipulating values (and objects encapsulating values) within variables. Now it is just as easy to build and manipulate functionality. Expression trees are an example of that.

Earlier we used Func and Action to hold references to executable code. These allow truly anonymous references to code. Expression trees go a step further: rather than compiled code, they hold a data structure describing the code, which can be assembled and re-assembled in its own right and then compiled into a Func or Action.

Have a quick look at the MSDN on Expression Trees:

"Expression trees represent code in a tree-like data structure, where each node is an expression, for example, a method call or a binary operation such as x < y."
"You can compile and run code represented by expression trees. This enables dynamic modification of executable code, the execution of LINQ queries in various databases, and the creation of dynamic queries."

A quick example from the-code-project:

ParameterExpression parameter1 = 
    Expression.Parameter(typeof(int), "x");

BinaryExpression multiply = 
    Expression.Multiply(parameter1, parameter1);

Expression<Func<int, int>> square = 
    Expression.Lambda<Func<int, int>>(multiply, parameter1);

Func<int, int> lambda = square.Compile();
Console.WriteLine(lambda(5));

In the code-block above, we have done the same thing our simple lambda expression from earlier does - square a number. It is much more verbose, yes, but look at the types on the last two lines before Console.WriteLine. We created a dynamic Expression object, which can be altered and assembled ad-hoc however you like at runtime - and then Compile()'d into executable code, and run - again, all at runtime.

In other words, you can have your code assemble more code as it runs: it can treat code in the same flexible way it used to be able to treat only data and values.

Aside from generating dynamic queries, what does this allow us to do? As this stack-overflow page points out, we can hand the power of code customisation over to users. Ever made a 'form builder', allowing users to customise post-driven dynamic webpages? I can't imagine how we did that in the past but they've just got a whole lot easier, and semantically correct. Or what about user-customised workflows? Ditto.
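You don't always have to assemble expression trees by hand. A minimal sketch, assuming System.Linq.Expressions: if a lambda is assigned to Expression<Func<int, int>> rather than Func<int, int>, the compiler builds the tree for you, and it can be inspected and then compiled just like the hand-built version above. The class and method names are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionFromLambda
{
    // Assigning the lambda to Expression<Func<int, int>> (rather than Func<int, int>)
    // makes the compiler build an expression tree instead of compiled code.
    public static readonly Expression<Func<int, int>> SquareTree = x => x * x;

    // The tree is data, so its parts can be inspected at runtime.
    public static ExpressionType BodyNodeType()
    {
        return SquareTree.Body.NodeType;      // ExpressionType.Multiply
    }

    public static string ParameterName()
    {
        return SquareTree.Parameters[0].Name; // "x"
    }

    // Compile() turns the tree into an ordinary delegate. Compiling is costly,
    // so real code would compile once and reuse the result.
    public static int Run(int value)
    {
        Func<int, int> compiled = SquareTree.Compile();
        return compiled(value);
    }
}
```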


Thursday, July 21, 2011

Introduction to Delegates in C#

What is a delegate? Most articles dive straight into the technical stuff, but I think it makes it easier to learn when you start by looking at the metaphor - I will dive right into the practical example straight afterwards.

So first of all, what makes the word 'delegate' such an apt keyword?

The Metaphor
According to Wikipedia, a delegate is a person who is similar to a representative, but "delegates differ from representatives because they receive and carry out instructions from the group that sends them, and, unlike representatives, are not expected to act independently."

Imagine a government department is employing two private companies to carry out litter collection across two districts. The government are responsible for preparing the contracts, and intend to use a standard template contract for both private companies. But the two private companies have different operating practices, and each want to tailor their respective contracts to suit their own particular needs.

They each send a delegate to meet the government and make amendments to their respective contracts. Each delegate is briefed by their respective companies before they leave, so that they know exactly what their instructions are. Each delegate arrives intending to do the same thing - amend a contract. But their instructions and boundaries on how they can do that are different - and are defined clearly by the respective senders.

I'll end the metaphor here, because I don't want to lock into one example. But having that sense of what a delegate is should make the rest of this article easier to read.

Declaring a Delegate Type
First, let's declare a publicly accessible delegate type:

//Declare a delegate type
public delegate void BookProcessingDelegate(Book b);

It is important at this stage to understand that what we have defined is not a delegate but a delegate type. Each instance we later instantiate of this type will be an actual delegate.

Delegates are usually declared not as class members, but as peers of classes or structs, i.e. at the same level. This is because declared delegate types are types in their own right. They can, of course, also be declared nested inside a class (just as a class can be declared nested inside another class).

Like a class, a delegate type is a reference type, so each delegate instance is stored on the heap and accessed via a pointer (see my article on types, the stack and the heap).

Instantiating and Assigning a Delegate Method
Having declared our delegate type, there are three things we need to do to make use of it:

  1. Define a method (write some instructions)
  2. Instantiate a delegate and assign the method to it (brief the delegate of our instructions)
  3. Send the delegate off to do business on our behalf

I'm going to use a variation of the 'classic' delegate example. Note that I've numbered the comments to match the steps above:

//Declare a delegate type
public delegate void BookProcessingDelegate(Book b);

public class Book
{
    public string title { get; set; }
}

public class BookDB
{
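    // Sample data, so that ProcessBooks has something to iterate over
    // (an uninitialised list would throw a NullReferenceException)
    private IList<Book> books = new List<Book>
    {
        new Book { title = "Romancing the Stone" },
        new Book { title = "Wuthering Heights" }
    };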

    public void ProcessBooks(BookProcessingDelegate del)
    {
        foreach(Book b in books)
        {
            // Call the delegate method
            del(b);
        }
    }
}

class Program
{
    //1. Define a method
    private static void PrintBookTitle(Book b)
    {
        Console.WriteLine(b.title);
    }

    static void Main()
    {
        BookDB bookDb = new BookDB();

        //2. Instantiate a delegate, and assign the method
        BookProcessingDelegate printTitles = PrintBookTitle;

        //3. Send the delegate off to do work on our behalf
        bookDb.ProcessBooks(printTitles);    
    }
}

When we assign the method to the delegate (item 2 above), we are actually assigning to the delegate a pointer to the method. Then, when the BookDB.ProcessBooks method uses (invokes) the delegate, it follows the pointer and actually invokes the referenced method.

Logically, it is exactly the same as if the body of the referenced method had been declared inside BookDB. But of course it wasn't, and that's the key to the usefulness of delegates - I'll discuss this in more detail shortly.

But first of all let's explore two more ideas - anonymous delegates, and multicast delegates.

Anonymous Delegates
In the above example, we assigned a named method which was a private class member (namely PrintBookTitle). If that method is only going to be used for this one specific purpose, it can be much more convenient to declare an anonymous method.

What follows is exactly the same code block as above, but now the PrintBookTitle method has been anonymised, causing steps 1 and 2 to become one statement:

class Program
{
    static void Main()
    {
        BookDB bookDb = new BookDB();

        //1. Define a method AND
        //2. Instantiate a delegate, and assign a method
        BookProcessingDelegate printTitles = delegate(Book b) {
            Console.WriteLine(b.title);
        };

        //3. Send the delegate off to do work on our behalf
        bookDb.ProcessBooks(printTitles);    
    }
}

Anonymous delegates are more concise, and they are the foundation of lambda expressions, which I discuss in another article.

Multicast Delegates
It's another odd word, but multicast is based on another metaphor. Check out the link to get a quick sense of that, but it's simple enough anyway. You can assign more than one method to a delegate. The delegate will respond when invoked, by executing each method in order:

class Program
{
    private static void PrintLower(Book b) {
        Console.WriteLine(b.title.ToLower());
    }

    private static void PrintUpper(Book b) {
        Console.WriteLine(b.title.ToUpper());
    }

    static void Main()
    {
        //Instantiate a delegate, and assign TWO methods
        BookProcessingDelegate printTitles = PrintUpper;
        printTitles += PrintLower;

        //You can also use the -= operator to remove methods
        //from the 'invocation list'
        printTitles -= PrintLower;
    }
}

You can add and remove methods to your heart's content. The resulting list of methods which must be called on invocation of the delegate is referred to as the delegate's invocation list.

If the delegate has a return value, only the value returned by the last method in the invocation list will be returned. Similarly, if the delegate has an out or ref parameter, the parameter will end up being equal to the value assigned in the last method in the invocation list (see also my article on parameter passing in C#).
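A small sketch of the return-value rule, using a Func<int> so that there is a return value to lose. MulticastDemo and its methods are hypothetical names; GetInvocationList() is the standard Delegate method that exposes the invocation list.

```csharp
using System;
using System.Collections.Generic;

public static class MulticastDemo
{
    // Records the order in which the methods in the invocation list run.
    public static readonly List<string> CallLog = new List<string>();

    private static int ReturnOne()
    {
        CallLog.Add("ReturnOne");
        return 1;
    }

    private static int ReturnTwo()
    {
        CallLog.Add("ReturnTwo");
        return 2;
    }

    // Both methods run, in order, but only the last one's result comes back.
    public static int InvokeBoth()
    {
        Func<int> both = ReturnOne;
        both += ReturnTwo;
        return both();
    }

    // The invocation list itself can be inspected without invoking it.
    public static int CountMethods()
    {
        Func<int> both = ReturnOne;
        both += ReturnTwo;
        return both.GetInvocationList().Length;
    }
}
```

InvokeBoth() returns 2: the 1 returned by ReturnOne is discarded, even though ReturnOne did run first.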

What Are Delegates Useful For?
As we have seen, delegates give us the ability to assign functionality at runtime. This adds to our toolbox of assignment operations - we all know how easy it is to assign values, or references to objects. Now we can assign functionality too.

Look back to the first, main code-block of this article. It should be clear that any class, anywhere in your application, that chooses to call BookDB.ProcessBooks can do so and provide its own specific, tailored functionality. The Program class happens to have sent one particular type of functionality (printing to the console), but any delegate can point to any method implementation.

Without delegates, the creators of the BookDB class would have to have thought up in advance every possible thing that callers might want to do with the Book list - and provide lots of methods to cover all those bases. In doing so, the BookDB class would be taking on responsibilities that are not within its problem domain.

The other way to achieve the same functionality without delegates would be for the BookDB to expose its internal list as a property, or via an access method. The Program class could then iterate over the collection itself. But this approach would mean that the Program class was shouldering responsibilities that are not within its domain - iterating over a collection.

Delegates therefore provide an elegant solution which helps achieve the Separation of Concerns principle within your application architecture.

A Common Usage - Ad-Hoc Searching
There are lots of places in the .NET Framework where you can use delegates, and their newer equivalent, lambda expressions. We'll take a look at one example, by altering our BookDB code a little. Let's say you want to find a book with a particular title, from a list of Book items:

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book() { title="Romancing the Stone" });
        bookDb.Add(new Book() { title="Wuthering Heights" });

        Book foundBook = bookDb.Find(delegate(Book candidate)
        {
            return candidate.title.Equals("Wuthering Heights");
        });

        Console.WriteLine(foundBook.title);
    }
}

If you couldn't use a delegate, you would have to manually iterate over the list. But fortunately for us, the List<T>.Find() method will accept a delegate function which describes how to apply a condition which determines a match for us. This leaves the job of iterating the collection, with its associated efficiency concerns, in the correct problem domain - the domain of the List.
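One thing worth knowing about List<T>.Find: when nothing matches, it returns default(T), which is null for a class like Book, so the Console.WriteLine(foundBook.title) line above would throw a NullReferenceException if no title matched. The sketch below checks for that, and also shows FindAll, which takes the same kind of predicate but returns every match. AdHocSearchSketch and its methods are hypothetical names, and Book is a minimal stand-in.

```csharp
using System;
using System.Collections.Generic;

public class Book
{
    public string title { get; set; }
}

public static class AdHocSearchSketch
{
    // Find returns null (default(Book)) when nothing matches,
    // so check before dereferencing the result.
    public static string TitleOrMessage(List<Book> books, string wanted)
    {
        Book found = books.Find(delegate(Book candidate)
        {
            return candidate.title == wanted;
        });
        return found != null ? found.title : "not found";
    }

    // FindAll takes the same kind of predicate delegate but returns every match.
    public static List<Book> BooksContaining(List<Book> books, string fragment)
    {
        return books.FindAll(delegate(Book candidate)
        {
            return candidate.title.Contains(fragment);
        });
    }
}
```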

Quick Runthrough
For a nice, quick, concise primer on the subjects I've discussed above, check out this YouTube video:

I don't think it stands up as a learning resource in its own right, but as a primer it's quite well organised.

Further Reading
This is quite a nice delegates tutorial on MSDN.

Wednesday, July 20, 2011

Pass by Value and Pass by Reference in C#

This article is a quick explanation of C#'s handling of pass-by-val and pass-by-ref. If you haven't already done so, it's worth getting to grips with the heap and the stack - that article will also explain the relationship between value types, reference types and pointers, which will be really useful.

Value-Types
By default, value types are 'passed-by-value' into C# methods. This means that the called method receives a copy of the value, so inside its body the parameter can be changed or re-assigned without affecting the caller's variable.

static void Main(string[] args)
{
    int i = 1;
    Console.WriteLine(i); // prints 1
    MyMethod(i);
    Console.WriteLine(i); // prints 1
}

static void MyMethod(int i)
{
    i = 2;
}

If you want to override this behaviour, you can use the ref keyword or out keyword:

static void Main(string[] args)
{
    int i = 1;
    Console.WriteLine(i); // prints 1
    MyMethod(ref i);
    Console.WriteLine(i); // prints 2
}

static void MyMethod(ref int i)
{
    i = 2;
}

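You can use either the ref or the out keyword. The difference is who has to initialise the variable: with ref, the caller must assign it before passing it, and the method may read it; with out, the caller can pass an unassigned variable, but the method cannot read the parameter until it has assigned it, and must assign it before returning.

The compiler enforces both rules. It will issue an error if you try to send an unassigned variable with the ref keyword, and it will also issue an error if the target method can return without assigning its out parameter, so an unassigned out variable never causes a runtime exception.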
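A minimal sketch of the out pattern, in the style of the framework's int.TryParse: the result comes back through the out parameter and the return value reports success. TryHalve and OutParameterDemo are hypothetical names.

```csharp
using System;

public static class OutParameterDemo
{
    // Hypothetical TryParse-style method: returns whether it succeeded,
    // and hands back the result through the out parameter.
    public static bool TryHalve(int number, out int half)
    {
        if (number % 2 != 0)
        {
            half = 0;      // every return path must assign the out parameter
            return false;
        }
        half = number / 2;
        return true;
    }
}
```

The caller can declare int h; without assigning it and call TryHalve(10, out h), after which h is 5. Passing the same unassigned h with ref would be a compiler error.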

Reference-Types
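By default, reference-type arguments are also passed by value - but what gets copied is the reference (the pointer), not the object. The method can therefore change the object the caller's variable points to, but if it reassigns its parameter to a different object, the caller's variable is unaffected. Use ref (or out) only when you want the method to be able to replace the caller's reference itself:

static void Main(string[] args)
{
    MyClass myObj = new MyClass();
    MyMethod(ref myObj);            //MyMethod may make myObj point to a new object
}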

If you want the method to work on its own copy of an object, so that changes cannot affect the caller's object, you need to start thinking about cloning the object, which is a whole new subject in itself. If you simply want to prevent the passed object from being altered, you might want to consider making the class immutable, but in any case that's outside the scope of this article.
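The sketch below, using a minimal stand-in Book class, shows the three cases side by side: changing the object through a by-value reference, reassigning a by-value reference, and reassigning a ref parameter. ReferencePassingDemo and its methods are hypothetical names.

```csharp
using System;

public class Book
{
    public string title { get; set; }
}

public static class ReferencePassingDemo
{
    // The reference is copied, but both copies point at the same object,
    // so the change is visible to the caller.
    public static void Rename(Book b)
    {
        b.title = "Renamed";
    }

    // Reassigning the copied reference only changes the local parameter;
    // the caller's variable still points at the original object.
    public static void Replace(Book b)
    {
        b = new Book { title = "Replacement" };
    }

    // With ref, the method works on the caller's variable itself,
    // so the reassignment is visible to the caller.
    public static void ReplaceByRef(ref Book b)
    {
        b = new Book { title = "Replacement" };
    }
}
```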


Monday, July 18, 2011

Boxing and Unboxing in C#

C#, like all .NET languages, uses a unified type system (called the Common Type System). The idea is that all types that can ever be declared within the framework ultimately derive from System.Object. This diagram gives you some idea how that works at the top level.

However if primitive datatypes such as int and bool were to be stored always as objects it would be a huge performance hit. So in reality, at any given moment these primitive types are capable of being represented in one of two ways:

  • As a value type - stored on the stack or the heap
  • As a reference type - stored on the heap

(Check out my article on value types, reference types, the stack and the heap.)

Boxing
Boxing is the name given to the process of converting a value-type representation to a reference-type representation. The process is implicit - just use an assignment operator and the compiler will infer from the types involved that boxing is necessary:

public static void BoxUsingObjectType()
{
    const int i = 0;
    object intObject = i; //Boxing occurs here
}

In the above example the integer i is boxed into a generic object. This is the type of usage that you will generally see - usually the point of boxing is so that you can treat the value-type in a generic way (alongside reference-type objects). More on that later.
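Because boxing copies the value, the boxed object and the original variable are independent from then on. This minimal sketch (class and method names hypothetical) shows that, and also that a boxed int still reports its real type.

```csharp
using System;

public static class BoxingCopyDemo
{
    // Boxing copies the current value into a new object on the heap,
    // so later changes to the original variable don't affect the box.
    public static int BoxThenChangeOriginal()
    {
        int i = 1;
        object boxed = i;   // boxing: a copy of 1 goes on the heap
        i = 2;              // changes only the local variable
        return (int)boxed;  // unboxing: still 1
    }

    // A boxed int still knows it is a System.Int32.
    public static Type BoxedType()
    {
        object boxed = 42;
        return boxed.GetType();
    }
}
```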

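It's also useful to note that C#'s short, int and long keywords are simply aliases for the System.Int16, System.Int32 and System.Int64 value types. So the assignments below involve no boxing at all - they are ordinary value-type assignments:

public static void AssignUsingAliases()
{
    const short s = 0;    //equivalent to System.Int16
    System.Int16 shortValue = s;    //no boxing: same type

    const int i = 0;      //equivalent to System.Int32
    System.Int32 intValue = i;      //no boxing: same type

    const long l = 0;     //equivalent to System.Int64
    System.Int64 longValue = l;     //no boxing: same type
}

Boxing only happens when a value is converted to a reference type, such as object, System.ValueType, or an interface the value type implements (IComparable, for example). The result is a new object on the heap containing a copy of the value, and converting to object like this is an implicit upcast, not a downcast.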

Similarly, the example below is really about numeric conversions rather than boxing. An int can be implicitly converted (widened) to a long, because no data can be lost, but a long cannot be implicitly converted (narrowed) to an int, because data might be lost:

public static void ConvertBetweenSizes()
{
    const int intVal = 0;   //equivalent to System.Int32
    const long longVal = 0; //equivalent to System.Int64

    /* This is fine: implicit widening conversion */
    System.Int64 longValue = intVal;

    /* This causes a compiler error: narrowing needs an explicit cast */
    System.Int32 intValue = longVal;
}

The second assignment operation above causes a compiler error. Writing (System.Int32) longVal instead makes the narrowing conversion explicit, and compiles.

Unboxing
The process of unboxing is explicit, and uses the same syntax as casting. When the compiler encounters a cast from a reference type to a value type, such as the cast from object to int below, it knows that unboxing is necessary. At runtime the cast checks that the object really is a boxed int before copying the value out:

public static void UnBox(object intObject)
{
    int i = (int) intObject; //Unboxing occurs here
}
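Unboxing is stricter than an ordinary numeric cast: the cast must name exactly the type that was boxed. This sketch (class and method names hypothetical) shows that unboxing a boxed int as a long throws InvalidCastException, and the usual fix of unboxing first and converting afterwards.

```csharp
using System;

public static class UnboxingDemo
{
    // Unboxing a boxed int as a long fails at runtime,
    // even though int converts to long implicitly.
    public static bool UnboxAsLongThrows()
    {
        object boxed = 42; // a boxed System.Int32
        try
        {
            long wrong = (long)boxed;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // The fix: unbox to the original type first, then widen to long.
    public static long UnboxThenWiden()
    {
        object boxed = 42;
        return (long)(int)boxed;
    }
}
```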

Why Would you Use Boxing?
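You generally wouldn't - and one good reason to know about it is to help avoid it. Boxing allocates a new object on the heap and copies the value into it, and that object must later be garbage collected, so it is a relatively expensive operation. Here's a common scenario where you might inadvertently box: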

int i = 10;
string s = string.Format("The value of i is {0}", i);   // i is boxed: Format takes object arguments

But the most common usage of boxing is in collections. In the .NET 1.1 days you might have used an ArrayList something like this:

ArrayList list = new ArrayList();
list.Add(10);                 // Boxing
int i = (int) list[0];        // Unboxing

Because ArrayList stores everything as object, each int is boxed on the way in and unboxed on the way out, which is expensive.

Now, in the era of generics, this is less of a problem, because when you use a primitive type as a generic type argument, such as the <int> below, it is not boxed, and it is type-safe. This is a very good reason to use generic collections.

List<int> list = new List<int>();
list.Add(10);                 // No Boxing
int i = list[0];              // No Unboxing  

Generically-Typed Collections
The problem comes when you want to use a mixed list. First, how it looks in .NET 1.1:

ArrayList list = new ArrayList();
list.Add(10);                       // Boxing
list.Add(new SomeClass());          // Upcasting (implicit)
int y = (int) list[0];              // Unboxing
SomeClass o = (SomeClass) list[1];  // Downcasting

Even with a generic list, we have to use object, or some common ancestor type, to ensure that our list will accept all values:

List<object> list = new List<object>();
list.Add(10);                       // Boxing
list.Add(new SomeClass());          // Upcasting (implicit)
int y = (int) list[0];              // Unboxing
SomeClass o = (SomeClass) list[1];  // Downcasting

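This is very similar to using ArrayList, and there are no efficiency gains. What generics do give us is explicit control over the type at which items are stored, and therefore over where casting operations happen: object here, or a narrower common ancestor type if one exists.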

There are some discussions here and here on stack-overflow about other possible use cases involving boxing and unboxing.

Further Reading
This page on the Microsoft website gives you some more ideas, including some diagrams demonstrating what's happening on the stack and the heap at runtime.

Also, this article on the-code-project has some nice examples and images, and a fuller explanation of value and reference types.

Sunday, July 17, 2011

The Heap and the Stack

There are two locations in memory where the .NET Framework stores items during runtime - the Heap and the Stack. Understanding the difference between the two is one of those nuanced things that will pay dividends in all kinds of programming scenarios.

This article is a concise run-through of the most important concepts. For a broader explanation, check out this article on c-sharp corner or this one on code-project (with nice diagrams).

What are the Heap and the Stack?

The stack

The Stack is more or less responsible for what's executing in our code.

There is a stack for every currently executing thread at runtime. Within each thread, each time a method is called, the method is added to the relevant stack for the thread. When the method is finished executing, it is removed from the stack. Therefore the stack is sequential, building upwards in one straight line, and at any given moment the CLR is only really interested in the items at the top of the stack.

Bear in mind that although it's easier to visualise it in terms of 'methods' being added to the stack, in reality the method itself is not added to the stack - the resources associated with the method are. This means arguments, local variables, and the return value.

The stack is self-cleaning - when the method at the top of the stack finishes executing, the items associated with it have their resources freed.

The heap

The Heap is more or less responsible for keeping track of our objects. Anything on the heap can be accessed at any time - it's like a heap of laundry, anything can go anyplace.

Items placed in the Heap can be referenced by pointers to the relevant memory location where they are stored. Unlike the Stack, the Heap has to worry about garbage collection.

What can go on the Heap and the Stack?
First we'll talk about the different resources that can be placed on the heap and the stack, and then we'll talk about which go where.

  1. Value Types
  2. Reference Types, and
  3. Pointers

For a nice tree diagram go here, or check out the lists:

1. Value Types
These are the types which derive from System.ValueType (enums do so via System.Enum).

  • bool
  • byte
  • char
  • decimal
  • double
  • enum
  • float
  • int
  • long
  • sbyte
  • short
  • struct
  • uint
  • ulong
  • ushort

2. Reference Types
System.Object, and anything which derives from it. Think in terms of:

  • class
  • interface
  • delegate
  • object
  • string

3. Pointers
A Pointer is a chunk of space in memory that points to another space in memory - its value is either a memory address or null. All Reference Types are accessed through pointers. We don't explicitly use Pointers (in C# they are called references, and are managed by the CLR), but they exist in memory as items in their own right.

Which Go Where?

  • Reference Types always go on the Heap
  • Value Types and Pointers go wherever they are declared

Look at the example below:

public class MyClass
{
    /* This variable is placed on the HEAP
       inline with the containing reference-type,
       i.e. the class, when it is instantiated */
    public int MyClassMember;

    /* These 3 variables are placed on the STACK
       when the method is called, and removed
       when execution completes */
    public int MyMethod(int myArg)
    {
        int myLocal = 0;   // a local must be assigned before it is read
        return myArg + myLocal;
    }
}

Of course, an instance of MyClass is a Reference Type and is placed on the Heap. The member variable MyClassMember is declared inline with a reference type, and therefore it is stored inline with that reference type on the Heap.

The local variables myArg, myLocal and the return variable are incidental to the object - they are not class members. They are not inline with a reference type and therefore they are stored on the Stack.

Reference Types and Pointers
When a Reference Type such as an object is instantiated, the actual contents are stored on the Heap. Under the hood, the CLR also creates a Pointer, the contents of which are a reference to the object's memory location on the heap.

In this way, reference types can be easily addressed, and can be addressed by reference from more than one variable. But where is that pointer stored?

It's the same rules as with Value Types - it depends where the object is declared:

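public class MyOtherClass { }

public class MyClass
{
    /* This pointer is stored on the HEAP. (It points to a different
       class: a MyClass field initialised with new MyClass() would
       recurse on every construction and overflow the stack.) */
    MyOtherClass myMember = new MyOtherClass();

    public void MyMethod()
    {
        /* This pointer is stored on the STACK */
        MyOtherClass myLocal = new MyOtherClass();
    }
}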

As noted above and discussed in the c-sharp-corner article, the same object can be referenced by more than one Pointer. It's important to understand that assigning one reference-type variable to another in .NET copies the pointer value - the memory address of the object. It does not copy the object itself, only the pointer value.

Take a look at this example:

public int ReturnValue()
{
    int x = new int();
    x = 3;
    int y = new int();
    y = x;      
    y = 4;          
    return x;
}

//This returns 3

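This is simple enough: int is a value type, so y = x copies the value 3 into y, and setting y to 4 afterwards leaves x untouched.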

But what happens when we wrap the value types inside a reference type? The key is what happens when you use the assignment operation on a reference type.

public class MyInt
{
    public int Val;
}

public int ReturnValue2()
{
    MyInt x = new MyInt();
    x.Val = 3;
    MyInt y = new MyInt();
    y = x;  /* y now points to the 'x' memory address */
    y.Val = 4;              
    return x.Val;
}

//This returns 4

As you can see, the assignment copies the Pointer value - the memory address of the assigned object - not the value of the object or its members. As a consequence, the new MyInt() that was created and initially assigned to y is now orphaned: nothing references it any more, so it becomes eligible for garbage collection.
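For contrast, here is a minimal sketch of the same assignment written once with a struct and once with a class. Because a struct is a value type, y = x copies the whole value (including Val), so changing y.Val leaves x.Val alone; with a class, both variables end up pointing at the same object. The type names MyIntStruct, MyIntClass and AssignmentDemo are made up for this illustration.

```csharp
// A value type: assignment copies the entire value.
public struct MyIntStruct
{
    public int Val;
}

// A reference type: assignment copies only the reference (pointer).
public class MyIntClass
{
    public int Val;
}

public static class AssignmentDemo
{
    public static int WithStruct()
    {
        MyIntStruct x = new MyIntStruct();
        x.Val = 3;
        MyIntStruct y = x;   // y is an independent copy of x
        y.Val = 4;           // only the copy changes
        return x.Val;        // still 3
    }

    public static int WithClass()
    {
        MyIntClass x = new MyIntClass();
        x.Val = 3;
        MyIntClass y = x;    // y and x now refer to the same object
        y.Val = 4;           // the change is visible through x
        return x.Val;        // 4
    }
}
```

The only difference between the two methods is the struct/class keyword on the type, which is exactly what decides whether assignment copies a value or a pointer.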

Dynamic vs. Static Memory
So the Stack and the Heap have different structures, behaviours and reasons for being. One is sequential and tied to the current method; the other has no fixed ordering, is messier (it requires garbage collection) and is accessed randomly.

But why not just use one memory store - why separate them at all? The answer is to separate static memory from dynamic memory.

  • The Stack is static - once a chunk of memory is allocated for a variable, its size cannot and will not change. Each unit is small and of fixed size.
  • The Heap is dynamic - reference types encapsulate value types and other reference types. Each unit is larger and of variable size.

These differences mean that the way space is allocated and consumed is very different. It's outside the scope of this article, but you can do more reading by looking up dynamic memory allocation and static memory allocation.

Friday, July 15, 2011

The 'Finally' in Try / Catch

When exactly is the finally block called in try/catch statements? If you know, you can correctly predict what gets printed here:

static void Main(string[] args)
{
    AssignToInt(null);
    AssignToInt(new object());
    AssignToInt(1);
}

public static bool AssignToInt(object o)
{
    try
    {
        int i = (int)o;
        Console.WriteLine("{0} assigned OK", i);
        return true;
    }
    catch (NullReferenceException)
    {
        Console.WriteLine("NullReferenceException");
        return false;
    }
    catch (InvalidCastException)
    {
        Console.WriteLine("InvalidCastException");
        return false;
    }
    finally
    {
        Console.WriteLine("Finally...");
    }
}

The finally block is always called after execution leaves a try/catch block, regardless of whether an exception was caught or not. Even when a return statement is reached inside a try or catch block, the return value is evaluated first, and the finally block then runs before control actually returns to the caller.

The best example of why you would use it is to clean up and close any resources you have left open. Let's say you open a FileStream in your try block: it will need to be closed whether or not an exception occurs, so the FileStream.Close() call belongs in the finally block.

Therefore the output you get from running the code is:

NullReferenceException
Finally...
InvalidCastException
Finally...
1 assigned OK
Finally...
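Two of the details above are easier to see in code. First, a return inside try evaluates its value before the finally block runs, so the finally block cannot change a value that has already been returned. Second, the cleanup pattern: C#'s using statement compiles down to a try/finally that calls Dispose(), which for a FileStream also closes it. The sketch below uses a hypothetical TrackedResource class in place of a real FileStream, purely so that the cleanup can be observed.

```csharp
using System;

// Hypothetical stand-in for a FileStream: records whether it was closed.
public class TrackedResource : IDisposable
{
    public bool IsClosed { get; private set; }

    public void Close()
    {
        IsClosed = true;
    }

    // Dispose simply delegates to Close.
    public void Dispose()
    {
        Close();
    }
}

public static class FinallyDemo
{
    // The return value (1) is captured before the finally block runs,
    // so changing x afterwards does not affect what the caller receives.
    public static int ReturnBeforeFinally()
    {
        int x = 1;
        try
        {
            return x;
        }
        finally
        {
            x = 2;
        }
    }

    // Explicit try/finally: Close() runs even though an exception escapes.
    public static void FailWithCleanup(TrackedResource resource)
    {
        try
        {
            throw new InvalidOperationException("Something went wrong");
        }
        finally
        {
            resource.Close();
        }
    }

    // The using statement generates an equivalent try/finally for us.
    public static void FailWithUsing(TrackedResource resource)
    {
        using (resource)
        {
            throw new InvalidOperationException("Something went wrong");
        }
    }
}
```

In both failing methods the exception still propagates to the caller; the finally block does not swallow it, it just guarantees the cleanup happens on the way out.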


Monday, July 11, 2011

Injecting Controller Dependencies in .NET MVC with StructureMap

When you are creating an MVC application, your Controllers will by default be created without any dependencies. As soon as you try to inject a dependency into your Controller's constructor...

public EbookController(IEbookRepository ebookRepository)
{
    this.ebookRepository = ebookRepository;
}

You will get an error message something like this:

No parameterless constructor defined for this object.

This is because the MVC framework internally uses a factory to create Controller objects from your classes. By default that factory only knows how to call a public parameterless constructor; it has no way of supplying an IEbookRepository, so instantiation fails.
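You can reproduce the root cause without MVC at all. The sketch below assumes, as a simplification, that the default factory creates controllers by reflection in the style of Activator.CreateInstance(Type), which needs a public parameterless constructor and throws MissingMethodException otherwise. EbookController and IEbookRepository here are cut-down stand-ins with no MVC references, and ReflectionFactoryDemo is a hypothetical helper.

```csharp
using System;

// Cut-down stand-ins for the types in the example above (no MVC references).
public interface IEbookRepository { }

public class EbookController
{
    public IEbookRepository EbookRepository { get; }

    // The only constructor requires a dependency.
    public EbookController(IEbookRepository ebookRepository)
    {
        EbookRepository = ebookRepository;
    }
}

public static class ReflectionFactoryDemo
{
    // Mimics a factory that only knows how to call a parameterless constructor.
    // Returns true if creation failed because no such constructor exists.
    public static bool FailsWithoutParameterlessConstructor(Type type)
    {
        try
        {
            Activator.CreateInstance(type);
            return false;
        }
        catch (MissingMethodException)
        {
            return true;
        }
    }
}
```

A DI-aware factory fixes this by asking the container for the instance instead, and the container knows how to build an IEbookRepository to pass in.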

You can override the default factory with one that uses your preferred Dependency Injection solution (in this example I have used StructureMap), and therefore will take over the controller instantiation and resolve any dependencies on your behalf. You need to do three things:

  1. Bootstrap your dependencies
  2. Override the default Controller factory
  3. Tell your application to use your Bootstrapper and overridden factory

I'm going to demonstrate using the example I've already started above: an Ebook controller depending on an EbookRepository.

1. Bootstrap your dependencies

using StructureMap;

public static class Bootstrapper
{
    public static void Bootstrap()
    {
        ObjectFactory.Initialize(
            x => x.For<IEbookRepository>()
                .Use<EbookRepository>());
    }
}

2. Create a new class to override the default Controller factory with one that uses StructureMap to instantiate Controllers:

using System;
using System.Web.Mvc;
using System.Web.Routing;
using StructureMap;

public class StructureMapControllerFactory
    : DefaultControllerFactory
{
    protected override IController GetControllerInstance(
        RequestContext requestContext, Type controllerType)
    {
        try
        {
            if (controllerType == null)
                return base.GetControllerInstance(
                    requestContext, controllerType);

            return ObjectFactory.GetInstance(controllerType)
                as Controller;
        }
        catch (StructureMapException)
        {
            System.Diagnostics.Debug.WriteLine(
                ObjectFactory.WhatDoIHave());
            throw;
        }
    }
}

3. Tell your application to use the Bootstrapper and overridden factory, by editing Global.asax.cs:

protected void Application_Start()
{
    //Any other commands
    RegisterRoutes(RouteTable.Routes);

    Bootstrapper.Bootstrap();
    ControllerBuilder.Current.SetControllerFactory(
        new StructureMapControllerFactory());
}

That's it!

The source of the technique used in this article can be found here on Shiju Varghese's blog.

Wednesday, July 6, 2011

Introduction to IoC Containers and Dependency Injection using StructureMap

In a previous article, I discussed the nature of the IoC principle as a separate entity from Dependency Injection. In this article I want to explain IoC Containers, and show how these can be used alongside Dependency Injection to create clean, testable, scalable code, via decoupled architectures.

But before I explain what IoC Containers and DI are, let's talk about why you would want them.

Decoupled Architectures
In my IoC post, we looked at an example in which we handed over control to a GUI framework. I'm going to continue with that example here, and have included a cut-down version of the class in the codeblock below.

(I've renamed the class GuiFramework to GraphicUserInterface, because the original naming was only intended to highlight the differences between libraries and frameworks.)

public class BusinessLogic {

    private GraphicUserInterface gui = 
                    new GraphicUserInterface();

    public void Run(){
        gui.NamePrompt("Please enter your name");
        gui.AddNameChangedHandler(OnNameChanged);
    }

    // OnNameChanged handler omitted for brevity
}

public class Program {
    public static void Main() {
        BusinessLogic busLogic = new BusinessLogic();
        busLogic.Run();
    }
}

One of the main benefits of the IoC approach is that we have a separation of concerns. The GraphicUserInterface class handles all of the stuff to do with GUIs. The BusinessLogic class handles all of the stuff to do with business logic.

The business logic class manipulates the GUI class in order to implement business logic functionality, but it doesn't know anything about how, when or where the GUI does its job. It simply uses the GUI's services and that's it.

So our architecture is almost decoupled. The only thing that still couples our architecture is that the business logic class directly instantiates an instance of the GraphicUserInterface class. With this setup we say that the BusinessLogic class depends on the GraphicUserInterface class, or that the GraphicUserInterface class is a dependency of the BusinessLogic class.

If we can properly decouple these components, then they become like 'plugins':

  • entirely independent of each other
  • entirely interchangeable with other components that provide the same or similar services
  • but still entirely inter-operable with each other - any plugged in component will work without adapting or making config changes to other components in the system - they will just 'plug and play'

Why That's Good
If you've worked with enterprise-level applications, or any application with more than a trivial number of components, it will be clear why this is positive. Here are a few important reasons:

  • Requirements changes - It is practically inevitable during any project that requirements will change. Decoupled architectures allow you to replace components in an existing system without having to adapt and recompile other components to match.
  • Unit testing - A decoupled component can be tested in isolation from its dependencies, by mocking the dependencies. In this way unit tests can be run continuously throughout the project, and it becomes easy to have a development process in which your software is always in a testable state. This is a huge advantage because it provides certain guarantees and confidence throughout. It also paves the way for TDD, which I'll discuss in another post.
  • Separation of concerns - it enforces an architecture where responsibilities are meted out to components neatly. This is nicer to work with, but it also has the practical benefit that if you bring a new developer onto the project, they can start work very quickly, and with a minimum of knowledge about the system. For example, if they are working on a data-access abstraction they need not ever have seen the UI - the responsibilities that this developer has will end at the boundary of the data-access abstraction, or whatever other component they are working on.
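To make the unit-testing point concrete, here is a minimal sketch of testing a decoupled class against a hand-written fake instead of a real GUI. It looks ahead to the decoupled shape built in the steps below and assumes a pared-down IUserInterface with a single NamePrompt(string) method; that signature, and the FakeUserInterface class, are illustrative assumptions, since the full GUI API isn't shown.

```csharp
using System.Collections.Generic;

// Illustrative, pared-down abstraction (the real interface has more members).
public interface IUserInterface
{
    void NamePrompt(string message);
}

// Depends only on the abstraction, which is passed in via the constructor.
public class BusinessLogic
{
    private readonly IUserInterface userInterface;

    public BusinessLogic(IUserInterface userInterface)
    {
        this.userInterface = userInterface;
    }

    public void Run()
    {
        userInterface.NamePrompt("Please enter your name");
    }
}

// A hand-written fake: records prompts instead of drawing anything on screen.
public class FakeUserInterface : IUserInterface
{
    public List<string> Prompts { get; } = new List<string>();

    public void NamePrompt(string message)
    {
        Prompts.Add(message);
    }
}
```

A test only needs new BusinessLogic(fake).Run() followed by an assertion on fake.Prompts; no window is ever opened. Mocking libraries automate writing fakes like this one.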

So let's see what that really means, by decoupling the example code, step by step.

Step 1: Use an Interface
Instead of referring to the implementation class directly by name, we'll use an interface. I've commented out the old declaration and added a new one.

public class BusinessLogic {

    //private GraphicUserInterface gui = new
    //                GraphicUserInterface();
    private IUserInterface userInterface;

    public void Run(){
        userInterface.NamePrompt("Please enter your name");
        userInterface.AddNameChangedHandler(OnNameChanged);
    }
}

public class Program {
    public static void Main() {
        BusinessLogic busLogic = new BusinessLogic();
        busLogic.Run();
    }
}

First, notice the rename. The interface is called IUserInterface - not IGraphicUserInterface. This is important, because our business logic class doesn't need to know what kind of implementation it is dealing with. All it needs is the services of some kind of user interface, in order to prompt and accept a user's name. Exactly what kind of UI is unimportant - it could be a GUI or a web form or a speech recognition component - the BusinessLogic class doesn't care.

All the BusinessLogic class wants to know is that the class is capable of providing the services it requires. Implementation classes can indicate this capability by implementing the relevant interface - in this case IUserInterface.

Second, notice that we are not instantiating our new private userInterface member, only declaring it. Of course, if our application is going to work, the member will need to be instantiated - but how?

Step 2: Use Dependency Injection
We 'inject' a pre-instantiated object into our class:

public class BusinessLogic {

    private IUserInterface userInterface;

    public BusinessLogic(IUserInterface aUserInterface){
        this.userInterface = aUserInterface;
    }

    public void Run(){
        userInterface.NamePrompt("Please enter your name");
        userInterface.AddNameChangedHandler(OnNameChanged);
    }
}

public class Program {
    public static void Main() {
        // For now, the dependency is created here and passed in
        BusinessLogic busLogic = 
            new BusinessLogic(new GraphicUserInterface());
        busLogic.Run();
    }
}

The approach used above is called Constructor Injection. Alternatively you could use Setter Injection, which is where you inject a dependency via a method (a setter method). In either case, the dependency is instantiated outside of the dependent class and injected in.
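For comparison, here is a minimal sketch of Setter Injection. In C# the "setter" is often a property rather than a separate method; the UserInterface property name, the trimmed-down IUserInterface and the ConsoleUserInterface class are assumptions for illustration.

```csharp
using System;

// Trimmed-down abstraction for this example.
public interface IUserInterface
{
    void NamePrompt(string message);
}

public class BusinessLogic
{
    // The dependency is supplied after construction, via a property setter.
    public IUserInterface UserInterface { get; set; }

    public void Run()
    {
        // Nothing forces callers to set the dependency, so check for it.
        if (UserInterface == null)
            throw new InvalidOperationException(
                "UserInterface must be set before calling Run().");

        UserInterface.NamePrompt("Please enter your name");
    }
}

// A console-based implementation, standing in for GraphicUserInterface.
public class ConsoleUserInterface : IUserInterface
{
    public string LastPrompt { get; private set; }

    public void NamePrompt(string message)
    {
        LastPrompt = message;
        Console.WriteLine(message);
    }
}
```

The null check hints at the main drawback of Setter Injection: the object can exist in a half-configured state, whereas Constructor Injection makes the dependency mandatory from the start.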

We'll use Constructor Injection in this example, and you'll see why a little later on.

Step 3: Set up an IoC Container
So where is the object instantiated? We need an IoC container.

These 'contain' the mappings required to wire up our dependencies. You'll hear them called DI containers too, but it means the same thing (there's a brief discussion on terminology at the bottom of this article).

I'm going to use StructureMap as an example because it's easy to use and has an intuitive syntax. First, we create a Bootstrapper. This becomes the single location within our program where concrete types are mapped to interfaces:

using StructureMap;

public static class Bootstrapper
{
    public static void Bootstrap()
    {
        ObjectFactory.Initialize(x =>
        {
            x.For<IUserInterface>().Use<GraphicUserInterface>();
            x.For<IBusinessLogic>().Use<BusinessLogic>();
        });
    }
}

StructureMap is one of a number of IoC Containers for .NET. Regardless of which container you use, you would expect to see all mappings declared inside a Bootstrapper, and the Bootstrapper to be created during initialisation (Global.asax for web applications, or the main routine for a desktop application).

StructureMap's Object Factory
The ObjectFactory mentioned in the codeblock above is a static wrapper for the StructureMap container. Whenever you use the ObjectFactory static class anywhere in your code, you are always referencing the same singleton container (it is possible to use more than one container object but this is not standard).

Once you have bootstrapped your mappings, you can use the static ObjectFactory reference anywhere in your code to instantiate an object by describing the service you require (passing in an interface name):

public class Program {
    public static void Main()
    {
        Bootstrapper.Bootstrap();
        var userInterface = 
            ObjectFactory.GetInstance<IUserInterface>();
    }
}

In the above example, the implicitly-typed var userInterface will now hold an instance of the GraphicUserInterface class.

Auto-Wiring
You might think at this stage, great - now we can just create a BusinessLogic object and send the userInterface object as an argument. But it's simpler than that, and this is one good reason to use Constructor Injection for your dependencies.

If you decouple all of your components, including (in this case) the BusinessLogic class, then StructureMap will figure out your dependencies for you using Auto-Wiring:

public class Program {

    public static void Main()
    {
        Bootstrapper.Bootstrap();

        //var userInterface = 
        //  ObjectFactory.GetInstance<IUserInterface>();
        var businessLogic = 
            ObjectFactory.GetInstance<IBusinessLogic>();

        businessLogic.Run();
    }
}

Notice that I have commented out the top line. Auto-wiring basically means that unless you tell the bootstrapper differently, any time you instantiate an object with dependencies StructureMap will create an instance of the dependency and inject it for you.
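To demystify auto-wiring, here is a deliberately tiny container sketch. It is not how StructureMap is implemented; it only illustrates the idea: look up the concrete type registered for an interface, pick a constructor, resolve each constructor parameter recursively, then invoke the constructor. MiniContainer, Register and Resolve are hypothetical names, and the sketch has no lifecycle management, cycle detection or error handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical minimal container, for illustration only.
public class MiniContainer
{
    private readonly Dictionary<Type, Type> mappings = new Dictionary<Type, Type>();

    // Map a service type (usually an interface) to a concrete implementation.
    public void Register<TService, TImplementation>()
        where TImplementation : TService
    {
        mappings[typeof(TService)] = typeof(TImplementation);
    }

    public T Resolve<T>()
    {
        return (T)Resolve(typeof(T));
    }

    private object Resolve(Type serviceType)
    {
        // Fall back to the requested type itself if no mapping exists.
        Type concrete;
        if (!mappings.TryGetValue(serviceType, out concrete))
            concrete = serviceType;

        // Pick the public constructor with the most parameters.
        ConstructorInfo ctor = concrete.GetConstructors()
            .OrderByDescending(c => c.GetParameters().Length)
            .First();

        // Auto-wiring: resolve every constructor argument recursively.
        object[] args = ctor.GetParameters()
            .Select(p => Resolve(p.ParameterType))
            .ToArray();

        return ctor.Invoke(args);
    }
}

// Example types mirroring the article's BusinessLogic/IUserInterface pair.
public interface IUserInterface { string Name { get; } }
public class GraphicUserInterface : IUserInterface { public string Name => "GUI"; }

public interface IBusinessLogic { IUserInterface UserInterface { get; } }
public class BusinessLogic : IBusinessLogic
{
    public IUserInterface UserInterface { get; }

    public BusinessLogic(IUserInterface userInterface)
    {
        UserInterface = userInterface;
    }
}
```

After registering IUserInterface and IBusinessLogic, a single Resolve<IBusinessLogic>() call builds the GraphicUserInterface and hands it to the BusinessLogic constructor, which is the same effect the StructureMap example achieves with ObjectFactory.GetInstance.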

More With StructureMap
There's a lot more you can do with StructureMap, or any IoC Container. For example, you can wire up a singleton so that all dependencies resolve to a single instance:

x => x.For<IFoo>().Singleton().Use<Foo>()

You can use 'open generics', so that you can decouple your genericised classes. For example you might have a class Repository which could handle Book objects or Magazine objects, i.e. Repository<Book> or Repository<Magazine>. Using this syntax, you can declare mappings that are generics-agnostic:

x => x.For(typeof (IRepository<>)).Use(typeof (Repository<>))
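An open generic mapping rests on ordinary .NET reflection: typeof(Repository<>) is an unbound generic type definition, and Type.MakeGenericType closes it over a concrete type argument on request. The sketch below shows that step in isolation; IRepository<T>, Repository<T>, Book, Magazine and the OpenGenericDemo.CreateFor helper are hypothetical, and this is not StructureMap's own code.

```csharp
using System;

public class Book { }
public class Magazine { }

public interface IRepository<T>
{
    Type ItemType { get; }
}

public class Repository<T> : IRepository<T>
{
    public Type ItemType => typeof(T);
}

public static class OpenGenericDemo
{
    // Given an open implementation type such as typeof(Repository<>),
    // close it over the requested item type and create an instance.
    public static IRepository<TItem> CreateFor<TItem>(Type openImplementation)
    {
        Type closed = openImplementation.MakeGenericType(typeof(TItem));
        return (IRepository<TItem>)Activator.CreateInstance(closed);
    }
}
```

One open mapping therefore serves Repository<Book>, Repository<Magazine> and any other item type, without a separate registration for each.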

You can specify an instance to use for particular constructor arguments. For example, if your class Foo is constructed with a string called argName:

x.For<IFoo>().Use<Foo>().Ctor<string>("argName").Is("arg");
//or
x.For<IFoo>().Use<Foo>().Ctor<string>("argName")
                           .EqualToAppSetting("app-setting-id");

This is, of course, only a whirlwind tour of what StructureMap can do.

When To Use Which Approach
There will be times when Constructor Injection is not appropriate, and you need fine-grained control of object instantiation during program execution. In these circumstances you can use the ObjectFactory object as we did in the examples earlier:

var newFoo = ObjectFactory.GetInstance<IFoo>();

This gives you the same type of control over when things are instantiated as you get with ordinary coupled code. But it should be clear that the easiest approach whenever possible is to use Constructor Injection and let Auto-Wiring do the work for you. When this isn't possible move to Setter Injection, and when that isn't possible, use ObjectFactory to create an instance.

Decoupling Your Container
Of course, ObjectFactory is a class defined within the StructureMap namespace. If you pepper it around your code (e.g. by calling GetInstance wherever you need an object), then you are coupling your code to StructureMap. Some people prefer to ensure full decoupling from IoC Containers by housing Container references within custom object factories, and accessing those factories via an interface.
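A minimal sketch of that idea: application code depends on a small factory interface, and only one adapter class knows how instances are really produced. Here the adapter wraps a plain Func<Type, object> so the example runs without StructureMap; in a real application that delegate might be something like t => ObjectFactory.GetInstance(t). IServiceFactory, DelegateServiceFactory and Worker are hypothetical names.

```csharp
using System;

// The only abstraction application code sees; no container types leak out.
public interface IServiceFactory
{
    T Create<T>();
}

// Adapter: the single place that knows how instances are really produced.
public class DelegateServiceFactory : IServiceFactory
{
    private readonly Func<Type, object> resolve;

    public DelegateServiceFactory(Func<Type, object> resolve)
    {
        this.resolve = resolve;
    }

    public T Create<T>()
    {
        return (T)resolve(typeof(T));
    }
}

public interface IFoo { string Describe(); }
public class Foo : IFoo { public string Describe() => "Foo"; }

// Application code: depends on IServiceFactory, not on StructureMap.
public class Worker
{
    private readonly IServiceFactory factory;

    public Worker(IServiceFactory factory)
    {
        this.factory = factory;
    }

    public string DoWork()
    {
        // Fine-grained, on-demand creation without a static container reference.
        IFoo newFoo = factory.Create<IFoo>();
        return newFoo.Describe();
    }
}
```

Swapping containers then means rewriting the one delegate passed to DelegateServiceFactory, while Worker and the rest of the application stay untouched.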

In another post I will discuss Autofac, an IoC/DI container hosted as a Google Code project. Autofac, unlike StructureMap, ensures the container is fully decoupled by design. Instead of using a generic static object factory, Autofac allows you to create and inject custom object factories for use in your code.

The IoC/DI Name
You can see Martin Fowler here discussing the evolution of the name. Three different techniques / approaches have merged together to form a pattern, and so we have this composite name - IoC / DI / Container.

You can see what I mean if you consider that it's perfectly possible to use Dependency Injection without inverting control, or decoupling via interfaces:

public class BusinessLogic {

    private SomeLibrary someLibrary;

    public BusinessLogic(SomeLibrary someLib){
        this.someLibrary = someLib;
    }
}

It's also possible to invert control without using DI, or to use containers without DI or IoC. However, these approaches have grown to complement each other and have become a mainstream way of creating clean, testable, scalable application architectures.