Friday, July 22, 2011

Lambda Expressions in C#

In a previous article I introduced delegates and anonymous delegates. In this article I intend to explain the fundamentals of Lambda expressions by building on our knowledge of delegates. It's useful to get a proper handle on delegates before delving into Lambdas, because Lambdas are just anonymous delegates in different clothing.

A Quick Example
At the end of my post on delegates, I introduced a common scenario where you might want to use an anonymous delegate to find an item from a list:

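using System;
using System.Collections.Generic;

// Minimal Book type so the example compiles on its own.
class Book
{
    public string title;
}

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book(){ title = "Romancing the Stone" });
        bookDb.Add(new Book(){ title = "Wuthering Heights" });

        // Find takes a Predicate<Book>; here it is an anonymous delegate.
        Book foundBook = bookDb.Find(delegate(Book candidate)
        {
            return candidate.title.Equals(
                            "Wuthering Heights");
        });

        Console.WriteLine(foundBook.title);
    }
}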

However, the same code can easily be refactored to use a Lambda expression (if you use ReSharper, just hit Alt+Enter and it will refactor for you):

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book(){ title = "Romancing the Stone" });
        bookDb.Add(new Book(){ title = "Wuthering Heights" });
        
        Book foundBook = bookDb.Find(
            candidate => candidate.title.Equals(
                                "Wuthering Heights"));

        Console.WriteLine(foundBook.title);
    }
}

The argument passed to Find (candidate => candidate.title.Equals("Wuthering Heights")) is the Lambda expression, and it does the same job as the delegate in the first code-block. However, it does so in a slightly different way: it is more concise, and its declaration is less explicit.

Let's start from the top - the Lambda operator.

The Lambda Operator
You can tell you are looking at a lambda expression when you see the lambda operator (=>). This operator can be read as "goes to":

x => x * x

So this expression (above) is read as "x goes to x times x".

How do we Get from Delegate to Lambda?
Let's explain the above expression by showing how it would look in familiar, delegate form. Then we can alter it step by step until it becomes a lambda expression again. Here's the fully written-out delegate equivalent of the above expression. I have written this in a way which should be familiar from my previous post on delegates:

public delegate int Square(int i);

void Main()
{
    Square square = delegate(int i)
    {
        return i*i;
    };
}

In the code-block above, the first thing to notice is that we have a named delegate type - its name is Square. This is the first line in the code-block, the declaration that essentially says (in English) "There is a delegate type named Square, it has a single int input parameter, and an int return type."

Inside the body of the Main method, we have then created an instance of Square (named square, in lower-case), and assigned an anonymous delegate to it.

Step 1: Anonymise the Delegate Type
Well now, we are going to anonymise the delegate type as well as the delegate. Anonymising means declaring the same behaviour, but without specifying a name. So in English: "There is a delegate type, it has a single int input parameter, and an int return type".

Since we are no longer giving it a name, we must declare it at the same time as assigning a value to it, and therefore now the whole thing is enclosed within one statement. Remember, we are still naming the variable (lower-case square), but we are no longer naming the type (was init-capped Square):

public delegate int Square(int i);

void Main()
{
    Square square = delegate(int i)
    {
        return i*i;
    };
}

//becomes

void Main()
{
    Func<int, int> square = delegate(int i)
    {
        return i*i;
    };
}

To do this we used a type named Func<T, TResult> (see the definition here), which was introduced in .NET Framework 3.5 (the release that shipped alongside C# 3.0). Strictly speaking it is a ready-made generic delegate type rather than a truly nameless one, but it saves us from declaring a delegate type of our own. It allows us to encapsulate a method as a reference-type, and pass in a couple of generic type parameters describing the argument and return type respectively: <int, int>.

You'll notice that there are several variations on Func, each with a different number of generic type parameters. As of .NET 4 you can encapsulate a method with up to 16 arguments, and there is a similar family of types, named Action, to deal with methods with no return type. Any method you assign to an instance of one of these types has to have matching input parameters and return type, and you can see in the code-block above this is the case.
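To make the family a little more concrete, here is a minimal sketch of three of its members in use. The class and variable names are my own, purely for illustration, and I have stuck with anonymous delegates because we haven't switched to lambda syntax yet:

```csharp
using System;

class FuncFamilyDemo
{
    public static void Main()
    {
        // Func<TResult>: no inputs, returns an int.
        Func<int> answer = delegate() { return 42; };

        // Func<T1, T2, TResult>: two int inputs, returns an int.
        Func<int, int, int> add = delegate(int a, int b) { return a + b; };

        // Action<T>: one string input, no return value.
        Action<string> print = delegate(string s) { Console.WriteLine(s); };

        print("answer = " + answer());   // prints "answer = 42"
        print("2 + 3 = " + add(2, 3));   // prints "2 + 3 = 5"
    }
}
```

In each case the last generic type parameter of a Func is the return type, and everything before it describes the inputs.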

Step 2: Switch to Lambda Syntax
Now let's convert the delegate on the right side of that assignment operation into a lambda expression:

Func<int, int> square = delegate(int i)
{
    return i*i;
};

//becomes

Func<int, int> square = i => i * i;

And there we have the same statement we started with. It uses the same variables to do the same thing and return the same value - but with different syntax.
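If you want to convince yourself that the two forms really are interchangeable, a small sketch like the following should do it (the class and variable names are just placeholders of mine):

```csharp
using System;

class SameResultDemo
{
    public static void Main()
    {
        // The anonymous-delegate form from Step 1.
        Func<int, int> viaDelegate = delegate(int i) { return i * i; };

        // The lambda form from Step 2.
        Func<int, int> viaLambda = i => i * i;

        Console.WriteLine(viaDelegate(5)); // prints 25
        Console.WriteLine(viaLambda(5));   // prints 25
    }
}
```

Both variables have exactly the same type, so either can be passed anywhere a Func<int, int> is expected.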

The first thing you notice is how concise it is, even though it does exactly the same job. On the left side of the operator, we always have the input parameters.

  • left-of-the-operator - input parameters,
  • right-of-the-operator - expression or statement block

Remember, the operator is read as "goes to". So a lambda is always read as "input (goes to) expression", or "input (goes to) statement block".

Inferred Typing
With the delegate version we had to specify the type of the input parameter, when we said: delegate(int i). However, with lambdas the compiler infers the type of i from its usage. The compiler sees that we are assigning the expression to a strongly-typed Func object with generic type parameters <int, int>. So it knows, and doesn't need to be retold, that the input parameter i in the expression is an int.

Don't make the mistake of thinking this is loosely-typed. Leaving aside the dynamic keyword added in C# 4.0, loose-typing is not something C# offers. This is inferred typing, which is still very much strongly-typed. The type is inferred by the compiler at compile time and cannot change. It is part of the suite of implicit-typing functionality introduced in C# 3.0.
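Here is a rough sketch of that inference at work. The names are mine, and the commented-out line is there only to show what the compiler would reject:

```csharp
using System;

class InferenceDemo
{
    public static void Main()
    {
        // i is inferred as int from Func<int, int>.
        Func<int, int> square = i => i * i;

        // s is inferred as string from Func<string, int>,
        // so s.Length compiles without any cast.
        Func<string, int> length = s => s.Length;

        Console.WriteLine(square(4));        // prints 16
        Console.WriteLine(length("lambda")); // prints 6

        // The next line would not compile: i is an int, and int has no Length.
        // Func<int, int> broken = i => i.Length;
    }
}
```

The same lambda text can end up with different parameter types depending on what it is assigned to, but once the target type is known the parameter type is fixed.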

In fact, one of the main benefits of lambdas is that they are a simple way to make many previously un-type-safe operations type-safe. Take a quick look at Mauricio Scheffer's blog here, where he has used lambdas to replace string literals. You see this type of usage a lot in .NET frameworks and packages nowadays.
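To give a flavour of the idea, here is a minimal sketch of one common way to swap a string literal for a lambda. It is not necessarily the approach taken in the post linked above, and the PropertyName helper is a hypothetical one of my own. It relies on expression trees, which come up later in this article:

```csharp
using System;
using System.Linq.Expressions;

class Book
{
    public string title;
}

static class PropertyName
{
    // Hypothetical helper: returns the name of the member a lambda points at.
    public static string Of<T, TMember>(Expression<Func<T, TMember>> selector)
    {
        MemberExpression member = selector.Body as MemberExpression;
        if (member == null)
        {
            throw new ArgumentException(
                "Expected a simple member access such as b => b.title");
        }
        return member.Member.Name;
    }
}

class TypeSafeNameDemo
{
    public static void Main()
    {
        // Same result as the literal "title", but checked by the compiler:
        // rename the field and this line stops compiling.
        string name = PropertyName.Of<Book, string>(b => b.title);
        Console.WriteLine(name); // prints "title"
    }
}
```

The point is that a typo in a string literal only shows up at runtime, whereas a typo in the lambda is a compile error.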

In the first example in the first code-block of this post, without Lambdas or delegates the Find method would only be possible by passing in string literals representing the desired candidate properties. And you can also see, for example, from my post on StructureMap IoC, that statement lambdas are now the preferred initialisation approach (though the previous approach was also typesafe).

Specifying Input Parameters
If you want to state the parameter types explicitly rather than rely on the compiler's inference, you can use brackets on the left side of the lambda operator to specify input parameters:

Func<int, int> square = (int i) => i * i;

Or, if you have no parameters to pass in, just use empty brackets:

Func<int> noParams = () => 5 * 5;

I have to admit I'm not sure yet where or when this would be useful.
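One place where explicit parameter types do seem to earn their keep is overload resolution, when the compiler has more than one candidate and cannot pick between them. The sketch below assumes two made-up Run overloads of my own; it also shows that brackets are needed as soon as there is more than one parameter, typed or not:

```csharp
using System;

class ExplicitParameterDemo
{
    // Two hypothetical overloads that differ only in the delegate's types.
    static string Run(Func<int, int> f)
    {
        return "int overload: " + f(3);
    }

    static string Run(Func<string, string> f)
    {
        return "string overload: " + f("abc");
    }

    public static void Main()
    {
        // Run(x => x); // does not compile: the call is ambiguous

        Console.WriteLine(Run((int x) => x));    // prints "int overload: 3"
        Console.WriteLine(Run((string x) => x)); // prints "string overload: abc"

        // Two parameters: the brackets are required, the types are still inferred.
        Func<int, int, int> add = (a, b) => a + b;
        Console.WriteLine(add(2, 3));            // prints 5
    }
}
```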

Expression Lambdas vs. Statement Lambdas
So far we've only been using Expression Lambdas, which means that on the right-side of the lambda operator there is one, single expression. This is generally useful when you are trying to apply a function, such as Square above, or trying to apply a predicate, such as the List.Find example at the top of this article.

However you can construct multi-line, procedural lambdas if you wish, by using curly braces:

Action<string> speakToWorld = n =>
{
    string s = n + " " + "World";
    Console.WriteLine(s);
};
speakToWorld("Hello");

Action is of course the cousin of Func, but never returns a value (see the reference page here).
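Statement lambdas work with Func too; the only difference is that the body then has to return a value explicitly, just as a normal method would. A small sketch, with names of my own choosing:

```csharp
using System;

class StatementLambdaDemo
{
    public static void Main()
    {
        // A statement lambda assigned to a Func needs explicit return statements.
        Func<int, string> describe = n =>
        {
            if (n % 2 == 0)
            {
                return n + " is even";
            }
            return n + " is odd";
        };

        Console.WriteLine(describe(4)); // prints "4 is even"
        Console.WriteLine(describe(7)); // prints "7 is odd"
    }
}
```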

Expression Trees
One final thing to note here is the broader context within which both lambdas and delegates exist. We have moved into an arena where functionality now has the same status as value. What I mean is that we are used to assigning and manipulating values (and objects encapsulating values) within variables. Now it is just as easy to build and manipulate functionality. Expression trees are an example of that.

Earlier we used Func and Action to store our method pointers. These allow truly anonymous references to executable code. But these classes can be assembled and re-assembled in their own right, as components of an Expression object.

Have a quick look at the MSDN on Expression Trees:

"Expression trees represent code in a tree-like data structure, where each node is an expression, for example, a method call or a binary operation such as x < y."
"You can compile and run code represented by expression trees. This enables dynamic modification of executable code, the execution of LINQ queries in various databases, and the creation of dynamic queries."

A quick example from The Code Project:

ParameterExpression parameter1 = 
    Expression.Parameter(typeof(int), "x");

BinaryExpression multiply = 
    Expression.Multiply(parameter1, parameter1);

Expression<Func<int, int>> square = 
    Expression.Lambda<Func<int, int>>(multiply, parameter1);

Func<int, int> lambda = square.Compile();
Console.WriteLine(lambda(5));

In the code-block above, we have done the same thing our simple lambda expression from earlier does - square a number. It is much more verbose, yes, but look at the types on the last two lines before Console.WriteLine. We created an Expression object, which can be assembled ad hoc however you like at runtime (expression trees are immutable, so "altering" one really means building a modified copy) - and then Compile()'d into executable code, and run - again, all at runtime.

In other words, you can have your code assemble more code as it runs. Or in other words again, you can have your code treat itself in the same, flexible way it used to only be able to treat data and values.
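You don't have to build every tree by hand, either. As far as I can tell, assigning a lambda to an Expression<Func<...>> variable makes the compiler build the tree for you, and you can then take it apart or reuse its pieces. A minimal sketch, with class and variable names of my own:

```csharp
using System;
using System.Linq.Expressions;

class ExpressionFromLambdaDemo
{
    public static void Main()
    {
        // The compiler turns this lambda into a tree, not into executable code.
        Expression<Func<int, int>> square = x => x * x;

        Console.WriteLine(square.Body.NodeType);      // prints Multiply
        Console.WriteLine(square.Parameters[0].Name); // prints x

        // Reuse the tree as a building block for (x * x) + 1.
        // This builds a new tree; the original is left unchanged.
        Expression<Func<int, int>> squarePlusOne =
            Expression.Lambda<Func<int, int>>(
                Expression.Add(square.Body, Expression.Constant(1)),
                square.Parameters);

        Func<int, int> compiled = squarePlusOne.Compile();
        Console.WriteLine(compiled(5)); // prints 26
    }
}
```

The Body and Parameters of the original tree are ordinary objects, which is what makes this kind of runtime assembly possible.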

Aside from generating dynamic queries, what does this allow us to do? As this stack-overflow page points out, we can hand the power of code customisation over to users. Ever made a 'form builder', allowing users to customise post-driven dynamic webpages? I can't imagine how we did that in the past, but such things have just got a whole lot easier, and semantically correct. Or what about user-customised workflows? Ditto.
