Aug 4, 2009

IEnumerable + Linq - Be cautious about when the enumerable object is evaluated

In the last two posts, I emphasised that an enumerable object is evaluated each time you access its elements. Because of this, you have to be cautious in the following two situations.

  1. The where clause condition in the Linq statement is modified
  2. Your collection is modified while the Linq statement is being evaluated

Where Clause has changed example

// Keep track of the highest mileage recorded so far
private int _highestMileageRecorded = 10000;

private void GetStatistics(IEnumerable<Vehicle> dataSource)
{
    //Find the vehicles whose mileage exceeds the current record
    IEnumerable<Vehicle> vehicleExcessHighestMilage
        = from v in dataSource
          where v.Mileage > _highestMileageRecorded 
          select v;

    //Record the highest mileage
    _highestMileageRecorded = 
         vehicleExcessHighestMilage.Max(v => v.Mileage);

    Console.Write("Vehicles exceeding the record: ");
    foreach (var vehicle in vehicleExcessHighestMilage)
    {
        Console.Write(vehicle.Id + " ");
    }
}

The above code does not work! It will never print out any vehicle id.

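The reason is that the variable _highestMileageRecorded has been overwritten before the foreach runs. The foreach evaluates the query again, and the where clause now compares against the new maximum, which no vehicle exceeds. Remember, the result of an IEnumerable depends on the state of the Linq statement at the time it is enumerated.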
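One possible fix, sketched here on the assumption that a snapshot of the matching vehicles is what the method really wants, is to call ToList() before the field is changed. The Vehicle and RecordTracker types below are hypothetical stand-ins for this sketch, not the original classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Vehicle type for this sketch.
public class Vehicle
{
    public int Id { get; set; }
    public int Mileage { get; set; }
}

// Hypothetical holder for the record, standing in for the class that owns
// the _highestMileageRecorded field in the post.
public class RecordTracker
{
    public int HighestMileageRecorded = 10000;

    public List<Vehicle> GetRecordBreakers(IEnumerable<Vehicle> dataSource)
    {
        // ToList() evaluates the query immediately, so the result no longer
        // depends on later changes to HighestMileageRecorded.
        List<Vehicle> recordBreakers =
            (from v in dataSource
             where v.Mileage > HighestMileageRecorded
             select v).ToList();

        // Max() throws on an empty sequence, so update only when something matched.
        if (recordBreakers.Count > 0)
            HighestMileageRecorded = recordBreakers.Max(v => v.Mileage);

        return recordBreakers;
    }
}

public static class Program
{
    public static void Main()
    {
        var vehicles = new List<Vehicle>
        {
            new Vehicle { Id = 1, Mileage = 5000 },
            new Vehicle { Id = 2, Mileage = 12000 },
            new Vehicle { Id = 3, Mileage = 15000 },
        };

        var tracker = new RecordTracker();
        foreach (var vehicle in tracker.GetRecordBreakers(vehicles))
            Console.Write(vehicle.Id + " ");   // prints: 2 3
    }
}
```

The trade-off is that the list holds a copy of the matching references, so it will not reflect vehicles added to the data source afterwards.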

Collection has changed example

Consider the following code, which tries to correct vehicles that contain incorrect mileage data. Notice that a cache object returns an IEnumerable of the vehicles that have incorrect mileage. The cache also updates itself whenever any vehicle data changes.

[Test]
public void Test()
{
    List<Vehicle> oVehicleList = new List<Vehicle>();
    GenerateRandomSales(oVehicleList);
    ClearIncorrectMileage(oVehicleList);
}

private void ClearIncorrectMileage(IEnumerable<Vehicle> dataSource)
{
    VehicleCache vehicleCache = new VehicleCache();
    vehicleCache.LoadData(dataSource);

    IEnumerable<Vehicle> incorrectMileageVehicle = 
        vehicleCache.GetIncorrectMileageVehicle();
    
    foreach (var vehicle in incorrectMileageVehicle)
    {
        //Fix the data
        //Note: Exception occurs 
        vehicle.UpdateMileage(0);
    }
}

private class VehicleCache
{
    List<Vehicle> m_oIncorrectMileageList = new List<Vehicle>();

    public VehicleCache()
    {
        Vehicle.Updated += VehicleUpdated;
    }

    void VehicleUpdated(Vehicle vehicle)
    {
        if (vehicle.Mileage < 0)
            m_oIncorrectMileageList.Add(vehicle);
        else
            m_oIncorrectMileageList.Remove(vehicle);
    }

    public IEnumerable<Vehicle> GetIncorrectMileageVehicle()
    {
        return m_oIncorrectMileageList;
    }
   
    public void LoadData(IEnumerable<Vehicle> oDataSource)
    {
        m_oIncorrectMileageList.AddRange(oDataSource.Where(v => v.Mileage < 0));
    }
}

If you run this piece of code, you will most likely get a System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

Because the cache tries to update itself whenever the data changes, it causes trouble when the code modifies the data inside a foreach statement. In fact, this can easily occur in a multi-threaded application, where the cache gets modified while other modules are looping through it. Be sure to lock your collection, or convert the IEnumerable to a list or an array before performing operations that affect the cache.

Although the above example seems obvious, it may not be so obvious in a real application, where a developer may not know how other modules behave. Imagine developer A wrote the cache module, and you get an exception when you use its IEnumerable result. Without checking the code, you may never realize that you are modifying the cache indirectly.
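As a minimal sketch of the "convert to a list first" advice, the code below iterates over a copy while the original list is modified. A plain list of integers stands in for the cache's internal list, and SnapshotDemo and FixAll are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Removes every entry from the list while looping, and returns how many
    // entries were processed.
    public static int FixAll(List<int> incorrectMileages)
    {
        int fixedCount = 0;

        // ToList() copies the current contents, so removing from the original
        // list inside the loop does not invalidate the enumerator.
        foreach (int mileage in incorrectMileages.ToList())
        {
            incorrectMileages.Remove(mileage);   // mimics the cache updating itself
            fixedCount++;
        }

        return fixedCount;
    }

    public static void Main()
    {
        var bad = new List<int> { -5, -20, -1 };
        int fixedCount = FixAll(bad);
        Console.WriteLine("{0} fixed, {1} left", fixedCount, bad.Count);   // prints: 3 fixed, 0 left
    }
}
```

Dropping the ToList() call makes the loop enumerate the list it is modifying, which raises the same InvalidOperationException as the cache example.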

Did you increase or decrease your application performance using IEnumerable in Linq (Part 2)?

Last time I talked about how you can improve your application performance by using IEnumerable so that no calculation is performed until you need it.

Bad Performance

Now, consider the following code:

private void GetStatistics(List<Vehicle> totalSaleList)
{
    IEnumerable<Vehicle> toyota
        = from v in totalSaleList
          where v.Make == Make.Toyota
          select v;

    int totalSales = toyota.Count();

    int thisYearSale = 
        (from v in toyota
         where v.Year == 2009
         select v).Count();

    double avgMilage = 
        (from v in toyota
         select v.Mileage).Average();

    Console.WriteLine("Total Sales: {0}. This year sale: {1}. Average milage: {2}", totalSales, thisYearSale, avgMilage);
}

Notice that in the above example, I kept the variable “toyota” as an IEnumerable. Each time I calculate “totalSales”, “thisYearSale” and “avgMilage”, the query has to re-evaluate the first Linq statement. This is obviously a waste of CPU power.
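The re-evaluation can be made visible by counting how often the filter runs. The sketch below is illustrative only: ReEvaluationDemo, CountFilterCalls and the sample numbers are hypothetical and not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEvaluationDemo
{
    // Returns how many times the where filter ran while computing two aggregates.
    public static int CountFilterCalls(bool materialize)
    {
        int calls = 0;
        int[] mileages = { 100, 200, 300, 400 };

        // The lambda bumps the counter every time the filter inspects an element.
        IEnumerable<int> query = mileages.Where(m => { calls++; return m > 150; });

        // Either take a snapshot with ToArray() or keep the lazy query.
        IEnumerable<int> source = materialize ? query.ToArray() : query;

        int count = source.Count();
        double average = source.Average();
        Console.WriteLine("count={0}, average={1}", count, average);

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountFilterCalls(false));   // 8: the filter ran once per aggregate
        Console.WriteLine(CountFilterCalls(true));    // 4: the filter ran only for ToArray()
    }
}
```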

Better Performance

Ideally, you should do the following:

private void GetStatistics2(List<Vehicle> totalSaleList)
{
    Vehicle[] toyota
        = (from v in totalSaleList
          where v.Make == Make.Toyota
          select v).ToArray();

    int totalSales = toyota.Length;

    int thisYearSale =
        (from v in toyota
         where v.Year == 2009
         select v).Count();

    double avgMileage =
        (from v in toyota
         select v.Mileage).Average();

    Console.WriteLine("Total Sales: {0}. This year sale: {1}. Average mileage: {2}", totalSales, thisYearSale, avgMileage);
}

By converting the variable “toyota” into an array, the above code does not need to re-evaluate the query each time we access the enumerable.

Even Better Performance

If performance is a must, you should try to combine the calculations into as few loops as possible. Remember, each time you call a Linq extension method (e.g. Max(), Min(), Average()), it has to loop through your enumerable object to calculate the result.

private void GetStatistics3(List<Vehicle> totalSaleList)
{
    Vehicle[] toyota
        = (from v in totalSaleList
           where v.Make == Make.Toyota
           select v).ToArray();

    int totalSales = toyota.Length;

    int thisYearSale = 0;
    int totalMileage = 0;
    foreach (var vehicle in toyota)
    {
        if (vehicle.Year == 2009)
            thisYearSale++;
        totalMileage += vehicle.Mileage;
    }

    double avgMilage = totalMileage * 1.0 / totalSales;

    Console.WriteLine("Total Sales: {0}. This year sale: {1}. Average milage: {2}", totalSales, thisYearSale, avgMilage);
}

Conclusion

When you need to perform more than one calculation on the same enumerable object, it is wise to convert it to a list or an array before further processing. This avoids re-evaluating the enumerable for every calculation.

Did you increase or decrease your application performance using IEnumerable in Linq (Part 1)?

I love Linq, there is no doubt about that. I pretty much use Linq whenever it is appropriate. Linq simply makes my code much more readable and easier to understand. Another reason is the power of the extension methods that the Linq library provides for IEnumerable<T> objects.

In an older post, I mentioned a bit about IEnumerable and how it works. This time, I am going to talk a bit more about how you can use IEnumerable to help improve your application.

Good Performance using IEnumerable?

Consider the following code:

private bool IsToyotaTheHighestSale(List<Vehicle> totalSaleList)
{
    int toyotaSale =  (from v in totalSaleList
          where v.Make == Make.Toyota
          select v).Count();

    int hondaSale = (from v in totalSaleList
          where v.Make == Make.Honda
          select v).Count();

    int bmwSale = (from v in totalSaleList
          where v.Make == Make.BMW
          select v).Count();

    //Compare the result
    if (toyotaSale > hondaSale)
        if (toyotaSale > bmwSale)
            return true;

    return false;
}

In the above example, we use Linq to get a count of the sales for each make, and then compare the Toyota sales with the Honda and BMW sales.

Now, let's look at the following:

private bool IsToyotaTheHighestSale2(List<Vehicle> totalSaleList)
{

    IEnumerable<Vehicle> toyota 
        = from v in totalSaleList
          where v.Make == Make.Toyota
          select v;

    IEnumerable<Vehicle> honda 
        = from v in totalSaleList
          where v.Make == Make.Honda
          select v;

    IEnumerable<Vehicle> bmw 
        = from v in totalSaleList
          where v.Make == Make.BMW
          select v;

    //Count the Toyota sales first
    int toyotaSale = toyota.Count();

    //Compare the result
    if (toyotaSale > honda.Count())
        if (toyotaSale > bmw.Count())
            return true;

    return false;
}

Comparing IsToyotaTheHighestSale() with IsToyotaTheHighestSale2(), which one would perform faster? The answer is IsToyotaTheHighestSale2().

Why? When you use Linq and get an IEnumerable, no calculation is actually performed until you act on the IEnumerable. In IsToyotaTheHighestSale2(), if the Toyota sales are not higher than the Honda sales, the function does not need to calculate the BMW sales. This is why the latter function runs faster in the general case.

Of course, the above is only a very simple example. Imagine that you need to execute a function tens of thousands of times, and your Linq statement is a lot more complicated than the one above. Acting on the enumerable later in the logic saves you from running unnecessary code.
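To see that the skipped query really costs nothing, the sketch below counts how many elements the BMW filter inspects. LazyCountDemo, BmwFilterCalls and the string-based sales data are hypothetical simplifications of the Vehicle example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyCountDemo
{
    // Returns how many elements the BMW filter inspected.
    public static int BmwFilterCalls(string[] sales)
    {
        int bmwCalls = 0;

        // Defining the queries does no work yet.
        IEnumerable<string> toyota = sales.Where(m => m == "Toyota");
        IEnumerable<string> honda = sales.Where(m => m == "Honda");
        IEnumerable<string> bmw = sales.Where(m => { bmwCalls++; return m == "BMW"; });

        int toyotaSale = toyota.Count();

        // && short-circuits: bmw.Count() runs only if Toyota beats Honda.
        bool toyotaIsTop = toyotaSale > honda.Count() && toyotaSale > bmw.Count();
        Console.WriteLine("Toyota is top: {0}", toyotaIsTop);

        return bmwCalls;
    }

    public static void Main()
    {
        // Honda outsells Toyota, so the BMW query is never evaluated.
        Console.WriteLine(BmwFilterCalls(new[] { "Honda", "Honda", "Toyota" }));           // 0

        // Toyota outsells Honda, so the BMW query scans all four entries.
        Console.WriteLine(BmwFilterCalls(new[] { "Toyota", "Toyota", "Honda", "BMW" }));   // 4
    }
}
```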

Conclusion

You can improve your application performance by acting on the IEnumerable object right when you need it, instead of getting the result in the early stage of your logic.

However, there are other cases where you should not use IEnumerable, which I will cover in the next post.

Aug 3, 2009

PreCode Setup Testing

This post is merely my PreCode setup test. It took me a long time to set up PreCode because I did not realize that the preview function on Blogger and the Live Writer preview do not execute JavaScript. I kept wondering why the code setup didn’t work until I actually tried publishing a post.

Here are two examples of how the Syntax Highlighter works:

JavaScript Example:

 function doNothing()
 {
   var a = 1;
   var b = 2;
   var nothing = a*b;
   // alert(nothing);
 }

C# Example:

public static string GetHelloMessage()
{
    var a = "Hello";
    var b = "World";
    
    var someArray = (from p in SomeArray
                     where p is Dat
                     select p).ToArray();

    return string.Format("{0} {1}", a, b);
}

What is PreCode

PreCode is a code snippet highlighter plug-in for Windows Live Writer. It uses an open source JavaScript syntax highlighter that formats your code in HTML.

This plug-in works by wrapping the code that you want to insert in a <pre> tag.

http://www.codeplex.com/precode

After you insert your code, you will end up with something like the following:

<pre class="brush: csharp; gutter: false; toolbar: false; smart-tabs: false;">public static string GetHelloMessage()
{
    var a = &quot;Hello&quot;;
    var b = &quot;World&quot;;
    
    var someArray = (from p in SomeArray
                     where p is Dat
                     select p).ToArray();

    return string.Format(&quot;{0} {1}&quot;, a, b);
}</pre>

Since PreCode relies on JavaScript, you won’t see the formatted text when you test your setup in the Windows Live Writer or Blogger preview. You will have to publish your post to see whether it works.

Note: I couldn’t get PreCode to work properly on Windows XP. The “Surround With” dropdown appears behind the main form. Maybe this is a WPF issue, but nobody will know until someone looks into it.

How to set your Blogger to use PreCode

There are two parts:

1. Download and install PreCode

2. Setup your Syntax Highlighter

It took me a while to get the JavaScript right, as I am not an expert in JavaScript. However, this is what I did:

In the Blogger settings, go to the Layout tab -> Edit HTML. In Edit Template, add the following code to the <head> section of your HTML and save your template:

<script language='JavaScript' src='http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js'>
</script>
<link href='http://alexgorbatchev.com/pub/sh/2.0.320/styles/shCore.css' rel='stylesheet' type='text/css'/>
<link href='http://alexgorbatchev.com/pub/sh/2.0.320/styles/shThemeDefault.css' rel='stylesheet' type='text/css'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shCore.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shBrushCSharp.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shBrushXml.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shBrushAS3.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shBrushJScript.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shBrushCss.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shBrushCpp.js'/>
<script language='javascript' src='http://alexgorbatchev.com/pub/sh/2.0.320/scripts/shBrushJava.js'/>

<script language='javascript'>
$(document).ready(function () {
  $(&quot;pre br&quot;).after(&quot;\n&quot;).remove();
  SyntaxHighlighter.config.clipboardSwf = &#39;http://alexgorbatchev.com/pub/sh/2.0.320/scripts/clipboard.swf&#39;;
  SyntaxHighlighter.all();
});
</script>

Notice that in the above JavaScript, I am referencing the js files from the alexgorbatchev.com domain. You can put these js files somewhere else if you prefer. In fact, you can modify these JavaScript files to add additional keywords. In the above example, I didn’t include all the available syntax highlighters. Go to the following link for the others.

http://alexgorbatchev.com/wiki/SyntaxHighlighter:Brushes 

References

Thanks to the following links, I was able to set up my blog using PreCode:

http://www.codeplex.com/precode

http://ersinbasaran.blogspot.com/2009/07/code-syntax-highligthting-on-bloggercom.html