#Google Analytic Tracker

Pages

Jan 14, 2009

Counting Items using Linq vs IEnumable Looping Performance

What is Linq?

If you happened to be a .NET developer and bumped into this blog somehow, you probably already know a bit or heard about Linq is about. Linq, or Language INtegrated Query, is an set of extension to the .NET Framework.  It provides the query-liked syntax and class libraries for querying data. There are many information about Linq, so I won't go further on what Linq is about and how awesome it is.

IEnumerable

There are many collection classes you can find inside .NET Framework.

Eamples:

System.Collections ArrayList
  Queue
  SortedList
  Stack
System.Collections.Generic Dictionary<TKey, TValue>
  List<T>
System.Core HashSet<T>

In the end, you will find all these collection supports IEnumerable, or IEnumerable<T>. These interfaces only have one method: GetEnumerator(), and this method returns System.Collections.IEnumerator or System.Collections.Generic.IEnumerator<T> respectively.

IEnumerator

An IEnumerator object allows you to walk though the collection one by one. It has these methods:

    • MoveNext()
    • Reset()

and a property calls Current

After you get a IEnumerator object, you can call the MoveNext() and get the item from the collection from the Current property.

How to Count

When counting the number of items in a collection using IEnumerator, you could do the following:

[Test]
public void CountTest_UsingIEnumerator()
{
    IEnumerator l_oEnumerator = m_aSampleData.GetEnumerator();
    
    int l_nItemCount = 0;
    while(l_oEnumerator.MoveNext())
    {
        l_nItemCount++;
    }
    
    Trace.WriteLine(string.Format("Number of item in the array: {0}", l_nItemCount));
}

Of course, that is pretty stupid and looks very inefficient, because we could have just do the following:

[Test]
public void CountTest_UsingLength()
{
    int l_nItemCount = m_aSampleData.Length;
    Trace.WriteLine(string.Format("Number of item in the array {0}", l_nItemCount));
}

For an array, there is a length property, you could have just access this property to get the size. However, with Linq, you can also do this too:

[Test]
public void CountTest_UsingLinq()
{
    int l_nItemCount = ((IEnumerable<int>)m_aSampleData).Count();
    Trace.WriteLine(string.Format("Number of item in the array {0}", l_nItemCount));
}

Casting to IEnumerable isn't necessary, but I just want to show you my point that the .Count() is based on the interface.

Count Performance

During a code review, my collegue was concerned that calling Linq Count() method would be a lot slower than just access the Length property directly. When you look at Linq's Count() method, you will notice that it is an extension method. The signature:
public static int Count<TSource>(this System.Collections.Generic.IEnumerable<TSource> source)
indicates the method takes in a IEnumerable<T>, which means it only has 2 method, MoveNext() and Reset.

Does that mean calling the extension Count() is slower then getting the value from the .Length property?

While let do some tests:

Performance Test Using Length Property

[Test]
public void CountPerformanceTest_UsingLength()
{
    int l_oQueryCount = 1000000;

    long l_nStartTick = DateTime.Now.Ticks;
    for(int i=0; i< l_oQueryCount ; i++)
    {
        int l_nCount = m_aSampleData.Length;
    }
    long l_nEndTick = DateTime.Now.Ticks;

    Trace.WriteLine(string.Format("It takes {0} to count {1} times using .Length property.", new TimeSpan(l_nEndTick - l_nStartTick), l_oQueryCount));
}

Performance Test Using Linq Count

[Test]
public void CountPerformanceTest_UsingLinq()
{
    int l_oQueryCount = 1000000;

    long l_nStartTick = DateTime.Now.Ticks;
    for (int i = 0; i < l_oQueryCount; i++)
    {
        int l_nCount = m_aSampleData.Count();
    }
    long l_nEndTick = DateTime.Now.Ticks;

    Trace.WriteLine(string.Format("It takes {0} to count {1} times using .Count() method.", new TimeSpan(l_nEndTick - l_nStartTick), l_oQueryCount));
}

The results:

  • It takes 00:00:00 to count 1000000 times using .Length property.
  • It takes 00:00:00.4218723 to count 1000000 times using .Count() method.

So, pretty much using .Length that no time, or it could be optimized by the compiler or the CPU optimized its execution path. However, calling the .Count() takes a bit of overhead, compare accessing the Length property. What about if I loop though the data and count it?

Performance Test By Looping using IEnumerator

[Test]
public void CountPerformanceTest_UsingEnumerator()
{
    int l_oQueryCount = 10000;
    IEnumerator l_oEnumerator = m_aSampleData.GetEnumerator();

    long l_nStartTick = DateTime.Now.Ticks;
    for (int i = 0; i < l_oQueryCount; i++)
    {

        int l_nItemCount = 0;
        while (l_oEnumerator.MoveNext())
        {
            l_nItemCount++;
        }

        l_oEnumerator.Reset();
    
    }
    long l_nEndTick = DateTime.Now.Ticks;

    Trace.WriteLine(string.Format("It takes {0} to count {1} times by looping though item using the Enumerator", new TimeSpan(l_nEndTick - l_nStartTick), l_oQueryCount));
}

The results:

  • It takes 00:00:07.9843239 to count 10000 times by looping though item using the Enumerator. Note that the test takes more than 7 sec just to query 10,000 times compare with query 0.4 sec for 1,000,000 times using Linq.

Conclusion

So, it appears that the Linq Count() does something else , because we know that looping though the array is extreme slow. So I decided to disassemble the Count() code using reflector:

public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    ICollection<TSource> is2 = source as ICollection<TSource>;
    if (is2 != null)
    {
        return is2.Count;
    }
    int num = 0;
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            num++;
        }
    }
    return num;
}

Ah,it turns out that Count() takes into account of a ICollection. An array "[]" is basically an array object after it is boxed. Array's base type is IList and ICollection. That means its performance is fairly comparable to just access the Length property. Therefore, you shouldn't be too worry about using the Linq Count(). Linq also some with other methods such as First() and Last(). It makes reading the code much easier when want using meaning method name instead of accessing the item though index. For example, arrayA[arrayA -1] vs arrayA.Last().

 

1 comment:

Adam Tuliper said...

not to mention linq can re-execute queries again when you call count() such as can be noted here
Adam Tuliper
completedevelopment.blogspot.com


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqTest
{
class TestClass
{
public TestClass()
{
CreateDate = DateTime.Now;
}
public DateTime CreateDate;
}

class Program
{

static void Main(string[] args)
{
//Populate the test class
List list = new List(1000);
for (int i=0; i < 1000; i++)
{
System.Threading.Thread.Sleep(20);
list.Add(new TestClass());
if(i%100==0)
{
Console.WriteLine(i.ToString() + " items added");
}
}

//now query for items
var newList = list.Where(o=> o.CreateDate.AddSeconds(5)> DateTime.Now);
while (newList.Count() > 0)
{
//Note - are actual count keeps decreasing.. showing our 'execute' is running every time we call count.
Console.WriteLine(newList.Count());
System.Threading.Thread.Sleep(500);
}
}
}
}