#Google Analytic Tracker

Aug 4, 2009

IEnumerable + LINQ - Be cautious about when the enumerable object is evaluated

In the last two posts, I emphasised that a LINQ query is evaluated each time you enumerate the elements of the enumerable object. Because of this, you have to be cautious in the following two situations.

  1. The where clause condition of the LINQ statement is modified
  2. Your collection is modified while the LINQ result is being enumerated
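To make the repeated evaluation concrete, here is a minimal sketch that counts how often a where predicate runs when the same query is enumerated twice. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class DeferredEvaluationDemo
{
    // Counts how many times the where predicate runs across two enumerations.
    public static int CountPredicateCalls()
    {
        int calls = 0;
        var numbers = new List<int> { 1, 2, 3 };

        // Nothing is evaluated here; the query only describes the work to do.
        IEnumerable<int> query = numbers.Where(n => { calls++; return n > 1; });

        int first = query.Count();   // first enumeration: predicate runs 3 times
        int second = query.Count();  // second enumeration: predicate runs 3 more times

        return calls;                // 6, not 3
    }
}
```

Each enumeration runs the predicate again against whatever state exists at that moment, which is what both situations in the list rely on.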

Example: the where clause has changed

// Keeps track of the highest mileage recorded so far
private int _highestMileageRecorded = 10000;

private void GetStatistics(IEnumerable<Vehicle> dataSource)
{
    // Find the vehicles whose mileage exceeds the current record
    IEnumerable<Vehicle> vehicleExcessHighestMilage
        = from v in dataSource
          where v.Mileage > _highestMileageRecorded 
          select v;

    // Record the new highest mileage
    _highestMileageRecorded = 
         vehicleExcessHighestMilage.Max(v => v.Mileage);

    Console.Write("Vehicles exceeding the record: ");
    foreach (var vehicle in vehicleExcessHighestMilage)
    {
        Console.Write(vehicle.Id + " ");
    }
}

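The above code does not work! It will never print out any vehicle id.

The reason is that the variable _highestMileageRecorded has been overwritten before the foreach runs. Remember, the IEnumerable result is not a stored list: the foreach re-evaluates the where clause against the current value of _highestMileageRecorded, which is now the highest mileage in the data, so no vehicle satisfies v.Mileage > _highestMileageRecorded.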
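One possible way to avoid this is to force the query to run once, with ToList(), before the field it depends on is changed. The sketch below assumes a simplified Vehicle class with only Id and Mileage, and a hypothetical MileageStatistics wrapper that returns the ids instead of printing them; neither is the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the post's Vehicle class (assumption).
public class Vehicle
{
    public int Id { get; set; }
    public int Mileage { get; set; }
}

// Hypothetical wrapper around the logic of GetStatistics.
public class MileageStatistics
{
    private int _highestMileageRecorded = 10000;

    // Returns the ids of vehicles that exceeded the record held before this call.
    public List<int> GetRecordBreakers(IEnumerable<Vehicle> dataSource)
    {
        // ToList() runs the query now, while _highestMileageRecorded
        // still holds the old record.
        List<Vehicle> recordBreakers = dataSource
            .Where(v => v.Mileage > _highestMileageRecorded)
            .ToList();

        // Max() throws on an empty sequence, so only update when something qualified.
        if (recordBreakers.Count > 0)
        {
            _highestMileageRecorded = recordBreakers.Max(v => v.Mileage);
        }

        // The list is a snapshot, so the updated field no longer affects it.
        return recordBreakers.Select(v => v.Id).ToList();
    }
}
```

The trade-off is that the snapshot costs memory and will not reflect later changes to dataSource, which is usually what you want in a method like this.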

Example: the collection has changed

The following code tries to correct vehicles that contain incorrect mileage data. Notice that a cache object returns an IEnumerable of the vehicles that have an incorrect mileage. The cache also updates itself whenever any vehicle data changes.

[Test]
public void Test()
{
    List<Vehicle> oVehicleList = new List<Vehicle>();
    GenerateRandomSales(oVehicleList);
    ClearIncorrectMileage(oVehicleList);
}

private void ClearIncorrectMileage(IEnumerable<Vehicle> dataSource)
{
    VehicleCache vehicleCache = new VehicleCache();
    vehicleCache.LoadData(dataSource);

    IEnumerable<Vehicle> incorrectMileageVehicle = 
        vehicleCache.GetIncorrectMileageVehicle();
    
    foreach (var vehicle in incorrectMileageVehicle)
    {
        // Fix the data
        // Note: an exception occurs here, because the update makes the
        // cache modify the list that this foreach is enumerating
        vehicle.UpdateMileage(0);
    }
}

private class VehicleCache
{
    List<Vehicle> m_oIncorrectMileageList = new List<Vehicle>();

    public VehicleCache()
    {
        Vehicle.Updated += VehicleUpdated;
    }

    void VehicleUpdated(Vehicle vehicle)
    {
        if (vehicle.Mileage < 0)
            m_oIncorrectMileageList.Add(vehicle);
        else
            m_oIncorrectMileageList.Remove(vehicle);
    }

    public IEnumerable<Vehicle> GetIncorrectMileageVehicle()
    {
        return m_oIncorrectMileageList;
    }
   
    public void LoadData(IEnumerable<Vehicle> oDataSource)
    {
        m_oIncorrectMileageList.AddRange(oDataSource.Where(v => v.Mileage < 0).Select(v => v));
    }
}

If you run this piece of code, you will most likely get a System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

Because the cache tries to update itself whenever the data changes, it causes trouble when the code modifies the data inside a foreach statement over the cache's own list. The same problem can easily occur in a multi-threaded application, where the cache gets modified while other modules are looping through it. Be sure to lock your collection in that case, and convert the IEnumerable to a list or array (for example with ToList() or ToArray()) before performing operations that affect the cache.
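As a rough sketch of the "convert to a list first" advice, the version below copies the cache's result with ToList() before the loop. The Vehicle class here is a guessed reconstruction (the post does not show it), with a static Updated event and an UpdateMileage method that raises it; MileageFixer is a hypothetical wrapper so the result can be returned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Guessed reconstruction of Vehicle; the original class is not shown in the post.
public class Vehicle
{
    public static event Action<Vehicle> Updated;

    public int Mileage { get; private set; }

    public Vehicle(int mileage) { Mileage = mileage; }

    public void UpdateMileage(int mileage)
    {
        Mileage = mileage;
        Updated?.Invoke(this);   // notifies every subscribed cache
    }
}

public class VehicleCache
{
    private readonly List<Vehicle> _incorrect = new List<Vehicle>();

    public VehicleCache() { Vehicle.Updated += VehicleUpdated; }

    private void VehicleUpdated(Vehicle vehicle)
    {
        if (vehicle.Mileage < 0) _incorrect.Add(vehicle);
        else _incorrect.Remove(vehicle);
    }

    public void LoadData(IEnumerable<Vehicle> source)
    {
        _incorrect.AddRange(source.Where(v => v.Mileage < 0));
    }

    // Still exposes the live list, exactly as in the post.
    public IEnumerable<Vehicle> GetIncorrectMileageVehicle() { return _incorrect; }
}

// Hypothetical wrapper, not part of the original post.
public static class MileageFixer
{
    // Returns how many vehicles were corrected.
    public static int ClearIncorrectMileage(IEnumerable<Vehicle> dataSource)
    {
        var cache = new VehicleCache();
        cache.LoadData(dataSource);

        // Copy the current contents so the loop does not enumerate the live list.
        List<Vehicle> snapshot = cache.GetIncorrectMileageVehicle().ToList();

        foreach (var vehicle in snapshot)
        {
            // The cache removes the vehicle from its own list; the snapshot is unaffected.
            vehicle.UpdateMileage(0);
        }

        return snapshot.Count;
    }
}
```

This only addresses the single-threaded case. With several threads, the ToList() call itself enumerates the live list, so the copy would still need to be taken under the same lock that protects the cache.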

Although the above example seems obvious, it may not be obvious in a real application, where a developer may not know how other modules behave. Imagine developer A wrote the cache module, and you get an exception when you use its IEnumerable result. Without checking the code, you may never realize that you are modifying the cache indirectly.
