Parallel processing with ParallelEnumerable

Built-in LINQ parallel processing with ParallelEnumerable


Daily Knowledge Drop

LINQ has built-in parallel functionality, available in the ParallelEnumerable class. It is exposed as extension methods on ParallelQuery, which is obtained by calling AsParallel on an IEnumerable.


Usage

Using the functionality offered by ParallelEnumerable is very similar to (and for the most part, the same as) normal LINQ method usage.

To access the root of the parallel functionality, ParallelQuery, all that is required is to call the AsParallel() method on an IEnumerable:

IEnumerable<int> array = Enumerable.Range(1, 1000);
ParallelQuery<int> parallelQuery = array.AsParallel();

With the ParallelQuery instance, the traditional LINQ methods are still available:

// output: 750.5
// Average returns a double, so the result cannot be assigned to an int
double average = parallelQuery
    .Where(i => i > 500)
    .Average();
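To make the comparison concrete, here is a minimal, self-contained sketch that runs the same query with and without AsParallel. The AverageDemo class and its method names are made up for illustration and are not part of the original example.

```csharp
using System;
using System.Linq;

public static class AverageDemo
{
    // Sequential LINQ: the operators come from Enumerable
    // and run on the calling thread.
    public static double SequentialAverage(int count) =>
        Enumerable.Range(1, count)
            .Where(i => i > count / 2)
            .Average();

    // PLINQ: the same operator names, but resolved against
    // ParallelEnumerable because the source is a ParallelQuery<int>.
    public static double ParallelAverage(int count) =>
        Enumerable.Range(1, count)
            .AsParallel()
            .Where(i => i > count / 2)
            .Average();
}
```

Calling AverageDemo.SequentialAverage(1000) and AverageDemo.ParallelAverage(1000) should give the same value: only the way the work is scheduled differs, not the result of the query.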

However, there are now also additional methods available, for example ForAll, which performs an Action on each element, but in parallel:

var outputArray = new int[1000];
parallelQuery.ForAll(i =>
{
    outputArray[i - 1] = i * i;
});
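The snippet above is safe because every element writes to its own slot in the array. As a hedged sketch of what changes when the elements share state, the example below contrasts the per-slot write with a shared running total, which needs an atomic update. ForAllDemo, Squares and SumOfSquares are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class ForAllDemo
{
    // Each element writes only to its own index, so no synchronization is needed.
    public static int[] Squares(int count)
    {
        var output = new int[count];
        Enumerable.Range(1, count)
            .AsParallel()
            .ForAll(i => output[i - 1] = i * i);
        return output;
    }

    // Every element updates the same variable, so the update must be atomic.
    // A plain "total += ..." here could lose updates when threads interleave.
    public static long SumOfSquares(int count)
    {
        long total = 0;
        Enumerable.Range(1, count)
            .AsParallel()
            .ForAll(i => Interlocked.Add(ref total, (long)i * i));
        return total;
    }
}
```

ForAll gives no ordering guarantee, so anything inside the lambda should either touch only its own data or be thread safe.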

Let's benchmark this to see how it performs against other parallel and non-parallel operations which achieve the same outcome.


Benchmark

In this set of benchmarks, we are operating on an IEnumerable<int> with 100 items, and for each item in the source, the PerformCalculation method will be called:

// the source items
IEnumerable<int> array = Enumerable.Range(1, 100);

// the method invoked
void PerformCalculation(int i)
{
    _ = i * i * i;
}

The following techniques were benchmarked:

  • List.ForEach:

    public void ForEachList()
    {
        array.ToList()
           .ForEach(i => PerformCalculation(i));
    }
    
  • Foreach over an Array:

    public void ForeachOverArray()
    {
        foreach (var i in array.ToArray())
        {
            PerformCalculation(i);
        }
    }
    
  • Parallel.ForEach:

    public void ParallelForEach()
    {
        Parallel.ForEach(array, PerformCalculation);
    }
    
  • ParallelQuery.ForAll:

    public void ParallelQueryForAll()
    {
        array
           .AsParallel()
           .ForAll(i => PerformCalculation(i));
    }
    

The results:

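| Method              |         Mean |      Error |     StdDev | Ratio | RatioSD |  Gen 0 |  Gen 1 | Allocated |
|-------------------- |-------------:|-----------:|-----------:|------:|--------:|-------:|-------:|----------:|
| ListForEach         |    322.85 ns |   5.651 ns |   5.286 ns |  1.00 |    0.00 | 0.0825 |      - |     520 B |
| ForeachOverArray    |     91.40 ns |   1.770 ns |   1.656 ns |  0.28 |    0.01 | 0.0675 |      - |     424 B |
| ParallelQueryForAll | 13,481.85 ns | 172.827 ns | 161.663 ns | 41.77 |    0.86 | 2.0599 | 0.0610 |  12,585 B |
| ParallelForEach     |  7,978.07 ns |  41.364 ns |  34.540 ns | 24.70 |    0.43 | 4.6844 | 0.1068 |  24,175 B |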

The parallel versions are slow - VERY slow - compared to just iterating over the collection of items in sequence: roughly 25 to 42 times slower than List.ForEach, while also allocating far more memory.

Surely doing processing in parallel should make things quicker overall? Generally yes. In this case, however, the actual work being performed (the PerformCalculation method) completes so quickly that the overhead of creating and managing the parallel tasks outweighs any gain, compared to just operating on the items in sequence.
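The overhead can also be seen without a benchmarking library. The sketch below is a rough Stopwatch-based comparison, not a replacement for the benchmarks above: OverheadDemo, Cube, AverageMicroseconds and Compare are hypothetical names, and the timings it produces will vary from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class OverheadDemo
{
    // Cheap CPU-only work, mirroring PerformCalculation.
    public static int Cube(int i) => i * i * i;

    // Runs the action `iterations` times and returns the
    // average duration of one run in microseconds.
    public static double AverageMicroseconds(Action action, int iterations)
    {
        action(); // warm-up run, so JIT compilation is not measured
        var stopwatch = Stopwatch.StartNew();
        for (var n = 0; n < iterations; n++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds * 1000 / iterations;
    }

    // Times the same work done in a plain loop and via ParallelQuery.ForAll.
    public static (double Sequential, double Parallel) Compare(int items, int iterations)
    {
        var source = Enumerable.Range(1, items).ToArray();

        var sequential = AverageMicroseconds(() =>
        {
            foreach (var i in source) { Cube(i); }
        }, iterations);

        var parallel = AverageMicroseconds(() =>
            source.AsParallel().ForAll(i => Cube(i)), iterations);

        return (sequential, parallel);
    }
}
```

Something like OverheadDemo.Compare(100, 1000) returns both averages. For work this cheap, the parallel figure would be expected to come out well above the sequential one, in line with the table above.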

However, what if the work being performed took slightly longer...


Benchmark v2

We'll run the exact same benchmarks, with the same array size, but now the PerformCalculation method also sleeps for 2ms on each call:

void PerformCalculation(int i)
{
    _ = i * i * i;
    // simulate a longer 
    // running process
    Thread.Sleep(2);
}

The results of round 2:

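| Method              |        Mean |    Error |    StdDev | Ratio | Allocated |
|-------------------- |------------:|---------:|----------:|------:|----------:|
| ListForEach         | 1,536.90 ms | 2.679 ms |  2.506 ms |  1.00 |      1 KB |
| ForeachOverArray    | 1,534.54 ms | 2.702 ms |  2.528 ms |  1.00 |      1 KB |
| ParallelQueryForAll |   152.84 ms | 0.730 ms |  0.683 ms |  0.10 |     13 KB |
| ParallelForEach     |    71.86 ms | 7.165 ms | 20.788 ms |  0.07 |     99 KB |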

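The parallel versions are now 10 to 20 times faster! Parallel.ForEach is about twice as fast as ParallelEnumerable.ForAll, but allocates roughly 7 times as much memory (and both parallel techniques use vastly more memory than the basic loops). Its timings are also much less stable, with a standard deviation of about 21 ms against under 1 ms for ForAll.

Note that the sequential runs work out to roughly 15 ms per item rather than 2 ms. This is most likely because Thread.Sleep is limited by the system timer resolution (typically around 15 ms by default on Windows), so the 2 ms is a minimum rather than an exact delay.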
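The results do not say why Parallel.ForEach comes out ahead here. One possible explanation (an assumption, not something the benchmark measured) is thread usage: by default PLINQ caps its degree of parallelism at the processor count, while Parallel.ForEach has no such default limit and can pick up extra thread pool threads when the work blocks. The sketch below counts the distinct threads each approach runs on; ThreadUsageDemo and its members are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadUsageDemo
{
    // Records the current thread, then simulates blocking work
    // in the same way as the second benchmark.
    private static void BlockingWork(ConcurrentDictionary<int, bool> seen)
    {
        seen.TryAdd(Environment.CurrentManagedThreadId, true);
        Thread.Sleep(2);
    }

    // Number of distinct threads ParallelQuery.ForAll ran the work on.
    public static int ThreadsUsedByForAll(int items)
    {
        var seen = new ConcurrentDictionary<int, bool>();
        Enumerable.Range(1, items)
            .AsParallel()
            .ForAll(_ => BlockingWork(seen));
        return seen.Count;
    }

    // Number of distinct threads Parallel.ForEach ran the work on.
    public static int ThreadsUsedByParallelForEach(int items)
    {
        var seen = new ConcurrentDictionary<int, bool>();
        Parallel.ForEach(Enumerable.Range(1, items), _ => BlockingWork(seen));
        return seen.Count;
    }
}
```

Comparing ThreadUsageDemo.ThreadsUsedByForAll(100) with ThreadUsageDemo.ThreadsUsedByParallelForEach(100) on the benchmark machine would show whether the two approaches really differ in how many threads they use. The counts depend on the core count and the state of the thread pool, so they will differ between runs and machines.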


Notes

There are a number of different techniques to solve a use case such as this - each with its own pros and cons, which would need to be evaluated for each use case. It's important to know about each technique, so a thorough evaluation can be done. Results may vary based on collection size and processing time, but based on the above use case:

  • Low memory usage is most important? - use a foreach loop
  • Fast throughput is most important? - use Parallel.ForEach
  • Good throughput with good memory usage? - use ParallelEnumerable.ForAll

References

ParallelEnumerable Class


Daily Drop 140: 17-08-2022

At the start of 2022 I set myself the goal of learning one new coding-related piece of knowledge a day.
It could be anything - some .NET / C# functionality I wasn't aware of, a design practice, a cool new coding technique, or just something I find interesting. It could be something I knew at one point but had forgotten, or something completely new, which I may or may never actually use.

The Daily Drop is a record of these pieces of knowledge - writing about and summarizing them helps reinforce the information for myself, as well as potentially helping others learn something new.