Daily Knowledge Drop
Linq has built-in parallel functionality, available in the ParallelEnumerable class. The functionality is exposed as extension methods on ParallelQuery, which is obtained by calling AsParallel on an IEnumerable.
Usage
The usage of the functionality offered by ParallelEnumerable is very similar (and for the most part, the same) to normal Linq method usage. To access the root of the parallel functionality, ParallelQuery, all that is required is for the AsParallel() method to be called on an IEnumerable:
IEnumerable<int> array = Enumerable.Range(1, 1000);
ParallelQuery<int> parallelQuery = array.AsParallel();
With the ParallelQuery instance, traditional Linq methods are still available:
// output: 750.5
double average = parallelQuery
    .Where(i => i > 500)
    .Average();
However, there are now also additional methods available, for example ForAll, which will perform an Action on each element, but in parallel:
var outputArray = new int[1000];
parallelQuery.ForAll(i =>
{
    outputArray[i - 1] = i * i;
});
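For contrast, the equivalent sequential version of the same operation is just a plain loop over the source (a minimal sketch shown only for comparison; the sequentialOutput array is purely illustrative):
var sequentialOutput = new int[1000];
foreach (var i in array)
{
    // same work as the ForAll example, but one element at a time
    sequentialOutput[i - 1] = i * i;
}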
Let's benchmark this to see how it performs against other parallel and non-parallel operations which achieve the same outcome.
Benchmark
In this set of benchmarks, we are operating on an IEnumerable<int> with 100 items, and for each item in the source, the PerformCalculation method will be called:
// the source items
IEnumerable<int> array = Enumerable.Range(1, 100);
// the method invoked
void PerformCalculation(int i)
{
    _ = i * i * i;
}
The following techniques were benchmarked:
List.ForEach:
public void ListForEach()
{
    array.ToList()
        .ForEach(i => PerformCalculation(i));
}
Foreach over an Array:
public void ForeachOverArray()
{
    foreach (var i in array.ToArray())
    {
        PerformCalculation(i);
    }
}
Parallel.ForEach:
public void ParallelForEach()
{
    Parallel.ForEach(array, PerformCalculation);
}
ParallelQuery.ForAll:
public void ParallelQueryForAll()
{
    array
        .AsParallel()
        .ForAll(i => PerformCalculation(i));
}
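The Mean, Error, StdDev, Ratio and memory columns in the results below match BenchmarkDotNet's summary output, so presumably that is the harness in use. As a minimal sketch (the class name, attribute usage and baseline choice are assumptions, as the post does not show the harness itself), the four methods could be hosted like this:
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// MemoryDiagnoser produces the Gen 0 / Gen 1 / Allocated columns
[MemoryDiagnoser]
public class IterationBenchmarks
{
    // the source items
    private readonly IEnumerable<int> array = Enumerable.Range(1, 100);

    private void PerformCalculation(int i)
    {
        _ = i * i * i;
    }

    // the baseline that the Ratio column is measured against
    [Benchmark(Baseline = true)]
    public void ListForEach() =>
        array.ToList().ForEach(i => PerformCalculation(i));

    [Benchmark]
    public void ForeachOverArray()
    {
        foreach (var i in array.ToArray())
        {
            PerformCalculation(i);
        }
    }

    [Benchmark]
    public void ParallelForEach() =>
        Parallel.ForEach(array, PerformCalculation);

    [Benchmark]
    public void ParallelQueryForAll() =>
        array.AsParallel().ForAll(i => PerformCalculation(i));
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<IterationBenchmarks>();
}
Running the project in Release mode then produces a summary table in the format shown below.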
The results:
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|
ListForEach | 322.85 ns | 5.651 ns | 5.286 ns | 1.00 | 0.00 | 0.0825 | - | 520 B |
ForeachOverArray | 91.40 ns | 1.770 ns | 1.656 ns | 0.28 | 0.01 | 0.0675 | - | 424 B |
ParallelQueryForAll | 13,481.85 ns | 172.827 ns | 161.663 ns | 41.77 | 0.86 | 2.0599 | 0.0610 | 12,585 B |
ParallelForEach | 7,978.07 ns | 41.364 ns | 34.540 ns | 24.70 | 0.43 | 4.6844 | 0.1068 | 24,175 B |
The parallel versions are slow - VERY slow compared to just iterating over the collection of items in sequence.
Surely processing in parallel should make things quicker overall? Generally, yes - in this case, however, the actual work being performed (the PerformCalculation method) completes so quickly that the overhead of creating and managing the parallel tasks outweighs any benefit compared to just operating on the items in sequence.
However, what if the work being performed took slightly longer...
Benchmark v2
We'll run the exact same benchmarks, with the same array size, but now the PerformCalculation method will take 2ms longer:
void PerformCalculation(int i)
{
    _ = i * i * i;
    // simulate a longer
    // running process
    Thread.Sleep(2);
}
The results of round 2:
Method | Mean | Error | StdDev | Ratio | Allocated |
---|---|---|---|---|---|
ListForEach | 1,536.90 ms | 2.679 ms | 2.506 ms | 1.00 | 1 KB |
ForeachOverArray | 1,534.54 ms | 2.702 ms | 2.528 ms | 1.00 | 1 KB |
ParallelQueryForAll | 152.84 ms | 0.730 ms | 0.683 ms | 0.10 | 13 KB |
ParallelForEach | 71.86 ms | 7.165 ms | 20.788 ms | 0.07 | 99 KB |
The parallel versions are now 10-20 times faster! The Parallel.ForEach is twice as fast as the ParallelEnumerable.ForAll, but uses 7 times the amount of memory (and both parallel techniques use vastly more memory in comparison to the basic loops).
Notes
There are a number of different techniques to solve a use case such as this - each with its own pros and cons, which would need to be evaluated for each use case. It's important to know about each technique, so a thorough evaluation can be done. Results may vary based on collection size and processing time, but based on the above use case:
- Low memory usage is most important? - use a foreach loop
- Fast throughput is most important? - use Parallel.ForEach
- Good throughput with good memory usage? - use ParallelEnumerable.ForAll