BenchmarkDotNet
Allow to optionally run benchmarks in Parallel
We want to be able to run thousands of benchmarks on powerful CI machines (64+ cores), so it would be great to run the benchmarks in parallel.
This should not be a default behavior, and the users should be allowed to enable/disable it for selected benchmarks. For some benchmarks it's a very bad idea ;)
Good day! Any progress in this direction?
hello @bazzilic !
I am sorry but we have made no progress.
This would certainly make the .NET perf test passes easier and faster, but we would presumably need a way (attribute on method?) to indicate that a benchmark requires > 1 core (such as our parallel benchmarks). BDN would have to do a "join" then run those alone.
[More hypothetically, BDN could conceivably have a mode where it "diagnoses" any benchmarks that seem to use significantly > 1 core when run alone but don't have the attribute.]
Thinking laterally about a different possible approach -- is there a way to affinitize BDN and its child processes to certain # of cores? then no other feature would be required -- you'd just kick off several BDN's each on a subset of the benchmarks, each affinitized to a different CPU mask.
is there a way to affinitize BDN and its child processes to certain # of cores?
@danmoseley it's possible to affinitize the child process via --affinity and it's also possible to start the host process using some utility that allows for affinitization
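To illustrate the "several BDN instances, each on its own CPU mask" idea, here is a minimal sketch that splits a machine's cores into disjoint bitmasks. The helper name split_affinity_masks is made up for this example and is not part of BenchmarkDotNet; how each mask is then handed to a process (for example via --affinity for the child process) is left to the caller.

```python
def split_affinity_masks(core_count, groups):
    """Split cores 0..core_count-1 into `groups` disjoint, contiguous bitmasks.

    Hypothetical helper for illustration only: bit i set means "core i allowed".
    """
    if groups < 1 or groups > core_count:
        raise ValueError("groups must be between 1 and core_count")
    base, extra = divmod(core_count, groups)
    masks = []
    start = 0
    for g in range(groups):
        # The first `extra` groups get one additional core each.
        size = base + (1 if g < extra else 0)
        masks.append(((1 << size) - 1) << start)
        start += size
    return masks


if __name__ == "__main__":
    # 8 cores shared by 4 independent runs, 2 cores each.
    for mask in split_affinity_masks(8, 4):
        print(hex(mask))
```

Each run would then get its own subset of the benchmarks plus one of these masks, so the runs never compete for the same core.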
A few months ago I prototyped adding such support to BDN. The idea was simple: get all benchmarks, compile the code and start up to N-1 affinitized processes in parallel. As soon as one of them is done, pick up the next available benchmark and run it. Don't report the results by writing to std out; instead, give each process the path to the file where it should store the results.
https://github.com/dotnet/BenchmarkDotNet/compare/c165ba17501626561297a80fe1b05b2400ce8014...9e2d5c986f434b82aa20b37b7ecda2376a059fec
It was very promising:

On my 24-core machine the time to run dotnet/performance benchmarks dropped from 9h 49m to half an hour ;)
I would need some extra time to test it (how sharing the CPU cache affects the benchmark results) and polish it (presumably running them only on physical cores, with no hyper-threading). I expect that it would take me a week of work.
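The scheduling idea described above (a fixed set of cores, each running one benchmark at a time, with the next benchmark picked up as soon as a core frees up) can be sketched roughly as follows. This is not the code from the linked prototype; run_in_parallel and the run_one callback are hypothetical names, and run_one stands in for "start one affinitized process, wait for it, and return the path of the file it wrote its results to".

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(benchmarks, core_ids, run_one):
    """Run every benchmark on some free core; return {benchmark: result_path}.

    run_one(benchmark, core_id) is a caller-supplied (hypothetical) function
    that runs one benchmark pinned to core_id and returns its result file path.
    """
    free_cores = queue.Queue()
    for core in core_ids:
        free_cores.put(core)

    results = {}
    lock = threading.Lock()

    def worker(benchmark):
        core = free_cores.get()       # claim a free core
        try:
            path = run_one(benchmark, core)
        finally:
            free_cores.put(core)      # release it for the next benchmark
        with lock:
            results[benchmark] = path

    # One worker thread per core, so a core never runs two benchmarks at once.
    with ThreadPoolExecutor(max_workers=len(core_ids)) as pool:
        futures = [pool.submit(worker, b) for b in benchmarks]
        for future in futures:
            future.result()           # re-raise any failure from a worker
    return results


if __name__ == "__main__":
    import os
    import tempfile

    out_dir = tempfile.mkdtemp()

    def fake_run(benchmark, core):
        # Stand-in for launching a real benchmark process.
        path = os.path.join(out_dir, f"{benchmark}.txt")
        with open(path, "w") as f:
            f.write(f"{benchmark} ran on core {core}\n")
        return path

    # Cores 1..3 used for benchmarks, core 0 left to the host (the "N-1" idea).
    print(run_in_parallel(["A", "B", "C", "D", "E"], [1, 2, 3], fake_run))
```

Writing results to per-process files rather than std out keeps the outputs of concurrently running benchmarks from interleaving.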
On my 24-core machine the time to run dotnet/performance benchmarks dropped from 9h 49m to half an hour ;)
Wow! What about perf tests that need more than one core, though? Eg Parallel.For, etc
Also what about in-process benchmarks? I know allocations results will be affected, but I'm not sure what else might be.
And what about P-cores vs E-cores on Intel cpus (actually, are those even handled now)?
On my 24-core machine the time to run dotnet/performance benchmarks dropped from 9h 49m to half an hour ;)
Wow! What about perf tests that need more than one core, though? Eg Parallel.For, etc
I think that could be solved by an int MaxThreads property on the Benchmark attribute that indicates how many threads the benchmark needs. It would default to 1, and 0 would mean it should not be run in parallel with other benchmarks (perhaps because the parallelism is dynamic based on parameters).
What about perf tests that need more than one core, though? Eg Parallel.For, etc
We could extend the [BenchmarkAttribute] with a property that disables parallelization for certain benchmarks (they would be executed sequentially at the end of benchmark run).
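As a rough illustration of how a runner might act on such a setting, the sketch below groups benchmarks into batches whose declared thread counts fit the available cores, and sets aside the ones that must run alone so they can be executed sequentially at the end. Both plan_batches and the max_threads value are hypothetical; no such property exists on the attribute today, and the simple first-fit batching here is only one possible policy.

```python
def plan_batches(benchmarks, core_count):
    """Plan a run from (name, max_threads) pairs.

    Returns (parallel_batches, sequential), where every batch needs at most
    core_count threads in total. max_threads == 0 (or more threads than there
    are cores) means "run alone, after the parallel batches".
    """
    batches, sequential = [], []
    current, used = [], 0
    for name, threads in benchmarks:
        if threads < 0:
            raise ValueError(f"{name}: max_threads must be >= 0")
        if threads == 0 or threads > core_count:
            sequential.append(name)
            continue
        if used + threads > core_count:
            # Current batch is full; start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += threads
    if current:
        batches.append(current)
    return batches, sequential


if __name__ == "__main__":
    demo = [("A", 1), ("B", 2), ("C", 0), ("D", 1), ("E", 4)]
    print(plan_batches(demo, core_count=4))
```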
Also what about in-process benchmarks?
That would not be supported at the beginning, but could be implemented by some contributors by:
- switching from GC.GetTotalAllocatedBytes to GC.GetAllocatedBytesForCurrentThread for MemoryDiagnoser (it supports both)
- using ProcessThread.ProcessorAffinity on Windows to affinitize the threads and the pthread_setaffinity_np sys-call on Linux
@adamsitnik The prototype seems very promising! I'm hoping this feature makes it in at some point.