BenchmarkDotNet
Wishlist of features I'd find useful
- [x] Filter on benchmark parameters (say I only want to run `System.Text.Json.Tests.Perf_Basic.WriteBasicUtf16(Formatted: False, SkipValidation: False, DataSize: 100000)`, not the other 5 flavors)
- [x] Ability to specify sets of benchmarks in filters (say run both `Bench1.A` and `Perf2.B`)
- [ ] Friendly names for `--corerun`s in reports
- [ ] Multiple groups of `--envVar` that are treated as different run configs
- [ ] Friendly names for the `--envVar` groups in reports
- [x] Mix `--corerun` and `--netX.Y` on one command line (#2002)
- [ ] VTune diagnoser or similar that uses the VTune API to mark actual measurement intervals
- [ ] Better integration with Linux `perf` (see notes below)
- [ ] When comparing two runtimes/coreruns/etc., use the same invocation count/iteration count for both runs so the same amount of work is being measured (and maybe the same warmup/overhead/etc. so everything is more or less equivalent)
  - In `--corerun` mode with two coreruns, use the exact same execution strategy for each.
  - When passing multiple coreruns, run them and list them in the results table in the order specified on the command line. Right now they seem to be listed in the table in alphabetical order.
e.g.

```
dotnet run -c Release -f net6.0 -- --filter System.Numerics.Tests.Perf_BitOperations.PopCount_ulong --corerun D:\bugs\osr-perf\main-rel\corerun.exe D:\bugs\osr-perf\osr-rel\corerun.exe D:\bugs\osr-perf\hack-rel\corerun.exe
```
gives the following table
| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PopCount_ulong | Job-MTRLJC | \hack-rel\corerun.exe | 464.7 ns | 6.25 ns | 5.84 ns | 465.7 ns | 450.6 ns | 471.7 ns | 1.39 | 0.02 | - | NA |
| PopCount_ulong | Job-WAUWEH | \main-rel\corerun.exe | 333.5 ns | 4.70 ns | 4.16 ns | 334.8 ns | 324.5 ns | 339.7 ns | 1.00 | 0.00 | - | NA |
| PopCount_ulong | Job-LMIYYB | \osr-rel\corerun.exe | 345.2 ns | 8.60 ns | 9.21 ns | 347.0 ns | 324.6 ns | 360.4 ns | 1.04 | 0.03 | - | NA |
Merged a PR addressing the first point:
https://github.com/dotnet/performance/pull/2314
To use the filter, use the following format when doing a command-line run from the usual directory:

```
dotnet run -c Release -f net7.0 --filter *Perf_Basic* --parameter-filter SkipValidation:True DataSize:10
```
Nice! Looking forward to using it!
> Ability to specify sets of benchmarks in filters
`--filter` accepts multiple strings. Example:

```
--filter Bench1.A Perf2.B
```
> Friendly names for `--corerun`s in reports
This should be easy to implement once we have an idea of how to expose it via command-line args.
My current idea is:

```
--corerun path1 path2 --corerun-names name1 name2
```

but it's far from ideal.
> Multiple groups of `--envVar` that are treated as different run configs
We could achieve that by introducing some new "separators" to `--envVars`.
Currently we have:

```
--envVars ENV_VAR_KEY_1:value_1 ENV_VAR_KEY_2:value_2
```

We could do something like:

```
--envVars ENV_VAR_KEY_1:value_1 $magicSeparator ENV_VAR_KEY_2:value_2
```
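To illustrate the separator idea, here is a minimal Python sketch (purely hypothetical — BDN parses `--envVars` in C#, and the `--` separator token here is just a placeholder for `$magicSeparator`) of splitting one argument list into independent env-var groups, each of which would become its own run config:

```python
# Hypothetical sketch: split an --envVars argument list into groups using a
# separator token. Each group would become one run config in the comparison.
def split_env_var_groups(args, separator="--"):
    groups, current = [], []
    for arg in args:
        if arg == separator:
            if current:
                groups.append(current)
            current = []
        else:
            # BDN's --envVars entries use the KEY:value format
            key, _, value = arg.partition(":")
            current.append((key, value))
    if current:
        groups.append(current)
    return groups

# Two groups -> two run configs compared in one results table
args = ["TieredCompilation:0", "--", "TieredCompilation:1", "TieredPGO:1"]
print(split_env_var_groups(args))
```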
> Friendly names for the `--envVar` groups in reports
It's the same as with the `--corerun` friendly names: how should this be exposed at the command-line argument level?
> Mix `--corerun` and `--netX.Y` on one command line
Currently `--runtime x` combined with `--corerun y z` means "build as x and run using y and z". I remember that we used it a while ago as a workaround for a dotnet/runtime limitation (IIRC dotnet/runtime had an old SDK and could not build new benchmarks from dotnet/performance that used new APIs). We could change the meaning to: build as the current (`-f`) moniker, run as x, y, and z. IIRC @stephentoub asked me for that in the past.
> VTune diagnoser or similar that uses VTune API to mark actual measurement intervals

This sounds very interesting. Do you have any links to the VTune API docs?
I am going to transfer this issue to BDN repo as all the feature requests are BDN feature requests.
> IIRC @stephentoub asked me for that in the past.
Yup: https://github.com/dotnet/BenchmarkDotNet/issues/1774
I'd like to see this get done: https://github.com/dotnet/BenchmarkDotNet/issues/1634 :( It would be so useful to have custom names for parameters of complex types.
I also find myself wishing there was simpler/smoother integration with Linux `perf`. Exporting to PerfView is OK for CPU samples, but for HW counters it's not really viable.
Something along these lines:
- create a `perf` diagnoser that runs the benchmark subprocess under `perf record` or `perf stat`, allowing me to specify the events of interest (often PMU events)
- enables perf map under the covers
- post-processing of the data to inject markers/events for the actual intervals so that later on we can filter the reporting to just those stretches of time (perhaps using the switch on/off capabilities)
- post-processing using `perf inject -j` to add in the mappings for jitted code ranges
- continued effort to both describe all the runtime stubs in the perf mappings and (where possible) make unwinding work through all of the stubs somehow, so the call-stack modes aren't badly broken (would also help PerfView)
- similar injection of runtime events, e.g. a mashup of `-p EP` with the above
Right now I am running `perf record` over the entire BDN invocation and either boosting the iteration/invocation counts so that the actual intervals clearly dominate everything else, or slicing and looking at only the last 10% (say) of the recorded data.
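The "last 10%" slicing can be approximated with a small post-processing pass over `perf script` output. A Python sketch, assuming each sample header line carries a `<seconds>:` timestamp field (the exact line format varies with `perf script` options and kernel version):

```python
# Sketch: keep only samples from the final `frac` of a perf recording.
# Assumes each sample header line contains a "<seconds>:" timestamp field,
# as in typical `perf script` output (format varies with options/kernel).
import re

TS = re.compile(r"\s(\d+\.\d+):")

def tail_samples(lines, frac=0.10):
    stamped = [(float(m.group(1)), ln) for ln in lines if (m := TS.search(ln))]
    if not stamped:
        return []
    t0, t1 = stamped[0][0], stamped[-1][0]
    cutoff = t1 - (t1 - t0) * frac  # start of the final `frac` of the run
    return [ln for t, ln in stamped if t >= cutoff]

lines = [
    "corerun  9716 [003]  100.000000: cycles: warmup",
    "corerun  9716 [003]  150.000000: cycles: warmup",
    "corerun  9716 [003]  195.000000: cycles: measured",
    "corerun  9716 [003]  199.000000: cycles: measured",
]
print(tail_samples(lines, frac=0.10))
```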
> VTune diagnoser or similar that uses VTune API to mark actual measurement intervals
>
> this sounds very interesting. Do you have any links to VTune API docs?
https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis.html
Related: I'd like to see --join fixed so multiple filter expression results can all show up in a single results table: https://github.com/dotnet/performance/issues/1855
It would also be nice to have an integrated diagnoser for ETW that is benchmark interval aware. I have a crude start at this in https://github.com/AndyAyersMS/instructionsretiredexplorer; it can post-process the ETW (actual interval aware) and project onto managed method names & tiering variants, e.g.:
```
Mining ETL from D:\bugs\r72730\BenchmarkDotNet.Artifacts\LargeRegexTest.Generated-20220725-132937.etl for process corerun
PMC interval now 10000
Found process [9716] corerun: "D:\bugs\r72730\48b85438-13c4-4c73-94b5-b109ce10b9d2\corerun.exe" 150360d1-1148-4e51-8848-28e3c3c32196.dll --benchmarkName LargeRegexTest.Generated --job Toolchain=CoreRun --benchmarkId 0
==> benchmark process is [9716]
Samples for corerun: 16277 events for Benchmark Intervals
Jitting           : 01.44% 1.25E+06 samples 1554 methods
JitInterface      : 00.18% 1.6E+05 samples
Jit-generated code: 96.84% 8.4E+07 samples
Jitted code       : 96.84% 8.4E+07 samples
MinOpts code      : 00.00% 0 samples
FullOpts code     : 00.00% 0 samples
Tier-0 code       : 87.98% 7.63E+07 samples
Tier-1 code       : 08.86% 7.69E+06 samples
R2R code          : 00.00% 0 samples
00.47% 4.1E+05    ? Unknown
87.98% 7.632E+07  Tier-0 [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.TryMatchAtCurrentPosition(value class System.ReadOnlySpan`1<wchar>)
01.38% 1.2E+06    Tier-1 [r72730]LargeRegexTest.Generated()
01.26% 1.09E+06   native clrjit.dll
01.23% 1.07E+06   native coreclr.dll
00.91% 7.9E+05    Tier-1 [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].get_Item(int32)
00.89% 7.7E+05    Tier-1 [System.Text.RegularExpressions]Match.AddMatch(int32,int32,int32)
00.88% 7.6E+05    Tier-1 [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].Slice(int32)
00.85% 7.4E+05    Tier-1 [System.Private.CoreLib]System.ReadOnlySpan`1[System.Char].get_Length()
00.76% 6.6E+05    Tier-1 [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.Scan(value class System.ReadOnlySpan`1<wchar>)
00.69% 6E+05      Tier-1 [System.Text.RegularExpressions]Regex.RunSingleMatch(value class System.Text.RegularExpressions.RegexRunnerMode,int32,class System.String,int32,int32,int32)
00.44% 3.8E+05    Tier-1 [System.Text.RegularExpressions]RegexRunner.Capture(int32,int32,int32)
00.38% 3.3E+05    Tier-1 [System.Text.RegularExpressions]RegexRunner.InitializeForScan(class System.Text.RegularExpressions.Regex,value class System.ReadOnlySpan`1<wchar>,int32,value class System.Text.RegularExpressions.RegexRunnerMode)
00.31% 2.7E+05    Tier-1 [r72730]<RegexGenerator_g>F7__GetAsmInstructionsRegex_0+RunnerFactory+Runner.TryFindNextPossibleStartingPosition(value class System.ReadOnlySpan`1<wchar>)
00.30% 2.6E+05    Tier-1 [System.Text.RegularExpressions]Regex.IsMatch(class System.String)
00.29% 2.5E+05    Tier-1 [System.Private.CoreLib]String.op_Implicit(class System.String)
00.25% 2.2E+05    Tier-1 [System.Text.RegularExpressions]Match.Reset(class System.Text.RegularExpressions.Regex,class System.String,int32,int32,int32)
00.20% 1.7E+05    Tier-1 [System.Private.CoreLib]SpanHelpers.SequenceEqual(unsigned int8&,unsigned int8&,unsigned int)
00.18% 1.6E+05    Tier-1 [System.Private.CoreLib]MemoryExtensions.StartsWith(value class System.ReadOnlySpan`1<!!0>,value class System.ReadOnlySpan`1<!!0>)
00.12% 1E+05      native ntoskrnl.exe
00.10% 9E+04      Tier-1 [System.Text.RegularExpressions]RegexRunner.InitializeTimeout(value class System.TimeSpan)
00.08% 7E+04      native ntdll.dll
Benchmark: found 15 intervals; mean interval 570.348ms
```
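The aggregation behind a report like the one above is straightforward once each sample has been attributed to a (tier, method) pair and the measurement intervals are known. A toy Python sketch (not the actual instructionsretiredexplorer code; sample data and interval values are made up) that counts only in-interval samples and formats percentages in a similar style:

```python
from collections import Counter

# Toy sketch of interval-aware sample attribution: count samples per
# (tier, method) pair, but only for samples inside a measurement interval.
def summarize(samples, intervals):
    inside = lambda t: any(a <= t <= b for a, b in intervals)
    counts = Counter((tier, meth) for t, tier, meth in samples if inside(t))
    total = sum(counts.values())
    return [f"{100 * n / total:05.2f}% {tier:7} {meth}"
            for (tier, meth), n in counts.most_common()]

samples = [
    (1.0, "Tier-0", "Runner.TryMatchAtCurrentPosition"),
    (1.1, "Tier-0", "Runner.TryMatchAtCurrentPosition"),
    (1.2, "Tier-1", "Regex.IsMatch"),
    (9.0, "native", "clrjit.dll"),  # outside the intervals -> ignored
]
for row in summarize(samples, intervals=[(0.5, 2.0)]):
    print(row)
```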
> I also find myself wishing there was simpler/smoother integration with linux perf

@AndyAyersMS a few days ago I merged https://github.com/dotnet/BenchmarkDotNet/pull/2117, which adds a perf diagnoser that uses `perfcollect` internally. `perfcollect` supports collecting hardware counters:
https://github.com/dotnet/BenchmarkDotNet/blob/a78c2e6a6e3db79069fb5bbbd6da6e5cbea8c029/src/BenchmarkDotNet/Templates/perfcollect#L1891
We could take advantage of that and build something on top of it. I won't have time to do that myself in the near future, but I would be happy to chat and perhaps create an up-for-grabs issue with a very detailed description of what we need and how it could be implemented.
> It would also be nice to have an integrated diagnoser for ETW that is benchmark interval aware.
For that we could definitely extend ETWProfiler to always export such a file when hardware counters are enabled. We are already parsing the trace file:
https://github.com/dotnet/BenchmarkDotNet/blob/a78c2e6a6e3db79069fb5bbbd6da6e5cbea8c029/src/BenchmarkDotNet.Diagnostics.Windows/EtwProfiler.cs#L81
In theory, it should be a matter of implementing an exporter:
https://github.com/dotnet/BenchmarkDotNet/blob/a78c2e6a6e3db79069fb5bbbd6da6e5cbea8c029/src/BenchmarkDotNet.Diagnostics.Windows/EtwProfiler.cs#L48
> Filter on benchmark parameters
This is now built in: #2132
> @AndyAyersMS a few days ago I merged #2117 which adds a perf diagnoser that uses `perfcollect` internally
Somehow I missed seeing this -- will have to try it out soon! Thanks!
@AndyAyersMS in case you are interested in more details: https://adamsitnik.com/PerfCollectProfiler/