BenchmarkDotNet
BenchmarkDotNet copied to clipboard
Add support for using non standard HW counter
Hello BenchmarkDotNet folks!
This PR is for discussion. I should have opened an issue maybe, but as I'm using this fix to actually see the counters in my project, better just open this PR directly, feel free to close it, kill it if you don't like it! 😅
So my issue is that perf counters are very frustrating to work with (nothing to blame for BenchmarkdDotNet, you are not responsible) because depending on your processors, you might not be able to have any relevant in BenchmarkDotNet because the counters exposed are limited and are not matching all the counters that can be exposed by Windows.
For example, on my machine (Windows 10) and CPU (AMD Ryzen 3900X), the HardwareCounter.CacheMisses
is not available. I might be able to use instead DCL1DTLBMissL2DTLBHit
and DCL1DTLBMissL2DTLBMiss
(although I'm not entirely sure myself it is relatively equivalent, anyway) but the API HardwareCounter is based on an enum instead of allowing me to pass a string.
So this PR is changing slightly the API to allow to pass custom HW counter names like this:
[HardwareCounters("TotalIssues/Insts", "DCL1DTLBMissL2DTLBHit/L1Miss-L2Hit", "DCL1DTLBMissL2DTLBMiss/L1Miss-L2Miss")]
public void Managed() {
// ... code to benchmark
}
The /
is allowing to have a display name.
It will display the results like this for the config above:
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated | L1Miss-L2Hit/Op | Insts/Op | L1Miss-L2Miss/Op |
|-------- |---------:|----------:|----------:|------:|------:|------:|----------:|----------------:|----------:|-----------------:|
| Managed | 3.159 ms | 0.2923 ms | 0.0452 ms | - | - | - | - | 1,237 | 3,758,266 | 8,190 |
// * Hints *
Outliers
BenchRunWorldProcessor.Managed: IterationCount=5, LaunchCount=1, WarmupCount=3 -> 1 outlier was removed (3.47 ms)
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Gen 0 : GC Generation 0 collects per 1000 operations
Gen 1 : GC Generation 1 collects per 1000 operations
Gen 2 : GC Generation 2 collects per 1000 operations
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
L1Miss-L2Hit/Op : Hardware counter 'DCL1DTLBMissL2DTLBHit' per single operation
Insts/Op : Hardware counter 'TotalIssues' per single operation
L1Miss-L2Miss/Op : Hardware counter 'DCL1DTLBMissL2DTLBMiss' per single operation
1 ms : 1 Millisecond (0.001 sec)
I tried to keep the API similar so you can still use HardwareCounter.TotalIssues
and others, but as per attribute constraints, we have to use strings instead of structs (previously an enum)
Side note, I have been running my own small benchmark with HW counters and I'm seeing very different result in the counters printed by BenchmarkDotNet, and also from the one running with AMDuprof.
Sorry, I don't know how but I killed your comment @filipnavara . so responding here:
Yeah, we could have a HardwareCounterInfo
on the side (that would be the HardwareCounter
of this PR) and keep the old HardwareCounter
enum for the attributes.
Sorry, I don't know how but I killed your comment @filipnavara . so responding here:
Yeah, we could have a
HardwareCounterInfo
on the side (that would be theHardwareCounter
of this PR) and keep the oldHardwareCounter
enum for the attributes.
Sorry, I killed the comment myself. I meant to press "Start Review" instead of "Add Comment" and by the time I reached the changed definition of HardwareCounter
I realized why you did it. It may still make sense to preserve backwards compatibility but I am not sure how much this attribute was used in practice.
So I have slightly modified the PR to keep the previous enum HardwareCounter
and to introduce HardwareCounterInfo
. HardwareCounterInfo
is used in all the places where HardwareCounter
was previously used, except that you can still use the enum HardwareCounter directly with HardwareCounterAttribute
But unfortunately, it seems that I can't get cache misses perf counters for AMD Ryzen with Windows perf counters... 😞
DcacheAccesses
is working, but not DcacheMisses
or DCMiss
(they don't log anything on my machine) while I can see them with AMDuprof
Side note, I have been running my own small benchmark with HW counters and I'm seeing very different result in the counters printed by BenchmarkDotNet, and also from the one running with AMDuprof.
Note that the counters in AMDuprof seem to be divided by a constant 4 compare to the one I'm getting with Windows perf counters. In BenchmarkDotNet, I haven't checked why they are off and by how much factor.
Copying what I wrote on Twitter ...
I confirmed with the Windows team there is a bug here somewhere. They're asking if you can use "Feedback Hub" to file this bug. AMD support exists but some of the event selects are outdated or incorrect, and your report will help prioritize which ones to get to fixed first.
Closing as the problem is in Windows (e.g for the cache misses)
Reopening as we might still require to be able to collect specific counters. The predefined ones are not able to collect special counters that could be relevant for a specific CPU/machine.
Looks mostly good to me. Just a couple comments, and needs merge conflicts resolved.
Oh, it's quite a old PR, will have to dig into it because I completely forgot the details. 😅 I'll try to have a look next week.