Unable to read some hardware counters on ZEN2 CPU

Open alexcovington opened this issue 4 years ago • 18 comments

Having trouble reading 'CacheMisses' and 'InstructionsRetired' whenever I run the dotnet/performance benchmark suite. If I run the command from within the src\benchmarks\micro directory:

dotnet run -c Release -f netcoreapp5.0 --filter '*Bilinear*' --counters CacheMisses InstructionsRetired

I get the following error message:

// Validating benchmarks:
The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
The counter InstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V

I am able to get some counters to read by enabling IBS in my BIOS (technically the setting wasn't called IBS; I had to disable SVM to get it to work). So the following command does work for me:

dotnet run -c Release -f netcoreapp5.0 --filter '*Bilinear*' --counters BranchMispredictions

If I profile a C# application outside of BenchmarkDotNet using AMD uProf, I can get cache miss and instructions retired statistics. I am also able to read all of the counters on my Skylake machine using the BenchmarkDotNet CLI without any issues.

Would appreciate any help that can be provided!

alexcovington • Aug 21 '20 17:08

Hi @alexcovington

BenchmarkDotNet uses TraceEvent, which internally uses ETW to get hardware counter information. I am afraid that there is an AMD-specific bug somewhere.

Could you please run the following command and share your output here?

tracelog.exe -profilesources Help

tracelog.exe might not be present in your PATH, so you can use the Visual Studio Developer Command Prompt to run this command.

adamsitnik • Oct 21 '20 13:10

Hi @adamsitnik, sorry for the wait on this. With COVID, I don't have physical access to this machine all the time, so I wasn't able to work on this till now.

Here's the output from tracelog running in admin Developer Command Prompt for VS 2019:

C:\Windows\System32>tracelog -profilesources Help
Id  Name                        Interval  Min      Max
--------------------------------------------------------------
  0 Timer                          10000  1221    1000000
  2 TotalIssues                    65536  4096 2147483647
  6 BranchInstructions             65536  4096 2147483647
  8 DcacheMisses                   65536  4096 2147483647
  9 IcacheMisses                   65536  4096 2147483647
 11 BranchMispredictions           65536  4096 2147483647
 13 FpInstructions                 65536  4096 2147483647
 20 IcacheIssues                   65536  4096 2147483647
 21 DcacheAccesses                 65536  4096 2147483647
 25 FPDispatchedFPUOps             65536  4096 2147483647
 26 FPDispatchedFPUOpsAddExcludeJunk      65536  4096 2147483647
 27 FPDispatchedFPUOpsMulExcludeJunk      65536  4096 2147483647
 28 FPDispatchedFPUOpsStoreExcludeJunk      65536  4096 2147483647
 29 FPDispatchedFPUOpsAddJunk      65536  4096 2147483647
 30 FPDispatchedFPUOpsMulJunk      65536  4096 2147483647
 31 FPDispatchedFPUOpsStoreJunk      65536  4096 2147483647
 32 FPCyclesNoFPUOpsRetired        65536  4096 2147483647
 33 FPDispathedFPUOpsWithFastFlag      65536  4096 2147483647
 34 LSSegmentRegisterLoad          65536  4096 2147483647
 35 LSSegmentRegisterLoadES        65536  4096 2147483647
 36 LSSegmentRegisterLoadCS        65536  4096 2147483647
 37 LSSegmentRegisterLoadSS        65536  4096 2147483647
 38 LSSegmentRegisterLoadDS        65536  4096 2147483647
 39 LSSegmentRegisterLoadFS        65536  4096 2147483647
 40 LSSegmentRegisterLoadGS        65536  4096 2147483647
 41 LSSegmentRegisterLoadHS        65536  4096 2147483647
 42 LSResyncBySelfModifyingCode      65536  4096 2147483647
 43 LSResyncBySnoop                65536  4096 2147483647
 44 LSBuffer2Full                  65536  4096 2147483647
 45 LSLockedOperation              65536  4096 2147483647
 46 LSLateCancelOperation          65536  4096 2147483647
 47 LSRetiredCFLUSH                65536  4096 2147483647
 48 LSRetiredCPUID                 65536  4096 2147483647
 49 DCAccess                       65536  4096 2147483647
 50 DCMiss                         65536  4096 2147483647
 51 DCRefillFromL2                 65536  4096 2147483647
 52 DCRefillFromL2Invalid          65536  4096 2147483647
 53 DCRefillFromL2Shared           65536  4096 2147483647
 54 DCRefillFromL2Exclusive        65536  4096 2147483647
 55 DCRefillFromL2Owner            65536  4096 2147483647
 56 DCRefillFromL2Modified         65536  4096 2147483647
 57 DCRefillFromSystem             65536  4096 2147483647
 58 DCRefillFromSystemInvalid      65536  4096 2147483647
 59 DCRefillFromSystemShared       65536  4096 2147483647
 60 DCRefillFromSystemExclusive      65536  4096 2147483647
 61 DCRefillFromSystemOwner        65536  4096 2147483647
 62 DCRefillFromSystemModified      65536  4096 2147483647
 63 DCRefillCopyBack               65536  4096 2147483647
 64 DCRefillCopyBackInvalid        65536  4096 2147483647
 65 DCRefillCopyBackShared         65536  4096 2147483647
 66 DCRefillCopyBackExclusive      65536  4096 2147483647
 67 DCRefillCopyBackOwner          65536  4096 2147483647
 68 DCRefillCopyBackModified       65536  4096 2147483647
 69 DCL1DTLBMissL2DTLBHit          65536  4096 2147483647
 70 DCL1DTLBMissL2DTLBMiss         65536  4096 2147483647
 71 DCMisalignedDataReference      65536  4096 2147483647
 72 DCLateCancelOfAnAccess         65536  4096 2147483647
 73 DCEarlyCancelOfAnAccess        65536  4096 2147483647
 74 DCOneBitECCError               65536  4096 2147483647
 75 DCOneBitECCErrorScrubberError      65536  4096 2147483647
 76 DCOneBitECCErrorPiggybackScrubberError      65536  4096 2147483647
 77 DCDispatchedPrefetchInstructions      65536  4096 2147483647
 78 DCDispatchedPrefetchInstructionsLoad      65536  4096 2147483647
 79 DCDispatchedPrefetchInstructionsStore      65536  4096 2147483647
 80 DCDispatchedPrefetchInstructionsNTA      65536  4096 2147483647
190 BUCleanToDirty                 65536  4096 2147483647
191 BUSharedToDirty                65536  4096 2147483647
 81 BUInternalL2Request            65536  4096 2147483647
 82 BUInternalL2RequestICFill      65536  4096 2147483647
 83 BUInternalL2RequestDCFill      65536  4096 2147483647
 84 BUInternalL2RequestTLBReload      65536  4096 2147483647
 85 BUInternalL2RequestTagSnoopRequest      65536  4096 2147483647
 86 BUInternalL2RequestCancelledRequest      65536  4096 2147483647
 87 BUFillRequestMissedInL2        65536  4096 2147483647
 88 BUFillRequestMissedInL2ICFill      65536  4096 2147483647
 89 BUFillRequestMissedInL2DCFill      65536  4096 2147483647
 90 BUFillRequestMissedInL2TLBLoad      65536  4096 2147483647
 91 BUFillIntoL2                   65536  4096 2147483647
 92 BUFillIntoL2DirtyL2Victim      65536  4096 2147483647
 93 BUFillIntoL2VictimFromL1       65536  4096 2147483647
 94 ICFetch                        65536  4096 2147483647
 95 ICMiss                         65536  4096 2147483647
 96 ICRefillFromL2                 65536  4096 2147483647
 97 ICRefillFromSystem             65536  4096 2147483647
 98 ICL1TLBMissL2TLBHit            65536  4096 2147483647
 99 ICL1TLBMissL2TLBMiss           65536  4096 2147483647
100 ICResyncBySnoop                65536  4096 2147483647
101 ICInstructionFetchStall        65536  4096 2147483647
102 ICReturnStackHit               65536  4096 2147483647
103 ICReturnStackOverflow          65536  4096 2147483647
104 FRRetiredx86Instructions       65536  4096 2147483647
105 FRRetireduops                  65536  4096 2147483647
106 FRRetiredBranches              65536  4096 2147483647
107 FRRetiredBranchesMispredicted      65536  4096 2147483647
108 FRRetiredTakenBranches         65536  4096 2147483647
109 FRRetiredTakenBranchesMispredicted      65536  4096 2147483647
110 FRRetiredFarControlTransfers      65536  4096 2147483647
111 FRRetiredResyncsNonControlTransferBranches      65536  4096 2147483647
112 FRRetiredNearReturns           65536  4096 2147483647
113 FRRetiredNearReturnsMispredicted      65536  4096 2147483647
114 FRRetiredTakenBranchesMispredictedByAddressMiscompare      65536  4096 2147483647
115 FRRetiredFPUInstructions       65536  4096 2147483647
116 FRRetiredFPUInstructionsx87      65536  4096 2147483647
117 FRRetiredFPUInstructionsMMXAnd3DNow      65536  4096 2147483647
118 FRRetiredFPUInstructionsPackedSSEAndSSE2      65536  4096 2147483647
119 FRRetiredFPUInstructionsScalarSSEAndSSE2      65536  4096 2147483647
120 FRRetiredFastpathDoubleOpInstructions      65536  4096 2147483647
121 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition0      65536  4096 2147483647
122 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition1      65536  4096 2147483647
123 FRRetiredFastpathDoubleOpInstructionsLowOpInPosition2      65536  4096 2147483647
124 FRInterruptsMaskedCycles       65536  4096 2147483647
125 FRInterruptsMaskedWhilePendingCycles      65536  4096 2147483647
126 FRTakenHardwareInterrupts      65536  4096 2147483647
127 FRNothingToDispatch            65536  4096 2147483647
128 FRDispatchStalls               65536  4096 2147483647
129 FRDispatchStallsFromBranchAbortToRetire      65536  4096 2147483647
130 FRDispatchStallsForSerialization      65536  4096 2147483647
131 FRDispatchStallsForSegmentLoad      65536  4096 2147483647
132 FRDispatchStallsWhenReorderBufferFull      65536  4096 2147483647
133 FRDispatchStallsWhenReservationStationsFull      65536  4096 2147483647
134 FRDispatchStallsWhenFPUFull      65536  4096 2147483647
135 FRDispatchStallsWhenLSFull      65536  4096 2147483647
136 FRDispatchStallsWhenWaitingForAllQuiet      65536  4096 2147483647
137 FRDispatchStallsWhenFarControlOrResyncBranchPending      65536  4096 2147483647
138 FRFPUExceptions                65536  4096 2147483647
139 FRFPUExceptionsx87ReclassMicroFaults      65536  4096 2147483647
140 FRFPUExceptionsSSERetypeMicroFaults      65536  4096 2147483647
141 FRFPUExceptionsSSEReclassMicroFaults      65536  4096 2147483647
142 FRFPUExceptionsSSEAndx87MicroTraps      65536  4096 2147483647
143 FRNumberOfBreakPointsForDR0      65536  4096 2147483647
144 FRNumberOfBreakPointsForDR1      65536  4096 2147483647
145 FRNumberOfBreakPointsForDR2      65536  4096 2147483647
146 FRNumberOfBreakPointsForDR3      65536  4096 2147483647
147 NBMemoryControllerPageAccessEvent      65536  4096 2147483647
148 NBMemoryControllerPageAccessEventPageHit      65536  4096 2147483647
149 NBMemoryControllerPageAccessEventPageMiss      65536  4096 2147483647
150 NBMemoryControllerPageAccessEventPageConflict      65536  4096 2147483647
151 NBMemoryControllerPageTableOverflow      65536  4096 2147483647
152 NBMemoryControllerDRAMCommandSlotsMissed      65536  4096 2147483647
153 NBMemoryControllerTurnAround      65536  4096 2147483647
154 NBMemoryControllerTurnAroundDIMM      65536  4096 2147483647
155 NBMemoryControllerTurnAroundReadToWrite      65536  4096 2147483647
156 NBMemoryControllerTurnAroundWriteToRead      65536  4096 2147483647
157 NBMemoryControllerBypassCounter      65536  4096 2147483647
158 NBMemoryControllerBypassCounterHighPriority      65536  4096 2147483647
159 NBMemoryControllerBypassCounterLowPriority      65536  4096 2147483647
160 NBMemoryControllerBypassCounterDRAMControllerInterface      65536  4096 2147483647
161 NBMemoryControllerBypassCounterDRAMControllerQueue      65536  4096 2147483647
162 NBSizedCommands                65536  4096 2147483647
163 NBSizedCommandsNonPostWrSzByte      65536  4096 2147483647
164 NBSizedCommandsNonPostWrSzDword      65536  4096 2147483647
165 NBSizedCommandsWrSzByte        65536  4096 2147483647
166 NBSizedCommandsWrSzDword       65536  4096 2147483647
167 NBSizedCommandsRdSzByte        65536  4096 2147483647
168 NBSizedCommandsRdSzDword       65536  4096 2147483647
169 NBSizedCommandsRdModWr         65536  4096 2147483647
170 NBProbeResult                  65536  4096 2147483647
171 NBProbeResultMiss              65536  4096 2147483647
172 NBProbeResultHit               65536  4096 2147483647
173 NBProbeResultHitDirtyWithoutMemoryCancel      65536  4096 2147483647
174 NBProbeResultHitDirtyWithMemoryCancel      65536  4096 2147483647
175 NBHyperTransportBus0Bandwidth      65536  4096 2147483647
176 NBHyperTransportBus0BandwidthCommandSent      65536  4096 2147483647
177 NBHyperTransportBus0BandwidthDataSent      65536  4096 2147483647
178 NBHyperTransportBus0BandwidthBufferReleaseSent      65536  4096 2147483647
179 NBHyperTransportBus0BandwidthNopSent      65536  4096 2147483647
180 NBHyperTransportBus1Bandwidth      65536  4096 2147483647
181 NBHyperTransportBus1BandwidthCommandSent      65536  4096 2147483647
182 NBHyperTransportBus1BandwidthDataSent      65536  4096 2147483647
183 NBHyperTransportBus1BandwidthBufferReleaseSent      65536  4096 2147483647
184 NBHyperTransportBus1BandwidthNopSent      65536  4096 2147483647
185 NBHyperTransportBus2Bandwidth      65536  4096 2147483647
186 NBHyperTransportBus2BandwidthCommandSent      65536  4096 2147483647
187 NBHyperTransportBus2BandwidthDataSent      65536  4096 2147483647
188 NBHyperTransportBus2BandwidthBufferReleaseSent      65536  4096 2147483647
189 NBHyperTransportBus2BandwidthNopSent      65536  4096 2147483647

alexcovington • Oct 26 '20 16:10

With a Ryzen 7 1800X, I can report the same issue. It wouldn't recognize the CacheMisses perf counter because it's simply not listed as such; it's separated into DcacheMisses and IcacheMisses, and apparently that's exclusive to some AMD processors.

Having looked at the code itself, the error message does not consider the possibility that the CPU itself does not support the perf counter.
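To make the mismatch concrete, here is a minimal Python sketch that checks requested counter names against a saved `tracelog -profilesources Help` listing. It assumes the counter is looked up by its exact name, which is my reading of the behaviour described above rather than something confirmed from the BenchmarkDotNet source. The helper names `parse_profile_sources` and `split_by_availability` are hypothetical, and the sample rows are copied from the output posted earlier in this thread.

```python
import re

# Matches one data row of `tracelog -profilesources Help`: Id, Name, Interval, Min, Max.
# The header and the dashed separator do not start with a number, so they are skipped.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_profile_sources(text):
    """Return {name: id} for every profile source row found in the tracelog output."""
    sources = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            sources[match.group(2)] = int(match.group(1))
    return sources


def split_by_availability(requested, sources):
    """Split requested names into (listed, not listed).

    Assumption: names are compared exactly, with no aliasing
    (for example, CacheMisses is not mapped to DcacheMisses).
    """
    listed = [name for name in requested if name in sources]
    missing = [name for name in requested if name not in sources]
    return listed, missing


# A few rows copied from the tracelog output posted earlier in this thread.
SAMPLE = """\
Id  Name                        Interval  Min      Max
--------------------------------------------------------------
  0 Timer                          10000  1221    1000000
  8 DcacheMisses                   65536  4096 2147483647
  9 IcacheMisses                   65536  4096 2147483647
 11 BranchMispredictions           65536  4096 2147483647
104 FRRetiredx86Instructions       65536  4096 2147483647
"""

sources = parse_profile_sources(SAMPLE)
listed, missing = split_by_availability(
    ["CacheMisses", "InstructionRetired", "BranchMispredictions"], sources
)
print("listed:", listed)
print("not listed:", missing)
```

On this sample only BranchMispredictions is listed under the requested name, which matches what was reported in the original post. To check a full listing, redirect the tracelog output to a file and pass its contents to `parse_profile_sources`.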

Rekkonnect • Feb 12 '21 11:02

I've recently ordered a PC with an AMD CPU (ThreadRipper :D) and I am supposed to get it before the 31st of March. When I do, I am going to make sure that hardware counters work as expected on AMD.

adamsitnik • Feb 12 '21 11:02

A bit related to this PR/issue https://github.com/dotnet/BenchmarkDotNet/pull/1438#issuecomment-620573164

The problem seems to be in Windows: ETW is not correctly reporting events for AMD CPUs (reported here and to the Feedback Hub here).

xoofx • Apr 18 '21 19:04

After some experimenting with ETW directly, I can see that DCRefillFromL2 is generating some numbers, so it could be used for L1 cache misses, while DCRefillFromSystem might correspond to L2 cache misses. I'm not 100% sure of that; I'm still trying to figure out how to stabilize the numbers (they consistently fluctuate quite a bit).

xoofx • Apr 19 '21 10:04

Has there been any progress on this issue? On my Ryzen 5 5600X machine I got the same error:

The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
The counter BranchMispredictions is not available. Please make sure you are Windows 8+ without Hyper-V

dn9090 • Feb 05 '22 10:02

@alexcovington @Rekkonnect @xoofx @dn9090 do you still experience this problem with the latest version of BenchmarkDotNet?

I believe that this issue should be resolved since BenchmarkDotNet v0.13.2 thanks to #2030. I just checked AMD Ryzen 9 7950X, everything works fine: the hardware counters are properly reported.

AndreyAkinshin • Jul 04 '23 17:07

@AndreyAkinshin Thanks for checking in.

I don't have access to the system I was using when I originally reported the issue, but I was able to successfully read CacheMisses and InstructionRetired on Zen 3 and Zen 4 systems using BDN v0.13.5.

This seems to be resolved now, so I'll go ahead and close the issue. Thanks everyone for the fix :).

alexcovington • Jul 05 '23 18:07

I haven't checked since then, but if it has been fixed in Windows, it should be fixed for BDN. Thanks!

xoofx • Jul 08 '23 13:07

I am having trouble getting hardware counters fully working on an AMD Zen 3 5950X on Windows 10 using the latest BDN, 0.13.12. SVM is disabled in the BIOS and Hyper-V is disabled. PerfView lists the counters, and the tracelog output is attached.

Only BranchMispredictions appears to be working. Others report as not available. Is that just how it is or is there something I can do to get these?

//    * The counter CacheMisses is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter TotalCycles is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter InstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter LLCMisses is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter BranchInstructionRetired is not available. Please make sure you are Windows 8+ without Hyper-V
//    * The counter BranchMispredictsRetired is not available. Please make sure you are Windows 8+ without Hyper-V

tracelog.txt

nietras • Jan 13 '24 09:01