benchmark
[BUG] Reducing variance doc incorrectly recommends disabling hyper threading AND pinning CPU frequency scaling?
Describe the bug
The "reducing variance" doc [1] recommends disabling hyper threading AND pinning CPU frequency scaling.
However, I tried doing this 6 different ways and discovered that pinning CPU frequency scaling appears to work incorrectly if hyper threads are disabled.
During each benchmark I logged the frequency of every enabled CPU once per second. Analyzing the logs showed that only the runs with hyper threading enabled appeared to respect the pinned CPU frequency. Presumably this is some kind of kernel bug? Has anybody else come across this? And is there a workaround?
So I ran the benchmark 6 times for ~300 seconds each, slightly changing the hyper thread / CPU frequency scaling config each time. During each run I logged, once per second, the MHz of each of the 12 or 24 enabled CPUs running the benchmark.
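For anyone wanting to reproduce this kind of logging, here is a minimal sketch of how the per-CPU MHz could be sampled. It assumes the values are read from the "cpu MHz" field of /proc/cpuinfo, which is not necessarily how the logs below were produced; `parse_cpu_mhz`, `sample_once` and `SAMPLE` are hypothetical names used only for illustration.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Return {cpu_index: MHz} parsed from /proc/cpuinfo-style text."""
    mhz_by_cpu = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            mhz_by_cpu[current] = float(value)
    return mhz_by_cpu


def sample_once(path="/proc/cpuinfo"):
    """Take one sample of every online CPU (Linux only)."""
    with open(path) as f:
        return parse_cpu_mhz(f.read())


# Made-up excerpt in the /proc/cpuinfo layout, for demonstration only.
SAMPLE = """processor\t: 0
model name\t: Example CPU
cpu MHz\t\t: 2195.000

processor\t: 1
model name\t: Example CPU
cpu MHz\t\t: 3493.125
"""

print(parse_cpu_mhz(SAMPLE))
```

Calling `sample_once()` in a loop with `time.sleep(1)` and appending each result to a per-CPU list would give one sample per second for the duration of a run.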
Run 1: hyper threads off and cpufreq governor set to "powersave":
- CPU 0: 298 MHz samples: 1897=p0 1996=p25 1996=p50 1996=p75 2168=p100 594961=tmhz
- CPU 1: 298 MHz samples: 2953=p0 3493=p25 3493=p50 3493=p75 3494=p100 1034005=tmhz
- CPU 2: 298 MHz samples: 2953=p0 3493=p25 3493=p50 3493=p75 3494=p100 1034007=tmhz
- CPU 3: 298 MHz samples: 1898=p0 1996=p25 1996=p50 1996=p75 2168=p100 594931=tmhz
- CPU 4: 298 MHz samples: 1898=p0 1996=p25 1996=p50 1996=p75 2200=p100 595075=tmhz
- CPU 5: 298 MHz samples: 1897=p0 1996=p25 1996=p50 1996=p75 2169=p100 594877=tmhz
- CPU 6: 298 MHz samples: 1898=p0 1996=p25 1996=p50 1996=p75 2200=p100 595101=tmhz
- CPU 7: 298 MHz samples: 2200=p0 3491=p25 3493=p50 3493=p75 3503=p100 1032578=tmhz
- CPU 8: 298 MHz samples: 1897=p0 1996=p25 1996=p50 1996=p75 2190=p100 594934=tmhz
- CPU 9: 298 MHz samples: 2200=p0 3493=p25 3493=p50 3493=p75 3493=p100 1032769=tmhz
- CPU 10: 298 MHz samples: 2200=p0 3492=p25 3493=p50 3493=p75 3508=p100 1012810=tmhz
- CPU 11: 298 MHz samples: 1885=p0 1995=p25 1995=p50 1995=p75 2193=p100 594576=tmhz
Run 2: hyper threads off and cpufreq governor set to default "ondemand" with max frequency set to same as min
- CPU 0: 298 MHz samples: 1887=p0 1996=p25 1996=p50 1996=p75 2193=p100 596252=tmhz
- CPU 1: 298 MHz samples: 2668=p0 3436=p25 3493=p50 3493=p75 3493=p100 1020598=tmhz
- CPU 2: 298 MHz samples: 2669=p0 3436=p25 3493=p50 3493=p75 3493=p100 1020475=tmhz
- CPU 3: 298 MHz samples: 2669=p0 3438=p25 3493=p50 3493=p75 3493=p100 1020529=tmhz
- CPU 4: 298 MHz samples: 2665=p0 3435=p25 3493=p50 3493=p75 3493=p100 1020375=tmhz
- CPU 5: 298 MHz samples: 2269=p0 2748=p25 2794=p50 2794=p75 2794=p100 816871=tmhz
- CPU 6: 298 MHz samples: 2668=p0 3434=p25 3493=p50 3493=p75 3493=p100 1020284=tmhz
- CPU 7: 298 MHz samples: 1890=p0 1993=p25 1996=p50 1996=p75 2193=p100 596019=tmhz
- CPU 8: 298 MHz samples: 1890=p0 1996=p25 1996=p50 1996=p75 2193=p100 596070=tmhz
- CPU 9: 298 MHz samples: 1890=p0 1996=p25 1996=p50 1996=p75 2194=p100 596180=tmhz
- CPU 10: 298 MHz samples: 2200=p0 3380=p25 3493=p50 3493=p75 3501=p100 1003906=tmhz
- CPU 11: 298 MHz samples: 2680=p0 3441=p25 3492=p50 3493=p75 3495=p100 1020446=tmhz
Run 3: hyper threads off and cpufreq governor set to "performance"
- CPU 0: 299 MHz samples: 2399=p0 2954=p25 3270=p50 3492=p75 3493=p100 956729=tmhz
- CPU 1: 299 MHz samples: 1993=p0 2931=p25 3232=p50 3459=p75 3493=p100 948039=tmhz
- CPU 2: 299 MHz samples: 2396=p0 2958=p25 3278=p50 3491=p75 3493=p100 957128=tmhz
- CPU 3: 299 MHz samples: 2398=p0 2928=p25 3220=p50 3456=p75 3493=p100 946117=tmhz
- CPU 4: 299 MHz samples: 2396=p0 2931=p25 3224=p50 3458=p75 3493=p100 948164=tmhz
- CPU 5: 299 MHz samples: 2397=p0 2932=p25 3229=p50 3458=p75 3493=p100 948381=tmhz
- CPU 6: 299 MHz samples: 1890=p0 2921=p25 3218=p50 3448=p75 3493=p100 943951=tmhz
- CPU 7: 299 MHz samples: 1862=p0 2931=p25 3246=p50 3462=p75 3493=p100 948668=tmhz
- CPU 8: 299 MHz samples: 2400=p0 2951=p25 3278=p50 3493=p75 3493=p100 957320=tmhz
- CPU 9: 299 MHz samples: 2041=p0 2925=p25 3240=p50 3470=p75 3493=p100 947504=tmhz
- CPU 10: 299 MHz samples: 1889=p0 2344=p25 2716=p50 2911=p75 3493=p100 796872=tmhz
- CPU 11: 299 MHz samples: 1828=p0 1962=p25 1995=p50 2059=p75 2662=p100 601104=tmhz
Run 4: hyper threads on and cpufreq governor set to "powersave":
- CPU 0: 298 MHz samples: 1741=p0 2195=p25 2195=p50 2195=p75 2200=p100 650545=tmhz
- CPU 1: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2200=p100 650462=tmhz
- CPU 2: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2197=p100 650298=tmhz
- CPU 3: 298 MHz samples: 1742=p0 2195=p25 2195=p50 2195=p75 2200=p100 650281=tmhz
- CPU 4: 298 MHz samples: 1721=p0 2195=p25 2195=p50 2195=p75 2197=p100 650153=tmhz
- CPU 5: 298 MHz samples: 1740=p0 2195=p25 2195=p50 2195=p75 2196=p100 650266=tmhz
- CPU 6: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2196=p100 650285=tmhz
- CPU 7: 298 MHz samples: 1743=p0 2195=p25 2195=p50 2195=p75 2200=p100 650227=tmhz
- CPU 8: 298 MHz samples: 1743=p0 2195=p25 2195=p50 2195=p75 2200=p100 650099=tmhz
- CPU 9: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2200=p100 650111=tmhz
- CPU 10: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2200=p100 650115=tmhz
- CPU 11: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2199=p100 650025=tmhz
- CPU 12: 298 MHz samples: 1741=p0 2195=p25 2195=p50 2195=p75 2200=p100 650127=tmhz
- CPU 13: 298 MHz samples: 1747=p0 2195=p25 2195=p50 2195=p75 2200=p100 650162=tmhz
- CPU 14: 298 MHz samples: 1746=p0 2195=p25 2195=p50 2195=p75 2196=p100 650177=tmhz
- CPU 15: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2196=p100 650088=tmhz
- CPU 16: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2200=p100 650450=tmhz
- CPU 17: 298 MHz samples: 1746=p0 2195=p25 2195=p50 2195=p75 2195=p100 650452=tmhz
- CPU 18: 298 MHz samples: 1744=p0 2195=p25 2195=p50 2195=p75 2198=p100 650423=tmhz
- CPU 19: 298 MHz samples: 1745=p0 2195=p25 2195=p50 2195=p75 2195=p100 650484=tmhz
- CPU 20: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2200=p100 650393=tmhz
- CPU 21: 298 MHz samples: 1949=p0 2200=p25 2200=p50 2200=p75 2200=p100 654747=tmhz
- CPU 22: 298 MHz samples: 1745=p0 2194=p25 2194=p50 2195=p75 2196=p100 650316=tmhz
- CPU 23: 298 MHz samples: 2200=p0 2200=p25 2200=p50 2200=p75 2200=p100 655600=tmhz
Run 5: hyper threads on and cpufreq governor set to default "ondemand" with max frequency set to same as min
- CPU 0: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100 649668=tmhz
- CPU 1: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100 649542=tmhz
- CPU 2: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2196=p100 649509=tmhz
- CPU 3: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2196=p100 649479=tmhz
- CPU 4: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2200=p100 649514=tmhz
- CPU 5: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100 649467=tmhz
- CPU 6: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100 649378=tmhz
- CPU 7: 298 MHz samples: 1755=p0 2195=p25 2195=p50 2195=p75 2200=p100 649501=tmhz
- CPU 8: 298 MHz samples: 1754=p0 2195=p25 2195=p50 2195=p75 2200=p100 649393=tmhz
- CPU 9: 298 MHz samples: 1754=p0 2195=p25 2195=p50 2195=p75 2200=p100 649397=tmhz
- CPU 10: 298 MHz samples: 1759=p0 2195=p25 2195=p50 2195=p75 2195=p100 649485=tmhz
- CPU 11: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100 649488=tmhz
- CPU 12: 298 MHz samples: 1758=p0 2195=p25 2195=p50 2195=p75 2200=p100 649312=tmhz
- CPU 13: 298 MHz samples: 1756=p0 2195=p25 2195=p50 2195=p75 2200=p100 649444=tmhz
- CPU 14: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100 649390=tmhz
- CPU 15: 298 MHz samples: 1751=p0 2195=p25 2195=p50 2195=p75 2200=p100 649347=tmhz
- CPU 16: 298 MHz samples: 1757=p0 2195=p25 2195=p50 2195=p75 2200=p100 649640=tmhz
- CPU 17: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2200=p100 649491=tmhz
- CPU 18: 298 MHz samples: 1757=p0 2195=p25 2195=p50 2195=p75 2200=p100 649627=tmhz
- CPU 19: 298 MHz samples: 1753=p0 2195=p25 2195=p50 2195=p75 2197=p100 649535=tmhz
- CPU 20: 298 MHz samples: 1719=p0 2195=p25 2195=p50 2195=p75 2203=p100 648801=tmhz
- CPU 21: 298 MHz samples: 1878=p0 2200=p25 2200=p50 2200=p75 2200=p100 653780=tmhz
- CPU 22: 298 MHz samples: 1749=p0 2194=p25 2195=p50 2195=p75 2196=p100 649486=tmhz
- CPU 23: 298 MHz samples: 2200=p0 2200=p25 2200=p50 2200=p75 2200=p100 655600=tmhz
Run 6: hyper threads on and cpufreq governor set to "performance"
- CPU 0: 298 MHz samples: 1652=p0 2390=p25 2883=p50 3352=p75 3493=p100 850426=tmhz
- CPU 1: 298 MHz samples: 1652=p0 2380=p25 2888=p50 3369=p75 3493=p100 849895=tmhz
- CPU 2: 298 MHz samples: 1652=p0 2376=p25 2775=p50 3083=p75 3493=p100 815483=tmhz
- CPU 3: 298 MHz samples: 1652=p0 2377=p25 2789=p50 3073=p75 3493=p100 815044=tmhz
- CPU 4: 298 MHz samples: 1686=p0 2364=p25 2791=p50 3052=p75 3493=p100 815013=tmhz
- CPU 5: 298 MHz samples: 1652=p0 2379=p25 2790=p50 3053=p75 3493=p100 816213=tmhz
- CPU 6: 298 MHz samples: 1652=p0 2374=p25 2769=p50 3053=p75 3493=p100 813056=tmhz
- CPU 7: 298 MHz samples: 1653=p0 2383=p25 2764=p50 3045=p75 3493=p100 812936=tmhz
- CPU 8: 298 MHz samples: 1652=p0 2376=p25 2762=p50 3013=p75 3493=p100 809194=tmhz
- CPU 9: 298 MHz samples: 1652=p0 2374=p25 2756=p50 3019=p75 3493=p100 808762=tmhz
- CPU 10: 298 MHz samples: 1653=p0 2377=p25 2768=p50 3072=p75 3493=p100 814275=tmhz
- CPU 11: 298 MHz samples: 1653=p0 2384=p25 2762=p50 3062=p75 3493=p100 814070=tmhz
- CPU 12: 298 MHz samples: 1652=p0 2379=p25 2776=p50 3080=p75 3493=p100 815921=tmhz
- CPU 13: 298 MHz samples: 1653=p0 2374=p25 2769=p50 3076=p75 3493=p100 816261=tmhz
- CPU 14: 298 MHz samples: 1652=p0 2377=p25 2780=p50 3066=p75 3493=p100 813424=tmhz
- CPU 15: 298 MHz samples: 1653=p0 2372=p25 2774=p50 3069=p75 3493=p100 813671=tmhz
- CPU 16: 298 MHz samples: 1627=p0 2367=p25 2761=p50 2953=p75 3493=p100 803208=tmhz
- CPU 17: 298 MHz samples: 1627=p0 2376=p25 2764=p50 2953=p75 3493=p100 803740=tmhz
- CPU 18: 298 MHz samples: 1627=p0 2381=p25 2789=p50 2988=p75 3493=p100 809744=tmhz
- CPU 19: 298 MHz samples: 1627=p0 2383=p25 2780=p50 2995=p75 3493=p100 809672=tmhz
- CPU 20: 298 MHz samples: 1627=p0 2006=p25 2200=p50 2550=p75 3500=p100 688864=tmhz
- CPU 21: 298 MHz samples: 1864=p0 2200=p25 2200=p50 2200=p75 2754=p100 657122=tmhz
- CPU 22: 298 MHz samples: 1627=p0 1915=p25 1984=p50 2040=p75 2426=p100 589621=tmhz
- CPU 23: 298 MHz samples: 2200=p0 2200=p25 2200=p50 2200=p75 2200=p100 655600=tmhz
Note: p0 through p100 are the percentile MHz values from the 298 (or 299) samples taken once per second during each benchmark.
Note: tmhz is the total of all MHz samples for that CPU. So we would expect the totals to be roughly the same across CPUs if there is less variance, right?
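To make the columns concrete, here is a rough sketch of how such a summary could be computed from one CPU's samples. The nearest-rank percentile method and the name `summarize` are assumptions for illustration; the exact method used for the logs above may differ slightly.

```python
def summarize(samples):
    """Return percentile MHz (nearest rank) and total MHz for one CPU."""
    ordered = sorted(samples)
    n = len(ordered)

    def pct(p):
        # Map p=0 to the smallest sample and p=100 to the largest.
        return ordered[round(p / 100 * (n - 1))]

    return {
        "p0": pct(0),
        "p25": pct(25),
        "p50": pct(50),
        "p75": pct(75),
        "p100": pct(100),
        "tmhz": sum(ordered),
    }


# A CPU that sits at exactly 2200 MHz for all 298 samples.
flat = summarize([2200] * 298)
print(flat["tmhz"])  # 655600
```

A perfectly pinned CPU therefore shows identical p0 to p100 values and a tmhz of 298 times that frequency, which is what the CPU 23 rows in runs 4 to 6 look like.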
Why set the governor to "ondemand" with the max frequency set to the same value as the min? Is that not similar to the "powersave" governor? It seems not, judging from the results.
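When comparing these configurations it may help to confirm what the kernel thinks the settings are. The sketch below reads the standard cpufreq sysfs files (`scaling_governor`, `scaling_min_freq`, `scaling_max_freq`, the latter two in kHz); the helper names and the `root` parameter are hypothetical and only there so the function can be pointed at a test directory.

```python
import os


def read_cpufreq_settings(cpu, root="/sys/devices/system/cpu"):
    """Read governor and min/max frequency limits for one CPU."""
    base = os.path.join(root, f"cpu{cpu}", "cpufreq")
    settings = {}
    for name in ("scaling_governor", "scaling_min_freq", "scaling_max_freq"):
        with open(os.path.join(base, name)) as f:
            settings[name] = f.read().strip()
    return settings


def limits_are_equal(settings):
    """True when the configured min and max frequency limits match."""
    return settings["scaling_min_freq"] == settings["scaling_max_freq"]
```

Note that this only reports the configured limits. As the results above suggest, the frequency actually observed can still differ, so it does not replace monitoring.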
System
Ubuntu 22.04 LTS
To reproduce
This is not actually using the benchmark repo project, but rather just testing out the recommendations to reduce variance at [1].
Expected behavior
I would expect the first 3 benchmarks to be the most accurate because hyper threads are disabled.
I would expect all benchmarks to have a similar total MHz for each CPU used in the benchmark.
However, in reality only benchmarks 4 and 5 appear to offer reduced variance in CPU MHz.
It would be great if others could comment on their experiences with benchmarks and CPU frequency scaling. Does anybody else actively monitor the CPU frequencies to sanity-check that the governor is doing its job? Maybe the "reducing variance" doc [1] could be updated to warn not to take the governor for granted?
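As a rough sketch of what such a sanity check could look like: flag every CPU whose median MHz deviates from the median across all CPUs by more than some tolerance. The function name and the 50 MHz tolerance are arbitrary choices for illustration; the p50 values are copied from run 1 above.

```python
from statistics import median


def outlier_cpus(p50_by_cpu, tolerance_mhz=50):
    """Return CPUs whose median MHz is far from the median of all CPUs."""
    overall = median(p50_by_cpu.values())
    return sorted(
        cpu
        for cpu, mhz in p50_by_cpu.items()
        if abs(mhz - overall) > tolerance_mhz
    )


# p50 column of run 1 (hyper threads off, "powersave" governor).
RUN1_P50 = {
    0: 1996, 1: 3493, 2: 3493, 3: 1996, 4: 1996, 5: 1996,
    6: 1996, 7: 3493, 8: 1996, 9: 3493, 10: 3493, 11: 1995,
}

print(outlier_cpus(RUN1_P50))  # [1, 2, 7, 9, 10]
```

An empty list would suggest that all CPUs ran at about the same frequency; it does not by itself say that this frequency is the one that was requested.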
Screenshots
n/a
Additional context
The CPU in this case is an AMD Ryzen. Note: Boost mode was also disabled.
[1] https://github.com/google/benchmark/blob/main/docs/reducing_variance.md