
Incorrect CPU kinds on AMD Threadripper PRO 7000

Open mkuron opened this issue 1 year ago • 18 comments

What version of hwloc are you using?

2.10.0

Which operating system and hardware are you running on?

Alma Linux 8.10, Linux 4.18.0-553.5.1.el8_10.x86_64
Dell Precision 7875 Tower, BIOS version 1.6.2
AMD Ryzen Threadripper PRO 7975WX 32-Cores

Details of the problem

lstopo shows multiple CPU kinds on the AMD Ryzen Threadripper PRO 7975WX because the reported max frequency varies between cores (and looks excessively high) and no base frequency is reported (in general, hwloc does not seem to report base frequencies for AMD CPUs). The AMD Ryzen Threadripper "Storm Peak"/Zen 4 generation is a homogeneous CPU, so all cores should be represented as the same kind.

$ lstopo --cpukinds
CPU kind #0 efficiency 0 cpuset 0x0000ffff,0x0000ffff
  FrequencyMaxMHz = 5352
CPU kind #1 efficiency 1 cpuset 0x00800000,0x00800000
  FrequencyMaxMHz = 5517
CPU kind #2 efficiency 2 cpuset 0x00400000,0x00400000
  FrequencyMaxMHz = 5677
CPU kind #3 efficiency 3 cpuset 0x00010000,0x00010000
  FrequencyMaxMHz = 5837
CPU kind #4 efficiency 4 cpuset 0x00040000,0x00040000
  FrequencyMaxMHz = 6001
CPU kind #5 efficiency 5 cpuset 0x00080000,0x00080000
  FrequencyMaxMHz = 6161
CPU kind #6 efficiency 6 cpuset 0x00020000,0x00020000
  FrequencyMaxMHz = 6321
CPU kind #7 efficiency 7 cpuset 0x00100000,0x00100000
  FrequencyMaxMHz = 6482
CPU kind #8 efficiency 8 cpuset 0x00200000,0x00200000
  FrequencyMaxMHz = 6646
CPU kind #9 efficiency 9 cpuset 0x40000000,0x40000000
  FrequencyMaxMHz = 6806
CPU kind #10 efficiency 10 cpuset 0x20000000,0x20000000
  FrequencyMaxMHz = 6966
CPU kind #11 efficiency 11 cpuset 0x80000000,0x80000000
  FrequencyMaxMHz = 7130
CPU kind #12 efficiency 12 cpuset 0x01000000,0x01000000
  FrequencyMaxMHz = 7290
CPU kind #13 efficiency 13 cpuset 0x10000000,0x10000000
  FrequencyMaxMHz = 7451
CPU kind #14 efficiency 14 cpuset 0x04000000,0x04000000
  FrequencyMaxMHz = 7611
CPU kind #15 efficiency 15 cpuset 0x0a000000,0x0a000000
  FrequencyMaxMHz = 7775

This issue bears some similarity to #634, though there the frequencies had only very minor variations and looked much more reasonable. I am not entirely sure whether this CPU really thinks it has such excessively high and varying frequencies, or if this is simply a bug in the BIOS, firmware, or Linux kernel that leads to incorrect reporting.
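To put numbers on the listing above, here is a rough Python sketch that parses the `lstopo --cpukinds` output, counts the PUs in each kind, and computes the relative spread of the reported max frequencies. It is not part of hwloc; the helper name `parse_cpukinds` is made up for illustration, and it assumes the cpuset strings are comma-separated 32-bit words with the most significant word first.

```python
import re

# Output of `lstopo --cpukinds`, copied verbatim from this report.
LSTOPO_OUTPUT = """\
CPU kind #0 efficiency 0 cpuset 0x0000ffff,0x0000ffff
  FrequencyMaxMHz = 5352
CPU kind #1 efficiency 1 cpuset 0x00800000,0x00800000
  FrequencyMaxMHz = 5517
CPU kind #2 efficiency 2 cpuset 0x00400000,0x00400000
  FrequencyMaxMHz = 5677
CPU kind #3 efficiency 3 cpuset 0x00010000,0x00010000
  FrequencyMaxMHz = 5837
CPU kind #4 efficiency 4 cpuset 0x00040000,0x00040000
  FrequencyMaxMHz = 6001
CPU kind #5 efficiency 5 cpuset 0x00080000,0x00080000
  FrequencyMaxMHz = 6161
CPU kind #6 efficiency 6 cpuset 0x00020000,0x00020000
  FrequencyMaxMHz = 6321
CPU kind #7 efficiency 7 cpuset 0x00100000,0x00100000
  FrequencyMaxMHz = 6482
CPU kind #8 efficiency 8 cpuset 0x00200000,0x00200000
  FrequencyMaxMHz = 6646
CPU kind #9 efficiency 9 cpuset 0x40000000,0x40000000
  FrequencyMaxMHz = 6806
CPU kind #10 efficiency 10 cpuset 0x20000000,0x20000000
  FrequencyMaxMHz = 6966
CPU kind #11 efficiency 11 cpuset 0x80000000,0x80000000
  FrequencyMaxMHz = 7130
CPU kind #12 efficiency 12 cpuset 0x01000000,0x01000000
  FrequencyMaxMHz = 7290
CPU kind #13 efficiency 13 cpuset 0x10000000,0x10000000
  FrequencyMaxMHz = 7451
CPU kind #14 efficiency 14 cpuset 0x04000000,0x04000000
  FrequencyMaxMHz = 7611
CPU kind #15 efficiency 15 cpuset 0x0a000000,0x0a000000
  FrequencyMaxMHz = 7775
"""

KIND_RE = re.compile(
    r"CPU kind #(\d+) efficiency \d+ cpuset (\S+)\s+FrequencyMaxMHz = (\d+)"
)


def parse_cpukinds(text):
    """Return a list of (kind_index, cpuset_as_int, max_mhz) tuples."""
    kinds = []
    for index, cpuset, mhz in KIND_RE.findall(text):
        # Assumption: 32-bit words, most significant word first.
        value = 0
        for word in cpuset.split(","):
            value = (value << 32) | int(word, 16)
        kinds.append((int(index), value, int(mhz)))
    return kinds


kinds = parse_cpukinds(LSTOPO_OUTPUT)
pu_counts = [bin(cpuset).count("1") for _, cpuset, _ in kinds]
freqs = [mhz for _, _, mhz in kinds]
spread = (max(freqs) - min(freqs)) / min(freqs)

for (index, _, mhz), pus in zip(kinds, pu_counts):
    print(f"kind #{index}: {pus} PUs, {mhz} MHz")
print(f"total PUs: {sum(pu_counts)}, spread: {spread:.1%}")
```

On this data, kind #0 covers 32 of the 64 PUs, kinds #1 through #14 cover 2 PUs each, and kind #15 covers 4. The highest reported max frequency is about 45% above the lowest, which is far from the "very minor variations" seen in #634.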

Notes

The data sheet for this CPU says that the boost frequency is 5.3 GHz (which actually coincides with CPU kind #0), but I can't imagine 7.7 GHz being achievable with any kind of cooling. https://openbenchmarking.org/s/AMD+Ryzen+Threadripper+PRO+7975WX+32-Cores has the lscpu output for the same machine and theirs even goes up to 8.1 GHz. https://www.phoronix.com/review/hp-z6-g5-a/3 actually stated 9 months ago that:

[...] the 7995WX doesn't clock up to 6.44GHz... That's an AMD P-State Linux driver bug not specific to the HP workstation but other Threadripper 7000 series too. I already reported the issue to AMD and they will be posting Linux driver patches soon for fixing that AMD P-State CPU frequency reporting.

As this bug remains unfixed at least in RHEL8's Linux kernel (I did not verify any others), a workaround for this hardware quirk inside hwloc would be desirable. The frequencies reported in /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq correspond to the ones reported by hwloc, so this is clearly not an hwloc bug, but it could potentially be worked around in ways similar to #634/#635.
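For anyone who wants to check the sysfs values on their own machine, a minimal Python sketch along these lines groups CPUs by their `cpuinfo_max_freq`. The function name `group_cpus_by_max_freq` and the `sysfs_root` parameter are hypothetical, and the conversion from kHz (the unit cpufreq uses in sysfs) to MHz by integer division is an assumption that may not match hwloc's rounding exactly.

```python
import glob
import re
from collections import defaultdict


def group_cpus_by_max_freq(sysfs_root="/sys/devices/system/cpu"):
    """Map each max frequency in MHz to the sorted list of CPUs reporting it."""
    groups = defaultdict(list)
    pattern = f"{sysfs_root}/cpu[0-9]*/cpufreq/cpuinfo_max_freq"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/cpufreq", path).group(1))
        with open(path) as f:
            khz = int(f.read().strip())  # cpufreq sysfs values are in kHz
        # Assumption: truncate to whole MHz for grouping.
        groups[khz // 1000].append(cpu)
    return {mhz: sorted(cpus) for mhz, cpus in sorted(groups.items())}


if __name__ == "__main__":
    for mhz, cpus in group_cpus_by_max_freq().items():
        print(f"{mhz} MHz: CPUs {cpus}")
```

If the kernel driver is the source of the problem, the groups printed here should line up with the CPU kinds that `lstopo --cpukinds` shows.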

mkuron, Sep 17 '24 09:09