likwid
likwid-power gives wrong package power consumption when AMD Epyc is idle
We have two clusters of dual-socket servers with AMD EPYC 7301 and 7551 CPUs. One system is air-cooled, the other is liquid-cooled. Both systems run SLES12 SP3.
On both systems we observe that, when the node is idle, likwid-powermeter reports a combined package power for the 2 sockets that is larger than the whole-server power consumption reported via IPMI.
Example output for an idle dual-socket node:
ipmitool dcmi power reading ; ./likwid-powermeter
Instantaneous power reading: 102 Watts
Minimum during sampling period: 24 Watts
Maximum during sampling period: 368 Watts
Average power reading over sample period: 84 Watts
IPMI timestamp: Fri Sep 20 05:12:06 2019
Sampling period: 00000005 Seconds.
Power reading state is: activated
--------------------------------------------------------------------------------
CPU name: AMD EPYC 7301 16-Core Processor
CPU type: AMD K17 (Zen) architecture
CPU clock: 2.20 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Runtime: 2.00072 s
Measure for socket 0 on CPU 0
Domain CORE:
Energy consumed: 0.00215149 Joules
Power consumed: 0.00107536 Watt
Domain PKG:
Energy consumed: 135.443 Joules
Power consumed: 67.6972 Watt
Measure for socket 1 on CPU 16
Domain CORE:
Energy consumed: 0.000488281 Joules
Power consumed: 0.000244053 Watt
Domain PKG:
Energy consumed: 132.72 Joules
Power consumed: 66.336 Watt
--------------------------------------------------------------------------------
IPMI: 102 Watt likwid-power: 67.6972+66.336=134.0332 Watt
When the CPUs are under load, the reported values seem plausible; at least P(IPMI) > P(likwid-power) holds.
Could you please advise how to correct this behaviour?
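For readers who want to re-check the arithmetic, here is a minimal sketch that recomputes the package power from the energy and runtime values printed above and compares the sum with the IPMI reading. The variable and function names are illustrative only. Note that the IPMI value is an instantaneous reading while the likwid-powermeter value is an average over about 2 s, so the two are not taken over exactly the same window.

```python
# Values copied from the likwid-powermeter and IPMI output quoted above.
RUNTIME_S = 2.00072
PKG_ENERGY_J = {0: 135.443, 1: 132.72}  # socket id -> package energy in Joules
IPMI_SERVER_W = 102.0                   # instantaneous whole-server reading


def average_power(energy_j, runtime_s):
    """Average power in Watt over the measurement interval."""
    return energy_j / runtime_s


pkg_power_w = {s: average_power(e, RUNTIME_S) for s, e in PKG_ENERGY_J.items()}
total_pkg_w = sum(pkg_power_w.values())
excess_w = total_pkg_w - IPMI_SERVER_W

for socket, watts in pkg_power_w.items():
    print(f"socket {socket}: {watts:.4f} W")
print(f"sum of packages: {total_pkg_w:.4f} W, IPMI: {IPMI_SERVER_W:.0f} W, "
      f"excess: {excess_w:.4f} W")
```

The recomputed per-socket values agree with the reported ones to within rounding, so the discrepancy is already present in the energy counters and not introduced by the division.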
LIKWID gets the so-called energy unit (the scale factor that converts register values to Joules) from a register (0xC0010299). It then reads the energy status register (0xC001029B) before and after the measurement interval and scales the difference by the energy unit. So, if the hardware returns values that are too high in the idle state, LIKWID cannot do anything about it.
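The calculation described above can be sketched roughly as follows. This is not LIKWID's actual code, just an illustration under stated assumptions: the energy unit is taken to sit in bits 12:8 of the unit register and to mean 1/2^ESU Joules, the energy counter is assumed to be 32 bits wide, and the MSRs are read through the Linux `msr` driver at `/dev/cpu/<cpu>/msr` (needs root and a loaded `msr` module). The function names are made up for this sketch.

```python
import os
import struct
import time

MSR_RAPL_POWER_UNIT = 0xC0010299  # energy-unit register named in this thread
MSR_PKG_ENERGY_STAT = 0xC001029B  # package energy status register named in this thread


def read_msr(cpu, reg):
    """Read one 64-bit MSR through the Linux msr driver (requires root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def energy_unit_joules(unit_reg):
    """Assumption: energy status unit (ESU) is bits 12:8, unit = 1 / 2**ESU J."""
    esu = (unit_reg >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(before, after, unit):
    """Scale the counter difference; the mask handles one 32-bit wraparound."""
    ticks = (after - before) & 0xFFFFFFFF
    return ticks * unit


def package_power_watts(cpu=0, interval_s=2.0):
    """Average package power over interval_s, measured on the given CPU."""
    unit = energy_unit_joules(read_msr(cpu, MSR_RAPL_POWER_UNIT))
    t0 = time.monotonic()
    before = read_msr(cpu, MSR_PKG_ENERGY_STAT) & 0xFFFFFFFF
    time.sleep(interval_s)
    after = read_msr(cpu, MSR_PKG_ENERGY_STAT) & 0xFFFFFFFF
    elapsed = time.monotonic() - t0
    return energy_delta_joules(before, after, unit) / elapsed
```

Calling `package_power_watts(0)` and `package_power_watts(16)` as root would correspond to the two sockets in the output above. Since the result depends only on the raw counter values and the unit, a counter that runs too fast at idle shows up unchanged in any tool that uses this scheme.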
Have you tried other tools that can read the energy counters (RAPL) on AMD Epyc? On our AMD EPYC 7451 the Linux kernel 4.15 (Ubuntu 18.04.3) does not offer energy readings through perf or the powercap module. Moreover, our system does not support IPMI readings.
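One way to check whether the running kernel exposes energy counters through the powercap module is to look for `energy_uj` files under `/sys/class/powercap`. The helper below is a small sketch with a made-up name (`list_powercap_zones`); it only lists what is present and makes no claim about which kernel versions provide these entries on AMD systems.

```python
from pathlib import Path


def list_powercap_zones(root="/sys/class/powercap"):
    """Return {zone label: energy in microjoules or None if unreadable}."""
    zones = {}
    base = Path(root)
    if not base.is_dir():
        return zones  # powercap framework not available
    for energy_file in sorted(base.glob("*/energy_uj")):
        zone = energy_file.parent
        name_file = zone / "name"
        name = name_file.read_text().strip() if name_file.is_file() else zone.name
        label = f"{zone.name} ({name})"
        try:
            zones[label] = int(energy_file.read_text())
        except (OSError, ValueError):
            zones[label] = None  # e.g. readable by root only
    return zones


if __name__ == "__main__":
    found = list_powercap_zones()
    if not found:
        print("no powercap energy counters found")
    for label, energy_uj in found.items():
        print(label, energy_uj)
```

An empty result matches the situation described above, where the kernel offers no energy readings through powercap.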
Thank you for the feedback! We observe the same strange readings with a recent version of turbostat (19.03.20), which I suppose uses the same status registers as LIKWID. Yesterday my question was approved and published on the AMD forum (https://community.amd.com/thread/243717). I hope to get a response from them!
Thanks for reporting back. I'll follow the post in the AMD forum. Thanks for using turbostat and not LIKWID there ;) Of course, you can use LIKWID as well if they want further information.
edit: I clicked the "I have the same question" button