
likwid-power gives wrong package power consumption when AMD Epyc is idle

Open vvsteg opened this issue 5 years ago • 3 comments

We have two clusters with 7301 and 7551 CPUs in dual-socket servers. One system uses air cooling, the other uses liquid cooling. Both systems run SLES12 SP3.

On both systems we observe that, when the node is idle, likwid-power reports a combined package power consumption for the 2 sockets that is larger than the whole-server power consumption reported via IPMI.

The example output for an idle dual-socket node:

ipmitool dcmi power reading ; ./likwid-powermeter 

    Instantaneous power reading:                   102 Watts
    Minimum during sampling period:                 24 Watts
    Maximum during sampling period:                368 Watts
    Average power reading over sample period:       84 Watts
    IPMI timestamp:                           Fri Sep 20 05:12:06 2019
    Sampling period:                          00000005 Seconds.
    Power reading state is:                   activated


--------------------------------------------------------------------------------
CPU name:	AMD EPYC 7301 16-Core Processor
CPU type:	AMD K17 (Zen) architecture
CPU clock:	2.20 GHz
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Runtime: 2.00072 s
Measure for socket 0 on CPU 0
Domain CORE:
Energy consumed: 0.00215149 Joules
Power consumed: 0.00107536 Watt
Domain PKG:
Energy consumed: 135.443 Joules
Power consumed: 67.6972 Watt

Measure for socket 1 on CPU 16
Domain CORE:
Energy consumed: 0.000488281 Joules
Power consumed: 0.000244053 Watt
Domain PKG:
Energy consumed: 132.72 Joules
Power consumed: 66.336 Watt
--------------------------------------------------------------------------------

IPMI: 102 Watt likwid-power: 67.6972+66.336=134.0332 Watt
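The comparison above can be re-derived from the pasted output. The following is a minimal sketch that only re-does the arithmetic with the numbers shown in this report; the variable names are illustrative and not part of LIKWID. It suggests that the tool's own division (energy over runtime) is self-consistent, so the discrepancy would sit in the energy readings themselves.

```python
# Values copied from the likwid-powermeter and ipmitool output above.
runtime_s = 2.00072
pkg_energy_j = {0: 135.443, 1: 132.72}   # socket -> PKG energy in joules
ipmi_instant_w = 102.0                   # IPMI instantaneous reading in watts

# Power per socket is energy divided by the measurement runtime.
pkg_power_w = {sock: e / runtime_s for sock, e in pkg_energy_j.items()}

# Sum over both sockets and compare with the whole-server IPMI value.
total_pkg_w = sum(pkg_power_w.values())
excess_w = total_pkg_w - ipmi_instant_w

for sock, p in pkg_power_w.items():
    print(f"socket {sock}: {p:.4f} W")
print(f"sum of PKG domains: {total_pkg_w:.4f} W")
print(f"excess over IPMI:   {excess_w:.4f} W")
```

Note that the IPMI figure is an instantaneous value while the likwid figure is an average over about two seconds, so the two are not sampled over exactly the same window.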

When the CPUs are under load, the reported values seem plausible: at least P(IPMI) > P(likwid-power) holds.

Could you please advise how to correct this behaviour?

vvsteg, Sep 20 '19 05:09

LIKWID gets the so-called energy unit (the scale factor from raw register value to Joules) from a register (0xC0010299). It then reads the energy status register (0xC001029B) before and after the measurement interval and scales the difference by the energy unit. So, if the hardware returns values that are too high in the idle state, LIKWID cannot do anything about it.
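The procedure described above can be sketched roughly as follows. This is not LIKWID's code, only an illustration under stated assumptions: that the energy unit sits in bits 12:8 of the unit register as an exponent (one count = 1/2^ESU joules), that the energy counter is 32 bits wide and wraps, and that the Linux `msr` driver exposes the registers at `/dev/cpu/<n>/msr`. The function names are hypothetical.

```python
import os
import struct
import time

MSR_RAPL_PWR_UNIT = 0xC0010299    # energy unit register named above
MSR_PKG_ENERGY_STAT = 0xC001029B  # package energy status register named above


def energy_unit_joules(unit_reg: int) -> float:
    """Assumed layout: bits 12:8 hold ESU, one count = 1 / 2**ESU joules."""
    esu = (unit_reg >> 8) & 0x1F
    return 1.0 / (1 << esu)


def average_power(before: int, after: int, unit_j: float, seconds: float) -> float:
    """Scale the counter difference to watts, assuming a 32-bit wrapping counter."""
    delta = (after - before) & 0xFFFFFFFF
    return delta * unit_j / seconds


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR via the Linux msr driver (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def measure_package_power(cpu: int = 0, seconds: float = 2.0) -> float:
    """Read the energy counter before and after a sleep and return average watts."""
    unit_j = energy_unit_joules(read_msr(cpu, MSR_RAPL_PWR_UNIT))
    before = read_msr(cpu, MSR_PKG_ENERGY_STAT) & 0xFFFFFFFF
    start = time.monotonic()
    time.sleep(seconds)
    after = read_msr(cpu, MSR_PKG_ENERGY_STAT) & 0xFFFFFFFF
    return average_power(before, after, unit_j, time.monotonic() - start)
```

On a node with the `msr` module loaded, calling `measure_package_power(cpu=0)` as root should give a figure comparable to the PKG domain line of likwid-powermeter. Because the result is just the raw counter difference times a constant, an inflated idle value would have to come from the counter itself.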

Have you tried other tools that can read the energy counters (RAPL) on AMD Epyc? On our AMD EPYC 7451 the Linux kernel 4.15 (Ubuntu 18.04.3) does not offer energy readings through perf or the powercap module. Moreover, our system does not support IPMI readings.

TomTheBear, Sep 24 '19 12:09

Thank you for the feedback! We observe the same strange readings with a recent version of turbostat, 19.03.20 (I suppose it uses the same status registers as LIKWID). Yesterday my question passed moderation and was published on the AMD forum (https://community.amd.com/thread/243717). I hope to get a response from them!

vvsteg, Oct 08 '19 10:10

Thanks for reporting back. I'll follow the post in the AMD forum. Thanks for taking turbostat and not LIKWID ;) Of course, you can use LIKWID as well if they want further information.

edit: I clicked the "I have the same question" button

TomTheBear, Oct 08 '19 12:10