
Detect and correct overflows of the RAPL microjoule counter

Open TheElectronWill opened this issue 2 years ago • 3 comments

Problem

The RAPL energy counter only ever increases, so it eventually overflows and wraps around. Currently, this overflow is not handled.

As a result, the energy measurements are "slightly" (potentially a lot?) wrong. Fixing this might also fix other issues where users complain about "wrong" power usage.

Solution

Instead of ignoring the value, the overflow should be corrected. Quoting @uggla:

If previous_microjoules !=0 then we could probably do microjoule = (u64::MAX - previous_microjoules) + last_microjoules
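A minimal sketch of the quoted correction, in Rust. The function name and the `max_value` parameter are hypothetical: the quote uses `u64::MAX` as the wrap point, but the point at which the counter actually wraps depends on the interface it is read from, so it is left as a parameter here. The sketch assumes at most one wraparound between two readings and, like the quoted formula, ignores the single unit consumed by the step from the maximum back to zero.

```rust
/// Energy consumed between two readings of a counter that wraps back to 0
/// after reaching `max_value` (hypothetical parameter, not a scaphandre API).
/// Assumes the counter wrapped at most once between the two readings.
fn delta_with_wraparound(previous: u64, current: u64, max_value: u64) -> u64 {
    if current >= previous {
        // No wraparound between the two readings.
        current - previous
    } else {
        // The counter wrapped: add what was left before the wrap to what
        // has accumulated since. saturating_sub avoids a panic if
        // `previous` is (unexpectedly) larger than `max_value`.
        max_value.saturating_sub(previous) + current
    }
}
```

For example, with a wrap point of 1000, readings of 900 and then 50 give a delta of 150, the same as readings of 100 and then 250.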

Alternatives

Additional context

See https://github.com/powercap/powercap/issues/3#issuecomment-636256230

TheElectronWill, Mar 22 '23 13:03

The LIKWID tool does take the overflows into account. Some info here: https://github.com/RRZE-HPC/likwid/issues/13

TheElectronWill, Mar 24 '23 10:03

I saw this issue linked from the powercap project. FYI I think your proposed solution won't work correctly for two reasons:

  1. While the MSR is 64 bits, only 32 bits are used for energy values.
  2. Those 32 bits are encoded in energy status units (ESU) taken from the MSR_RAPL_POWER_UNIT register. See Section 15.10 of the Intel Software Developer's Manual, Volume 3, March 2023 edition. The standard configuration uses an energy unit of 1/2^ESU joules, but some processors differ (particularly some Intel Atom CPUs), as do some domains such as DRAM and PSYS, which on some processors use fixed ESU values that differ from the unit register.
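The two points above can be sketched in Rust as follows. This is an illustration of the standard layout only, as I understand it from the SDM (ESU in bits 12:8 of MSR_RAPL_POWER_UNIT, energy counter in the low 32 bits of the energy status MSR). The function names are hypothetical, and the sketch does not handle the processor-specific or domain-specific exceptions mentioned in point 2.

```rust
/// Energy unit in joules, assuming the standard encoding where
/// bits 12:8 of MSR_RAPL_POWER_UNIT hold the Energy Status Units (ESU)
/// and one counter increment is worth 1/2^ESU joules.
fn energy_unit_joules(power_unit_msr: u64) -> f64 {
    let esu = ((power_unit_msr >> 8) & 0x1F) as u32;
    1.0 / (1u64 << esu) as f64
}

/// Only the low 32 bits of an energy status MSR carry the counter.
fn raw_energy(energy_status_msr: u64) -> u32 {
    (energy_status_msr & 0xFFFF_FFFF) as u32
}

/// Joules consumed between two raw readings. Because the counter is
/// 32 bits wide, wrapping subtraction on u32 corrects a single overflow.
/// Two or more overflows between readings cannot be detected this way.
fn delta_joules(prev_raw: u32, cur_raw: u32, unit_joules: f64) -> f64 {
    cur_raw.wrapping_sub(prev_raw) as f64 * unit_joules
}
```

With an ESU of 14, for instance, one increment is 1/16384 J (about 61 microjoules), so a 32-bit counter covers roughly 262144 J before wrapping.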

I've found that detecting overflow in RAPL can be a challenge. At a minimum, you need to compute the actual max energy value that the MSR register can report and use that value when accounting for overflow, e.g., as done here [1] (full disclosure: my code). I'm not entirely convinced that this always works as expected though, even if you don't "miss" an overflow: I've seen quirky behavior in the past that resulted in overestimating power consumption. It could be that the register isn't really guaranteed to reach its max logical value before it actually turns over, but this approach is at least logically correct modulo bad register behavior. I haven't conducted a rigorous experiment in a long time though, so I'm not sure how prevalent problems might be.
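As a rough illustration of accounting for overflow with an explicit maximum, here is a Rust sketch of a running total over a wrapping counter. It is not the code linked in [1]; `EnergyAccumulator` and its fields are hypothetical names. It assumes the counter takes every value from 0 to `max_raw` before wrapping, which is exactly the assumption questioned above, and it cannot notice an overflow that happens twice between two readings.

```rust
/// Running total over a counter that wraps after `max_raw`
/// (hypothetical helper, values are in raw counter units).
struct EnergyAccumulator {
    max_raw: u64,          // largest value the counter can report
    last_raw: Option<u64>, // previous reading, None before the first one
    total_raw: u64,        // energy accumulated since the first reading
}

impl EnergyAccumulator {
    fn new(max_raw: u64) -> Self {
        EnergyAccumulator { max_raw, last_raw: None, total_raw: 0 }
    }

    /// Feed a new reading and return the running total.
    fn update(&mut self, raw: u64) -> u64 {
        if let Some(last) = self.last_raw {
            let delta = if raw >= last {
                raw - last
            } else {
                // One wraparound: the counter has max_raw + 1 distinct
                // values, hence the extra 1 for the step from max to 0.
                self.max_raw.saturating_sub(last) + raw + 1
            };
            self.total_raw += delta;
        }
        self.last_raw = Some(raw);
        self.total_raw
    }
}
```

For a 32-bit counter, `max_raw` would be `0xFFFF_FFFF`. The first reading only sets the baseline, so the total starts at 0.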

[1] https://github.com/energymon/energymon/blob/38ef1e6d2d69abf1e3496832369663918d9e56d4/msr/energymon-msr.c#L174-L182

Cheers.

connorimes, May 22 '23 17:05

That's right, 64 bits is too much for the MSR counter:

image

I haven't seen aberrant values when correcting the overflows just after reading the counter, but I'll check that again :)

edit: of course, using the MSR directly requires taking into account the "quirks" of some platforms. That's what the Linux kernel does for perf and powercap (scaphandre uses powercap on Linux, for now). These interfaces return 64-bit values because they perform the unit conversion. I'll have to check the overflows in that case. Thanks for the info!
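For the powercap case, the kernel's sysfs interface exposes, for each zone, an `energy_uj` file and a `max_energy_range_uj` file giving the range of that counter, both as decimal microjoule values. A minimal Rust sketch of reading the pair is below. The function names are hypothetical and this is not scaphandre's actual code. The zone directory is passed in by the caller, for example `/sys/class/powercap/intel-rapl:0` on a machine that has that zone.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parse the decimal microjoule value found in a powercap sysfs file.
fn parse_uj(contents: &str) -> Option<u64> {
    contents.trim().parse().ok()
}

/// Read (`energy_uj`, `max_energy_range_uj`) from one powercap zone
/// directory. The second value is the candidate wrap point to use when
/// correcting an overflow of the first.
fn read_zone(zone_dir: &Path) -> io::Result<(u64, u64)> {
    let read = |name: &str| -> io::Result<u64> {
        let text = fs::read_to_string(zone_dir.join(name))?;
        parse_uj(&text)
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "not a number"))
    };
    Ok((read("energy_uj")?, read("max_energy_range_uj")?))
}
```

Reading `energy_uj` may require elevated privileges on recent kernels, so callers should expect `read_zone` to return a permission error.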

TheElectronWill, May 22 '23 21:05