kepler bpf: Investigate the *best* value for wakeup_data_size

What would you like to be added?

This constant: https://github.com/sustainable-computing-io/kepler/blob/main/bpf/kepler.bpf.c#L70

Declares how often we wake up to read the ringbuf.

The current math was as follows:

My system (on average) processes around 600-700 context switches per second
The sample period in Kepler is once every 3 seconds
We need to read at least one batch of ringbuf events within that 3 second interval

So a threshold of 1000 events should have me reading the ringbuf every 1.4 to 1.7 seconds or so 😄
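For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic. It assumes every context switch produces exactly one ringbuf event and that the threshold is counted in events. The function name `wakeup_interval_seconds` is made up for illustration and is not part of Kepler.

```python
def wakeup_interval_seconds(events_per_wakeup: int, events_per_second: float) -> float:
    """Seconds between userspace wakeups when one wakeup fires per
    `events_per_wakeup` ringbuf events (assumption: one event per context switch)."""
    if events_per_second <= 0:
        raise ValueError("events_per_second must be positive")
    return events_per_wakeup / events_per_second


# Observed range from the issue: 600-700 context switches per second.
for rate in (600, 700):
    interval = wakeup_interval_seconds(1000, rate)
    print(f"{rate} events/s -> wakeup every {interval:.2f} s")
```

Both ends of the range stay under the 3 second sample period, which is the only constraint the current value was picked to satisfy. On a quieter machine (fewer than about 333 events per second) the same threshold would no longer guarantee a read per period.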

Why is this needed?

When Kepler wakes up to read events it consumes CPU. Right now that's showing up as somewhere between 1-3% mean CPU usage over time. We should consider whether there is a better formula we could use to compute this magic number of 1000.

It could relate to the sample rate.

e.g. `500 * SampleRate`, and perhaps even the 500 could come from something better than an educated guess.
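One possible way to make the 500 less of a guess is to treat it as an assumed minimum event rate and derive the threshold from it. The sketch below is only that, a sketch: it assumes `SampleRate` means the sample period in seconds, that one context switch yields one ringbuf event, and the names `wakeup_events` and `reads_per_period` are hypothetical, not anything that exists in Kepler.

```python
import math


def wakeup_events(sample_period_s: float,
                  min_events_per_second: float,
                  reads_per_period: int = 2) -> int:
    """Number of events to accumulate before waking userspace so that at least
    `reads_per_period` wakeups happen within one sample period, as long as the
    real event rate is at or above `min_events_per_second`."""
    if sample_period_s <= 0 or min_events_per_second <= 0 or reads_per_period < 1:
        raise ValueError("all arguments must be positive")
    events_in_period = sample_period_s * min_events_per_second
    # Round down so we never wait for more events than the period can deliver.
    return max(1, math.floor(events_in_period / reads_per_period))


# 3 s sample period, low end of the observed 600-700 events/s range.
print(wakeup_events(3, 600))                       # two reads per period
print(wakeup_events(3, 500, reads_per_period=1))   # same as 500 * SampleRate
```

Read this way, `500 * SampleRate` guarantees one read per period only while the machine sustains at least 500 events per second. Using a measured low-water mark for the event rate instead of a constant would trade fewer wakeups on busy machines against the risk of missing a read on idle ones.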

Aug 01 '24 14:08 dave-tucker

The Kepler CPU usage under normal and stress workloads needs to be investigated in parallel. The latest stress test results point to a divergence that needs to be fixed.

Aug 01 '24 14:08 rootfs

The current Kepler CPU usage is now 20% without running load.

Aug 02 '24 15:08 rootfs

Test results posted on the original PR https://github.com/sustainable-computing-io/kepler/pull/1628

Aug 02 '24 21:08 rootfs

> The current Kepler CPU usage is now 20% without running load.

How, and on what machine, can I reproduce this result?

Aug 05 '24 12:08 dave-tucker

@dave-tucker load the latest Kepler image and keep it running for a day.

Aug 05 '24 12:08 rootfs

> Test results posted on the original PR #1628

Responded: https://github.com/sustainable-computing-io/kepler/pull/1628#issuecomment-2269058775

Aug 05 '24 13:08 dave-tucker
