bpf: Investigate the *best* value for wakeup_data_size
What would you like to be added?
This constant: https://github.com/sustainable-computing-io/kepler/blob/main/bpf/kepler.bpf.c#L70
It determines how often Kepler wakes up to read the ringbuf.
The math behind the current value is as follows:
- My system (on average) processes around 600-700 context switches per second
- The sample period in Kepler is once every 3 seconds
- We need to read at least one batch of ringbuf events within that 3 second interval
So a value of 1000 should have me read roughly every 1.4 to 1.7 seconds 😄
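The arithmetic above can be sketched in C roughly as follows. This is only an illustration: it assumes one ringbuf event per context switch and that the threshold is counted in events, and `wakeup_interval_sec` is a hypothetical helper, not something that exists in kepler.bpf.c.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of Kepler): estimated seconds between
 * ringbuf wakeups, assuming a wakeup fires once `threshold_events`
 * events are pending and events arrive at a steady `events_per_sec`.
 */
static double wakeup_interval_sec(double threshold_events, double events_per_sec)
{
	assert(events_per_sec > 0.0); /* a zero rate would never reach the threshold */
	return threshold_events / events_per_sec;
}
```

With the numbers from this issue, `wakeup_interval_sec(1000, 600)` is about 1.67 seconds and `wakeup_interval_sec(1000, 700)` is about 1.43 seconds, so both ends of the range fit inside the 3 second sample period.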
Why is this needed?
When Kepler wakes up to read events it consumes CPU. Right now that shows up as somewhere between 1% and 3% mean CPU usage over time. We should consider whether there is a better formula we could use to compute this magic number of 1000.
It could relate to the sample rate.
e.g. 500 * SampleRate, and perhaps even the 500 could come from something better than an educated guess.
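One possible reading of the 500 * SampleRate idea, as a sketch only. It assumes SampleRate means the sample period in seconds and that 500 stands for a lower bound on events per second. The function name and parameters are hypothetical, and the result is in events (if the real constant is a byte count, it would need multiplying by the event size).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of Kepler): wakeup threshold in events.
 * If the real event rate stays at or above `min_events_per_sec`, the
 * threshold is reached at least once per sample period.
 */
static uint64_t wakeup_threshold_events(uint64_t min_events_per_sec,
					uint64_t sample_period_sec)
{
	assert(min_events_per_sec > 0 && sample_period_sec > 0);
	return min_events_per_sec * sample_period_sec;
}
```

For example, `wakeup_threshold_events(500, 3)` gives 1500 events. At 600-700 context switches per second that is a wakeup roughly every 2.1 to 2.5 seconds, which is fewer wakeups than with 1000 while still reading once per 3 second sample.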
Kepler's CPU usage under normal and stress workloads needs to be investigated in parallel. The latest stress test results point to a divergence that needs to be fixed.
Test results posted on the original PR https://github.com/sustainable-computing-io/kepler/pull/1628
Kepler's CPU usage is now 20% with no load running.
How, and on what machine, can I reproduce this result?
@dave-tucker load the latest Kepler image and keep it running for a day.
Test results posted on the original PR #1628
Responded: https://github.com/sustainable-computing-io/kepler/pull/1628#issuecomment-2269058775