kepler Ephemeral Pod is not always found by Kepler

Describe the bug A clear and concise description of what the bug is.

To Reproduce I used ARC to create a self-hosted runner for GitHub Actions. It runs for 10-15s and then self-destructs. Follow these steps to re-create this.

Querying Kepler for this Pod does not return any data. Kepler sometimes finds the Pod and sometimes does not (@rootfs can provide more info).

Expected behavior Expected Pod to be consistently picked up by Kepler

Desktop:

OS: RHEL 8

Oct 19 '22 19:10 nikimanoledaki

Since you are using GitHub Actions, the environment is probably a VM.... But, can you check the logs to see if the BPF module is running?

Oct 20 '22 02:10 marceloamaral

I reported https://github.com/sustainable-computing-io/kepler/issues/264 before. I am not sure whether we fixed all of those cases, or whether some occurrences might still lead to this issue.

Oct 20 '22 06:10 jichenjc

Thanks both. I am using self-hosted runners with ARC, where the runners run as Pods in the cluster, so they are not VMs in this case. I included a link for how I set them up in case you would like to try the same setup :) When the action is triggered, the tests run for approx 10-15s. They finish very quickly, and perhaps that is why Kepler does not detect and return data for them.

Aside from the self-destructing runners that terminate as soon as the job is done, I also have a “manual delete” runner that stays running after the job. Kepler returns data for this Pod without any issue so Kepler works as expected for other Pods, just not for the short-lived ones…

Oct 20 '22 06:10 nikimanoledaki

Sorry, just realised I did not add a link to the setup, here it is: https://gist.github.com/nikimanoledaki/545b215c229ebe4ed6ad1d8c189ecbb9

Oct 20 '22 06:10 nikimanoledaki

> the tests run for approx 10-15s

But how long does the pod stay alive? Kepler collects metrics every 3s, so in theory, if a pod stays alive for 10s, we should see it.
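The arithmetic behind that expectation can be sketched as follows. This is only an illustration, not Kepler code: the function name `samples_during_lifetime` is hypothetical, and it assumes collection ticks fire at exact multiples of the 3s interval and that a pod is observed whenever a tick falls inside its lifetime. It does not model anything else in Kepler's pipeline.

```python
import math


def samples_during_lifetime(start, lifetime, interval=3.0):
    """Count collection ticks (at t = 0, interval, 2*interval, ...)
    that fall inside the half-open window [start, start + lifetime).

    Hypothetical helper for illustration; assumes perfectly periodic ticks.
    """
    # Index of the first tick at or after the pod starts.
    first = math.ceil(start / interval)
    # Index of the last tick strictly before the pod terminates.
    last = math.ceil((start + lifetime) / interval) - 1
    return max(0, last - first + 1)


if __name__ == "__main__":
    # A 10s pod, with its start shifted relative to the tick schedule.
    for start in (0.0, 1.0, 2.0):
        print(start, samples_during_lifetime(start, 10.0))
```

Under these assumptions, a pod that lives for 10s always overlaps at least three ticks regardless of when it starts, whereas a pod that lives for less than one interval can fall entirely between two ticks.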

Oct 20 '22 07:10 marceloamaral

Given that each self-destructing Pod is created as soon as the previous self-destructing Pod terminates, and given that I start the test as soon as the previous test ends, each Pod runs only for the duration of the test, so 10-15s.

Oct 20 '22 08:10 nikimanoledaki

The odd thing is that the same type of Pod is sometimes reported and sometimes missing.

Oct 20 '22 14:10 rootfs

Just a thought: would upgrading the bcc library fix this issue? #317

Oct 21 '22 01:10 rootfs

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

May 17 '23 18:05 stale[bot]

We have made several improvements. Is this still an issue?

May 18 '23 05:05 marceloamaral

Closing this since I do not have access to the environment/project that led to this issue. If a similar issue occurs in the future, we can refer to this one for more info.

Jul 30 '23 12:07 nikimanoledaki
