
Ephemeral Pod is not always found by Kepler

Open nikimanoledaki opened this issue 3 years ago • 7 comments

Describe the bug A clear and concise description of what the bug is.

To Reproduce I used ARC to create a self-hosted runner for GitHub Actions. It runs for 10-15s and then self-destructs. Follow these steps to re-create this.

Querying Kepler for this Pod does not return any data. Kepler sometimes finds the Pod and sometimes does not (@rootfs can provide more info).
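Because the runner Pod is gone within seconds, one way to look for its samples after the fact is a range query over the window in which it ran. The sketch below only builds the request URL. It assumes Kepler's metrics are scraped by a Prometheus server and uses Prometheus's `/api/v1/query_range` HTTP endpoint. The metric name `kepler_container_joules_total`, the `pod_name` label, the server address, and the Pod name `runner-abc` are all assumptions for illustration and may differ by Kepler version and setup.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, metric, pod_name, start, end, step_s=3):
    """Build a Prometheus range-query URL selecting one Pod's series.

    `metric` and the `pod_name` label are assumptions; check the metric
    names your Kepler version actually exports.
    """
    selector = f'{metric}{{pod_name="{pod_name}"}}'
    params = {
        "query": selector,
        "start": start,         # Unix timestamp or RFC 3339 string
        "end": end,
        "step": f"{step_s}s",   # query resolution
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Hypothetical values: a local Prometheus and a 60s window around the job.
url = build_range_query_url(
    "http://localhost:9090",
    "kepler_container_joules_total",
    "runner-abc",
    start=1666206600,
    end=1666206660,
)
print(url)
```

An empty `result` list in the response for a window that fully covers the Pod's lifetime would match the behaviour described here.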

Expected behavior Expected Pod to be consistently picked up by Kepler

Desktop (please complete the following information):

  • OS: RHEL 8

nikimanoledaki avatar Oct 19 '22 19:10 nikimanoledaki

Since you are using GitHub Actions, the environment is probably a VM.... But, can you check the logs to see if the BPF module is running?

marceloamaral avatar Oct 20 '22 02:10 marceloamaral

I reported https://github.com/sustainable-computing-io/kepler/issues/264 before. I am not sure whether we fixed all of those cases, or whether some occurrences might still lead to this issue.

jichenjc avatar Oct 20 '22 06:10 jichenjc

Thanks both. I am using self-hosted runners with ARC, where the runners run as Pods in the cluster, so they are not VMs in this case. I included a link for how I set them up in case you would like to try the same setup :) When the action is triggered, the tests run for approx 10-15s, so they finish very quickly, and perhaps that is why Kepler does not detect and return data for them.

Aside from the self-destructing runners that terminate as soon as the job is done, I also have a “manual delete” runner that stays running after the job. Kepler returns data for this Pod without any issue so Kepler works as expected for other Pods, just not for the short-lived ones…

nikimanoledaki avatar Oct 20 '22 06:10 nikimanoledaki

Sorry, just realised I did not add a link to the setup, here it is: https://gist.github.com/nikimanoledaki/545b215c229ebe4ed6ad1d8c189ecbb9

nikimanoledaki avatar Oct 20 '22 06:10 nikimanoledaki

> the tests run for approx 10-15s

But how long does the Pod stay alive? Kepler collects metrics every 3s, so in theory a Pod that stays alive for 10s should be seen.
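The arithmetic behind that expectation can be sketched as follows. This is a minimal model, not Kepler's actual collection loop: it assumes collection ticks fire at fixed 3s intervals and counts how many ticks fall inside a Pod's lifetime for every possible start offset. The function name `ticks_during` is made up for illustration.

```python
def ticks_during(start_ms: int, lifetime_ms: int, interval_ms: int = 3000) -> int:
    """Count ticks at 0, interval, 2*interval, ... that fall in
    [start, start + lifetime). Integer milliseconds avoid float rounding."""
    first = -(-start_ms // interval_ms)                 # ceil(start / interval)
    end = -(-(start_ms + lifetime_ms) // interval_ms)   # ceil(end / interval)
    return end - first


# A 10s Pod, started at every 100ms offset within one 3s interval.
counts = [ticks_during(s, 10_000) for s in range(0, 3000, 100)]
print(min(counts), max(counts))  # prints "3 4"
```

Under this model a 10s Pod always overlaps at least three ticks, so a missed Pod would point to something other than the interval alone. By contrast, `ticks_during(100, 2_000)` is 0: a Pod shorter than the interval can fall entirely between two ticks.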

marceloamaral avatar Oct 20 '22 07:10 marceloamaral

Given that each self-destructing Pod is created as soon as the previous self-destructing Pod terminates, and given that I start the test as soon as the previous test ends, each Pod runs only for the duration of the test, so 10-15s.

nikimanoledaki avatar Oct 20 '22 08:10 nikimanoledaki

The odd thing is that the same type of Pod is sometimes reported and sometimes missing.

rootfs avatar Oct 20 '22 14:10 rootfs

Just a thought: would upgrading the bcc library fix this issue? #317

rootfs avatar Oct 21 '22 01:10 rootfs

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 17 '23 18:05 stale[bot]

We have made several improvements. Is this still an issue?

marceloamaral avatar May 18 '23 05:05 marceloamaral

Closing this since I do not have access to the environment/project that led to this issue. If a similar issue occurs in the future, we can refer to this one for more info.

nikimanoledaki avatar Jul 30 '23 12:07 nikimanoledaki