Ephemeral Pod is not always found by Kepler
Describe the bug A clear and concise description of what the bug is.
To Reproduce I used ARC to create a self-hosted runner for GitHub Actions. It runs for 10-15s and then self-destructs. Follow these steps to re-create this.
Querying Kepler for this Pod does not return any data. Kepler sometimes finds the Pod and sometimes does not (@rootfs can provide more info).
Expected behavior Expected Pod to be consistently picked up by Kepler
Desktop (please complete the following information):
- OS: RHEL 8
 
Since you are using GitHub Actions, the environment is probably a VM. But can you check the logs to see if the BPF module is running?
I reported https://github.com/sustainable-computing-io/kepler/issues/264 before. I am not sure whether we fixed all of those cases, or whether some occurrences might still lead to this issue.
Thanks both, I am using self hosted runners with ARC, where the runners run as Pods in the cluster so they are not VMs in this case. I included a link for how I set them up in case you would like to try the same setup :) When the action is triggered, the tests run for approx 10-15s, so they go very quickly and perhaps that is why Kepler does not detect and return data for them.
Aside from the self-destructing runners that terminate as soon as the job is done, I also have a “manual delete” runner that stays running after the job. Kepler returns data for this Pod without any issue so Kepler works as expected for other Pods, just not for the short-lived ones…
Sorry, just realised I did not add a link to the setup, here it is: https://gist.github.com/nikimanoledaki/545b215c229ebe4ed6ad1d8c189ecbb9
> the tests run for approx 10-15s
But how long does the pod stay alive? Kepler collects metrics every 3s, so in theory, if pods stay alive for 10s, we should see them.
Given that each self-destructing Pod is created as soon as the previous self-destructing Pod terminates, and given that I start the test as soon as the previous test ends, each Pod runs only for the duration of the test, so 10-15s.
The odd thing is that the same type of Pod is sometimes reported and sometimes missing.
Just a thought: would upgrading the bcc library fix this issue? #317
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
We have made several improvements. Is this still an issue?
Closing this since I do not have access to the environment/project that led to this issue. If a similar issue occurs in the future, we can refer to this one for more info.