
oomkill report doesn't work as expected (nsenter: failed to execute dmesg: No such file or directory)

Open donch opened this issue 9 months ago • 4 comments

Describe the bug

When an OOMKill occurs on our cluster (based on Talos Linux), we are unable to get the full dmesg output. The report fails with the following error:

nsenter: failed to execute dmesg: No such file or directory

Expected behavior

We need the full dmesg output for that event.

Additional context

We tried to debug the behavior using the toolbox image robustadev/debug-toolbox. It seems (at least on Talos) that the command used to get the dmesg output isn't working (https://github.com/robusta-dev/robusta/blob/147538b96438a08b5849f728458ea916ddde425b/src/robusta/integrations/kubernetes/custom_models.py#L287). When using nsenter -t 1 "{cmd}" instead, it works as expected (at least from a debug-toolbox pod).
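To make the difference concrete, here is a minimal sketch of how the two command variants could be built. The helper name build_nsenter_command and the choice of -a as the namespace flag are assumptions for illustration; they are not taken from the Robusta source. The likely explanation, which I have not verified on Talos, is that entering the host's mount namespace makes nsenter look for dmesg on the host filesystem, where a minimal OS such as Talos may not ship it. With only -t 1 and no namespace flags, nsenter enters no namespace, so dmesg is resolved from the container image instead.

```python
import shlex
from typing import List, Sequence


def build_nsenter_command(cmd: str, namespace_flags: Sequence[str] = ()) -> List[str]:
    """Build an argv list that runs `cmd` through nsenter, targeting PID 1.

    Hypothetical helper for illustration only, not Robusta's API.
    `namespace_flags` are standard nsenter options such as "-a" (all
    namespaces) or "-m" (mount namespace). With no flags, nsenter enters
    no namespace and simply executes the command.
    """
    return ["nsenter", "-t", "1", *namespace_flags, *shlex.split(cmd)]


# Variant that enters the host namespaces (assumed here to use -a):
# the dmesg binary must exist on the host filesystem.
host_variant = build_nsenter_command("dmesg", ["-a"])

# Variant reported as working from a debug-toolbox pod:
# dmesg is taken from the container image.
container_variant = build_nsenter_command("dmesg")

print(" ".join(host_variant))
print(" ".join(container_variant))
```

Either list can be passed to subprocess.run from a suitably privileged pod. Reading the kernel log from inside a container still depends on the pod's security context, so the second variant is not guaranteed to work everywhere.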

donch avatar Feb 13 '25 14:02 donch

Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

  • 💬 Slack Community: Join Robusta team and other contributors on Slack here.
  • 📖 Docs: Find our documentation here.
  • 🎥 YouTube Channel: Watch our videos here.

github-actions[bot] avatar Feb 13 '25 14:02 github-actions[bot]

Hi @donch, we're actually planning to disable the dmesg enrichment by default in the upcoming Robusta release, as there are a few edge cases we're not happy with right now.

To help us prioritize fixing it, can you share more details on what you're looking for in the dmesg output and why you care about it? It does have helpful information on OOMKills, but given the challenges in making it work properly everywhere, we're not sure if it is worth the effort.

aantn avatar Feb 16 '25 10:02 aantn

Hi @aantn, usually the dmesg output serves as a debug trace that we can pass on to developers to help them understand why their application is running out of memory. I think it's nice-to-have information.

donch avatar Feb 17 '25 13:02 donch

Got it, thanks. We're looking at ways to let HolmesGPT surface more of this data. If you're interested in discussing it, we would love to chat and understand whether it can work for your use case.

aantn avatar Feb 24 '25 12:02 aantn