Node memory pressure - pod eviction

Open prom3theu5 opened this issue 1 year ago • 2 comments

Hi

I'm currently evaluating some workloads running on a low-tier AKS cluster of 3 b2 machines. I've set up Robusta minimal, without Prometheus, purely because I need to capture evictions while I work out which machine tier our workloads need.

A couple of deployments currently get evicted and rescheduled onto another node because the node is low on memory. These aren't pod OOM kills, though; I guess the node memory threshold is what triggers the eviction.

I expected Robusta to alert me when this happened, as it does for pod OOM issues, but it doesn't appear to: I had 50 instances of a pod being evicted and no alerts.

Is there no default playbook for this? Do I have to set up a custom playbook, or is what I want not possible with Robusta?

thanks

prom3theu5 avatar Jan 20 '24 21:01 prom3theu5

Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

  • 💬 Slack Community: Join Robusta team and other contributors on Slack here.
  • 📖 Docs: Find our documentation here.
  • 🎥 YouTube Channel: Watch our videos here.

github-actions[bot] avatar Jan 20 '24 21:01 github-actions[bot]

Hey, you'll have to set up a custom playbook for this. It's not captured by default because it can generate a lot of noise on auto-scaled clusters.

I believe there is a Kubernetes event (kubectl get events) that you can use for the custom playbook trigger - can you confirm?
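One way to check, as a minimal sketch: kubelet evictions are normally recorded as pod events with the reason `Evicted`, so filtering events on that reason should surface them. The function name `list_evictions` is only illustrative, and events are typically retained for a limited time (about an hour by default), so this is best run soon after an eviction.

```shell
# Sketch: list eviction events across all namespaces, oldest first.
# Assumption: the kubelet reports evictions with the event reason "Evicted".
list_evictions() {
  kubectl get events --all-namespaces \
    --field-selector reason=Evicted \
    --sort-by=.lastTimestamp
}
```

Calling `list_evictions` against the affected cluster should print one row per eviction event; the message column usually states which resource the node was low on.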

aantn avatar Feb 24 '24 17:02 aantn

Hi @prom3theu5, we now have a pod eviction trigger. You can use a playbook like the one below to get alerted on evictions.

customPlaybooks:
- triggers:
  - on_pod_evicted: {}
  actions:
  - create_finding:
      title: "Pod $name in namespace $namespace was Evicted"
      aggregation_key: "PodEvictedTriggered"
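
If Robusta was installed with Helm, the `customPlaybooks` block would normally go into the Helm values file and be applied with an upgrade. A hedged sketch, assuming the release is named `robusta`, the chart repo alias is `robusta`, and the values file is `generated_values.yaml` (none of which are stated in this thread); `apply_playbooks` is a hypothetical helper name.

```shell
# Sketch: re-apply the Helm values after adding the customPlaybooks block.
# Assumptions: release "robusta", chart "robusta/robusta", and a values file
# that defaults to generated_values.yaml; adjust all three to your install.
apply_playbooks() {
  values_file="${1:-generated_values.yaml}"
  cluster_name="$2"
  helm upgrade robusta robusta/robusta \
    --values="$values_file" \
    --set clusterName="$cluster_name"
}
```

For example, `apply_playbooks generated_values.yaml my-aks-cluster`, where `my-aks-cluster` is a placeholder cluster name.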

More details here. Please let us know if you have feedback. Closing the issue for now.

pavangudiwada avatar Jun 17 '24 11:06 pavangudiwada