robusta
Node memory pressure - pod eviction
Hi
I'm currently evaluating some workloads on a low-tier AKS cluster of 3 b2 machines. I've set up Robusta minimal, without Prometheus, purely because I need to capture eviction issues while I work out which machine tier our workloads need.
A couple of deployments currently get evicted and rescheduled on another node because the node is low on memory. However, these aren't pod OOM kills; I guess the node memory threshold is the reason for the eviction.
I expected Robusta to alert me when this happened, like it does for pod OOM issues, but it doesn't appear to: I had 50 instances of a pod evicted and no alerts.
Is there no default playbook set up for this? Do I have to set up a custom playbook, or is what I want not possible with Robusta?
thanks
Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!
Hey, you'll have to set up a custom playbook for this. It's not captured by default because it can generate a lot of noise on auto-scaled clusters.
I believe there is a Kubernetes event (kubectl get events) that you can use for the custom playbook trigger - can you confirm?
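One way to confirm this before writing a playbook is to check the event dump directly. The sketch below is only an illustration, not part of Robusta: it assumes you have saved the output of `kubectl get events -A -o json`, and the helper name `evicted_pods` is hypothetical. It relies on the standard Event fields `reason` and `involvedObject`, and on the kubelet using the reason `Evicted` for node-pressure evictions.

```python
import json
from collections import Counter


def evicted_pods(events_json: str) -> Counter:
    """Count Evicted events per (namespace, pod name).

    `events_json` is assumed to be the JSON text produced by
    `kubectl get events -A -o json` (a List object with an `items` array).
    """
    items = json.loads(events_json).get("items", [])
    counts = Counter()
    for event in items:
        target = event.get("involvedObject", {})
        # Keep only events whose reason is "Evicted" and whose target is a Pod.
        if event.get("reason") == "Evicted" and target.get("kind") == "Pod":
            counts[(target.get("namespace"), target.get("name"))] += 1
    return counts
```

If the returned counter is non-empty, the cluster is emitting eviction events that a custom trigger could match on. Note that events expire after a retention period, so an empty result on an old dump does not prove that no evictions happened.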
Hi @prom3theu5, we have a pod eviction trigger now. You can use a playbook like below and get alerted on evictions.
customPlaybooks:
- triggers:
  - on_pod_evicted: {}
  actions:
  - create_finding:
      title: "Pod $name in namespace $namespace was Evicted"
      aggregation_key: "PodEvictedTriggered"
More details here. Please let us know if you have feedback. Closing the issue for now.