awesome-prometheus-alerts Alert KubernetesPodNotHealthy reporting incorrect alerts


Open · mastaab opened this issue 4 years ago · 5 comments

As I understand it, the following alert fires for any Pod that has been in a "Pending|Unknown|Failed" state for longer than the default resolution within the last hour. At least, that is how the alert is firing for me. The alert description says something different: the pod should have been down for longer than an hour.

  - alert: KubernetesPodNotHealthy
    expr: min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[1h:]) > 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Kubernetes Pod not healthy (instance {{ $labels.instance }})
      description: Pod has been in a non-ready state for longer than an hour.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}

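One likely contributor to the early firing can be sketched with a simplified model: `min_over_time` only looks at the samples that actually exist in the window, so a pod that was created a minute ago and has been Pending ever since has no earlier `0` samples to pull the minimum down. The functions `min_over_time_sim` and `alert_fires` below are hypothetical illustrations, not Prometheus code, and they assume one value per subquery step and ignore lookback and staleness handling.

```python
# Hypothetical, simplified model of the rule
#   min_over_time(<0/1 series>[1h:]) > 0
# Assumptions: the inner expression has exactly one value per subquery step,
# there are no values before the pod existed, and lookback/staleness is ignored.

def min_over_time_sim(series, eval_time, window, step):
    """Minimum of the step-aligned values in (eval_time - window, eval_time]."""
    values = [
        v for t, v in series.items()
        if eval_time - window < t <= eval_time and t % step == 0
    ]
    # An empty window yields no result at all, modelled here as None.
    return min(values) if values else None


def alert_fires(series, eval_time, window=3600, step=60):
    """True if the simulated expression returns a value greater than 0."""
    m = min_over_time_sim(series, eval_time, window, step)
    return m is not None and m > 0


# Pod created at t=7200 and Pending ever since; evaluated one minute later.
new_pod = {7200: 1, 7260: 1}

# Pod that was healthy (0) for two hours, then Failed (1) for the last minute.
old_pod = {t: 0 for t in range(0, 7200, 60)}
old_pod.update({7200: 1, 7260: 1})

# Pod that has been Pending for the whole window.
stuck_pod = {t: 1 for t in range(0, 7261, 60)}

print(alert_fires(new_pod, 7260))    # True: only non-ready samples exist
print(alert_fires(old_pod, 7260))    # False: earlier 0 samples pull the minimum down
print(alert_fires(stuck_pod, 7260))  # True
```

Under this model, changing only the subquery step (for example `[1h:1m]` or `[1h:1h]`) does not stop a freshly created pod from firing, because the window still contains nothing but non-ready samples.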
I'm no expert on PromQL, but maybe the range/resolution has to be changed like this: [1h:1h]?

Apr 20 '21 08:04 mastaab

Ok, this is weird.

I'll write a new query using [1h:1m].

Thanks for your feedback @mastaab!

May 01 '21 18:05 samber

I don't think this is firing right now... Basically, it now works such that if the pod is down/pending/whatever for more than 1 minute, it fires... Should it be [15m:1m] and >= 15?

Jun 04 '21 18:06 snowzach

Not a PromQL expert at all, but what about the following?

  - alert: KubernetesPodNotHealthy
    expr: kube_pod_status_phase{phase=~"Pending|Unknown|Failed"} > 0
    for: 1h
    labels:
      severity: critical
    annotations:
      summary: Kubernetes Pod not healthy (instance {{ $labels.instance }})
      description: Pod has been in a non-ready state for longer than an hour.\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}

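The difference with a `for: 1h` clause can be sketched the same way: the alert first becomes pending, and only fires once the expression has been true at every evaluation for the full duration; a single evaluation where the series is absent or 0 resets the timer. The function `firing_times` is a hypothetical, simplified model of that behavior, assuming the rule is evaluated every 60 s and that a missing or 0 value means the `> 0` filter drops the series.

```python
# Hypothetical, simplified model of a rule such as
#   expr: <0/1 series> > 0
#   for: 1h
# Assumptions: the rule is evaluated at each timestamp in `eval_times`, and a
# missing or 0 value means the `> 0` filter drops the series (alert resets).

def firing_times(series, eval_times, for_duration):
    """Return the evaluation timestamps at which the alert would be firing."""
    active_since = None
    firing = []
    for t in eval_times:
        value = series.get(t)
        if value is not None and value > 0:
            if active_since is None:
                active_since = t  # condition first true: alert is "pending"
            if t - active_since >= for_duration:
                firing.append(t)  # true continuously for the whole `for` duration
        else:
            active_since = None  # condition broken: the pending timer restarts
    return firing


eval_times = list(range(7200, 10801, 60))  # one hour of evaluations, every 60 s

# Pod Pending from t=7200 onwards.
stuck_pod = {t: 1 for t in eval_times}

# Pod Pending for one minute, then healthy again.
blip_pod = {7200: 1, 7260: 1, 7320: 0}

print(firing_times(stuck_pod, eval_times, 3600))  # [10800]
print(firing_times(blip_pod, eval_times, 3600))   # []
```

Because `kube_pod_status_phase` carries the phase as a label and this expression does not aggregate it away, each (pod, phase) pair is a separate alert instance; under this model a pod that moves from Pending to Failed would restart the one-hour timer.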
Jun 23 '21 19:06 liorfranko

Reminder: this has not yet been changed on the main website, and the query truly doesn't do what it intends to do. A few seconds of unavailability are enough to fire the alert.

Oct 14 '21 08:10 benedikt-haug

I have no Kube cluster running on my side. Can you write a PR with a better query, please, @gna582?

Nov 01 '21 09:11 samber
