Alert resolving not working as expected
What did you do? I am trying to send a "resolved" message for alerts that have been resolved unsuccessfully. It does send them sometimes, but most of the time it doesn't. I have adjusted resolve_timeout, group_wait, group_interval and repeat_interval many times, but nothing seems to fix the problem.
What did you expect to see? I am expecting to get a "resolved" message at most resolve_timeout after the alert has been resolved.
Environment: Running Alertmanager v0.23.0 with Prometheus Operator 0.52.1 (this has also happened on 0.40.0). Alertmanager is connected to a ThanosRuler with Thanos version v0.21.0 and sends alerts to a webhook.
- Alertmanager configuration file:
global:
  resolve_timeout: 3m
route:
  receiver: default
  group_by:
    - alertname
    - namespace
    - instance
    - pod
    - statefulset
    - deployment
    - job_name
    - persistentvolumeclaim
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 1h
  routes:
    - receiver: my-webhook
      match:
        alert: true
receivers:
  - name: default
  - name: my-webhook
    webhook_configs:
      - send_resolved: true
        url: <my-webhook-url>
        max_alerts: 1
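When a resolved notification does arrive, the body POSTed to my-webhook looks roughly like the sketch below (label values, timestamps and URLs are made up for illustration; the field layout follows Alertmanager's webhook payload, version 4):

    {
      "version": "4",
      "groupKey": "{}/{alert=\"true\"}:{alertname=\"ExampleAlert\", namespace=\"default\"}",
      "status": "resolved",
      "receiver": "my-webhook",
      "groupLabels": { "alertname": "ExampleAlert", "namespace": "default" },
      "commonLabels": { "alertname": "ExampleAlert", "namespace": "default" },
      "commonAnnotations": {},
      "externalURL": "http://alertmanager.example:9093",
      "alerts": [
        {
          "status": "resolved",
          "labels": { "alertname": "ExampleAlert", "namespace": "default" },
          "annotations": {},
          "startsAt": "2021-11-01T10:00:00Z",
          "endsAt": "2021-11-01T10:15:00Z",
          "generatorURL": "http://thanos-ruler.example/graph?g0.expr=up%3D%3D0"
        }
      ]
    }

With max_alerts: 1, any additional alerts in the group are dropped from the alerts array before the payload is sent.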
Can anyone help with this?
@edenkoveshi What do you mean "Resolved unsuccessfully"?
I am expecting to get a "resolved" message at most resolve_timeout after the alert has been resolved.
resolve_timeout will come into play only if the sender didn't provide an end date for the alert. This isn't the case with Prometheus and Thanos Ruler, since both set the end date to "eval time + 5m".
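To make that concrete (alert name, labels and timestamps below are made up): an alert pushed to Alertmanager's v2 API (POST /api/v2/alerts) with an explicit endsAt is considered resolved at that time, regardless of resolve_timeout:

    [
      {
        "labels": { "alertname": "ExampleAlert", "namespace": "default" },
        "annotations": { "summary": "example alert" },
        "startsAt": "2021-11-01T10:00:00Z",
        "endsAt": "2021-11-01T10:05:00Z",
        "generatorURL": "http://thanos-ruler.example/graph?g0.expr=up%3D%3D0"
      }
    ]

Only if endsAt is omitted (or zero) does Alertmanager fall back to marking the alert resolved resolve_timeout after the last time it was received.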