PagerDuty integration (Events API v2): updates the same alert instead of creating a new one
What did you do? I set up an Alertmanager integration with PagerDuty. The configuration groups by ['alertname'], and I also have Event Orchestration set up in PagerDuty to create an incident for each alert.
What did you expect to see? Each alert with a different name in Alertmanager should result in a separate alert (and incident) in PagerDuty.
What did you see instead? Under which circumstances? For some reason, PagerDuty treats every new alert as an update to an existing alert and applies the update to it.
As you can see in the screenshot below, each of those updates is in fact a separate alert (with a different alert name).

After reviewing the code (NotifyV2), I think the dedup key is being generated from the route key, which may be the cause of the issue; I only have one route with this receiver set up. The dedup key should instead have been the group labels' fingerprint, so that PagerDuty can identify updates belonging to the same group:
&pagerDutyMessage{
    Client:      tmpl(n.conf.Client),
    ClientURL:   tmpl(n.conf.ClientURL),
    RoutingKey:  tmpl(string(n.conf.RoutingKey)),
    EventAction: eventType,
    DedupKey:    key.Hash(), // <-- derived from the route key, not from the group labels
}
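For illustration, a change along these lines would derive the key from the group labels instead. This is a standalone sketch, assuming the group labels are available as a model.LabelSet at that point; the helper name and surrounding wiring are mine, not the actual Alertmanager code:

// Sketch only: derive the PagerDuty dedup key from the labels the route
// groups on, rather than from the route key.
package main

import (
    "fmt"

    "github.com/prometheus/common/model"
)

// groupDedupKey returns a stable key for one alert group,
// e.g. the group {"alertname": "HighCPU"}.
func groupDedupKey(groupLabels model.LabelSet) string {
    // model.LabelSet.Fingerprint() is a stable hash of the label set.
    return groupLabels.Fingerprint().String()
}

func main() {
    fmt.Println(groupDedupKey(model.LabelSet{"alertname": "HighCPU"}))  // one group
    fmt.Println(groupDedupKey(model.LabelSet{"alertname": "DiskFull"})) // a different group gets a different key
}

Because the fingerprint is a stable hash of the label set, every distinct group would get its own dedup key, while repeated notifications for the same group would keep updating the same PagerDuty alert.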
Environment: Alertmanager, PagerDuty
- System information (uname -srm): Darwin 19.6.0 x86_64
- Alertmanager version: 0.23.0
- Prometheus version: not provided
- Alertmanager configuration file:
global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_from: [email protected]
  smtp_hello: localhost
  smtp_smarthost: localhost:25
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default-receiver
  continue: false
  routes:
  - receiver: p1
    continue: true
    group_wait: 30s
    group_interval: 30s
    repeat_interval: 4h
receivers:
- name: default-receiver
  email_configs:
  - send_resolved: false
    to: [email protected]
    from: [email protected]
    hello: localhost
    smarthost: localhost:25
    html: '{{ template "email.default.html" . }}'
    require_tls: true
- name: p1
  pagerduty_configs:
  - send_resolved: true
    http_config:
      follow_redirects: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: SigNoz Alert Manager
    client_url: http://localhost:8080/alerts
    description: "description"
    severity: '{{ (index .Alerts 0).Labels.severity }}'
    component: test2111
templates: []
- Prometheus configuration file: not provided
- Logs: not provided
Just following up. Can I submit a PR to switch the dedup key to GroupLabels.Fingerprint instead of the route key?
@mindhash I'm also curious about this, but I'm not familiar with the relevant Alertmanager source code.
Is the whole alert group being sent to PagerDuty as one event or is each alert being sent individually?
We're currently working around this in Robusta: we receive alerts from Alertmanager and forward them to PagerDuty with a dedup key based on the alert fingerprint.
https://docs.robusta.dev/master/catalog/sinks/PagerDuty.html
So I think the fingerprint-based solution is solid. It's been working for us, and it would be good to see this fixed in Alertmanager itself.
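Roughly, the forwarding works like the sketch below. This is a simplified stand-in, not Robusta's actual code: the port, routing key, and payload fields are placeholders, and it assumes the alerts arrive via Alertmanager's webhook receiver, whose per-alert payload includes a fingerprint field.

// Sketch of the workaround: receive Alertmanager webhook notifications and
// forward each alert to the PagerDuty Events API v2, using the alert's
// fingerprint as the dedup_key.
package main

import (
    "bytes"
    "encoding/json"
    "log"
    "net/http"
)

const pagerdutyURL = "https://events.pagerduty.com/v2/enqueue"
const routingKey = "REPLACE_ME" // placeholder PagerDuty integration routing key

// Subset of the Alertmanager webhook payload that this sketch needs.
type webhookMessage struct {
    Status string `json:"status"`
    Alerts []struct {
        Status      string            `json:"status"`
        Labels      map[string]string `json:"labels"`
        Annotations map[string]string `json:"annotations"`
        Fingerprint string            `json:"fingerprint"`
    } `json:"alerts"`
}

func handler(w http.ResponseWriter, r *http.Request) {
    var msg webhookMessage
    if err := json.NewDecoder(r.Body).Decode(&msg); err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    for _, a := range msg.Alerts {
        action := "trigger"
        if a.Status == "resolved" {
            action = "resolve"
        }
        event := map[string]interface{}{
            "routing_key":  routingKey,
            "event_action": action,
            // The fingerprint is stable per label set, so PagerDuty keeps
            // distinct alerts apart and groups updates to the same alert.
            "dedup_key": a.Fingerprint,
            "payload": map[string]interface{}{
                "summary":  a.Labels["alertname"],
                "source":   "alertmanager",
                "severity": "error",
            },
        }
        body, _ := json.Marshal(event)
        resp, err := http.Post(pagerdutyURL, "application/json", bytes.NewReader(body))
        if err != nil {
            log.Printf("forwarding alert %s: %v", a.Fingerprint, err)
            continue
        }
        resp.Body.Close()
    }
    w.WriteHeader(http.StatusOK)
}

func main() {
    http.HandleFunc("/webhook", handler)
    log.Fatal(http.ListenAndServe(":8081", nil))
}

The point is simply that using each alert's fingerprint as the dedup_key keeps distinct alerts separate in PagerDuty while still deduplicating repeated notifications for the same alert.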
I don't see a group_by: [alertname] line in your Alertmanager configuration. Am I missing something?
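For reference, grouping by alert name is declared on the route itself; a minimal fragment (hypothetical, not taken from the configuration above) would look like:

route:
  receiver: default-receiver
  routes:
  - receiver: p1
    group_by: ['alertname']
    group_wait: 30s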