
Pagerduty Integration (events API V2): Updates same alert instead of creating new one

Open mindhash opened this issue 3 years ago • 5 comments

What did you do? I set up an Alertmanager integration with PagerDuty. The configuration is set up to group by ['alertname']. I also have event orchestration set up in PagerDuty to create an incident for each alert.

What did you expect to see? Each alert with a different name in Alertmanager should result in a separate alert (and incident) in PagerDuty.

What did you see instead? Under which circumstances? For some reason, PagerDuty treats every new alert as an update to an existing alert and applies the update.

[image] In this image, each of the updates is in fact a separate alert (with a different alert name), as you can see in the image below. [image]

After reviewing the code (NotifyV2), I think the DedupKey is being generated from the route key, which may be the cause of the issue. I only have one route with a receiver set up.

The dedup key should instead be the fingerprint of the group labels (GroupLabels.Fingerprint), so that PagerDuty can identify updates to the same group.

&pagerDutyMessage{
		Client:      tmpl(n.conf.Client),
		ClientURL:   tmpl(n.conf.ClientURL),
		RoutingKey:  tmpl(string(n.conf.RoutingKey)),
		EventAction: eventType,
		DedupKey:    key.Hash(), // <-- the line in question
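
To make the concern concrete, here is a minimal sketch of how a dedup key derived purely from a group key behaves. It assumes the key is a string of the form "<route>:<group labels>" and that Hash() is a hex-encoded SHA-256 of that string. Both are assumptions for illustration and have not been checked against the 0.23.0 source. The function name dedupKey and the sample keys are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// dedupKey hashes a group key into a fixed-length dedup key.
// Assumption: SHA-256 + hex encoding, standing in for key.Hash().
func dedupKey(groupKey string) string {
	sum := sha256.Sum256([]byte(groupKey))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical group key when the route has no group labels:
	// every notification on that route carries the same key.
	ungrouped := "{}:{}"
	a := dedupKey(ungrouped) // notification containing alert "HighCPU"
	b := dedupKey(ungrouped) // notification containing alert "DiskFull"
	fmt.Println("same dedup key without group labels:", a == b)

	// Hypothetical group keys when alerts are grouped by alertname.
	c := dedupKey(`{}:{alertname="HighCPU"}`)
	d := dedupKey(`{}:{alertname="DiskFull"}`)
	fmt.Println("same dedup key with group labels:", c == d)
}
```

If the group key does not vary with the alert name, the hash cannot vary either, and PagerDuty would see every event as an update to the same alert.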

Environment: Alertmanager, PagerDuty

  • System information:

    Darwin 19.6.0 x86_64

  • Alertmanager version:

    0.23.0

  • Prometheus version:

    insert output of prometheus --version here (repeat for each prometheus version in your cluster, if relevant to the issue)

  • Alertmanager configuration file:


global:
  resolve_timeout: 5m
  http_config:
    follow_redirects: true
  smtp_from: [email protected]
  smtp_hello: localhost
  smtp_smarthost: localhost:25
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default-receiver
  continue: false
  routes:
    - receiver: p1
      continue: true
      group_wait: 30s
      group_interval: 30s
      repeat_interval: 4h
receivers:
  - name: default-receiver
    email_configs:
      - send_resolved: false
        to: [email protected]
        from: [email protected]
        hello: localhost
        smarthost: localhost:25
        html: '{{ template "email.default.html" . }}'
        require_tls: true
  - name: p1
    pagerduty_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
        routing_key: <secret>
        url: https://events.pagerduty.com/v2/enqueue
        client: SigNoz Alert Manager
        client_url: http://localhost:8080/alerts
        description: "description"
        severity: '{{ (index .Alerts 0).Labels.severity }}'
        component: test2111}}
templates: []
  • Prometheus configuration file:
insert configuration here (if relevant to the issue)
  • Logs:
insert Prometheus and Alertmanager logs relevant to the issue here

mindhash, Apr 01 '22 13:04

Just following up. Can I submit a PR that switches the dedup key to GroupLabels.Fingerprint instead of sending the route key?

mindhash, May 02 '22 10:05

@mindhash I'm also curious about this, but I'm not familiar with the relevant Alertmanager source code.

Is the whole alert group being sent to PagerDuty as one event or is each alert being sent individually?

aantn, Sep 06 '22 20:09

We're currently working around this in Robusta. We receive alerts from Alertmanager and forward them to PagerDuty with a dedup key based on the alert fingerprint.

https://docs.robusta.dev/master/catalog/sinks/PagerDuty.html

So I think the fingerprint-based solution is solid. It's been working for us, and it would be good to get this fixed in Alertmanager itself.

aantn, Nov 03 '22 08:11

I don't see a group_by: [alertname] line in your Alertmanager configuration. Am I missing something?

simonpasquier, Nov 15 '22 15:11
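
The question about group_by matters because the set of grouping labels decides how many groups, and therefore how many group keys, a route produces. The sketch below is a simplified model and not Alertmanager's dispatcher: the labels type, the groupKey helper, and the sample alerts are all hypothetical. It shows that with no grouping labels every alert lands in one group, while grouping by alertname separates them.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labels is a simplified stand-in for an alert's label set.
type labels map[string]string

// groupKey (hypothetical helper) renders only the labels named in groupBy,
// in sorted order, so alerts sharing those values get the same key.
func groupKey(l labels, groupBy []string) string {
	names := append([]string(nil), groupBy...)
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		if v, ok := l[n]; ok {
			parts = append(parts, fmt.Sprintf("%s=%q", n, v))
		}
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	alerts := []labels{
		{"alertname": "HighCPU", "severity": "critical"},
		{"alertname": "DiskFull", "severity": "warning"},
	}
	// Compare no grouping labels against grouping by alertname.
	for _, groupBy := range [][]string{nil, {"alertname"}} {
		groups := map[string]int{}
		for _, a := range alerts {
			groups[groupKey(a, groupBy)]++
		}
		fmt.Printf("group_by=%v -> %d group(s)\n", groupBy, len(groups))
	}
}
```

Under this model, two alerts with different names collapse into one group when no grouping labels are configured, which would match the behavior reported in this issue if group_by is indeed missing from the route.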