notifications-engine
PagerDuty Service Integration - Incident Creation API
Problem Statement:
It appears that the pagerduty.go service implementation recently started using a rate-limited, synchronous API that is intended for human-initiated incidents.
https://github.com/argoproj/notifications-engine/blob/7b9b5d3281e1b52c17b04369de1cda8072d15072/pkg/services/pagerduty.go#L98
Root Cause Analysis
The notifications engine uses CreateIncidentWithContext from the PagerDuty Go SDK v1.5.0. Inspecting the implementation of CreateIncidentWithContext shows that its API path, /incidents, corresponds to PagerDuty's Incident Creation API. This is a concern when the API is used for machine-generated events, given the following excerpts from the API documentation:
This API is not to be used for connecting your monitoring tools to send events to PagerDuty; for that, use the Events v2 ... Unlike the Events APIs, the Incident Creation API is heavily rate limited on a per-account basis. It’s meant for the creation of events at "human speed" - in response to user action, rather than automated tooling. ... This API is also synchronous, as it returns a fully-formed Incident object (rather than the incident key returned by the asynchronous Events APIs). The Events APIs accept arbitrary JSON objects, while the Incident creation API currently only supports string Incident bodies. ... The Incident Creation API is analogous to clicking on the "Create Incident" button in PagerDuty’s web or mobile interfaces. It captures incident information from users and uses that to create a new incident, regardless of monitoring tool data. With this API, one can connect to a PagerDuty account and create or edit incidents on that account. This API is not to be used for connecting your monitoring tools to send events to PagerDuty; for that, use the Events v2
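For comparison, here is a minimal Go sketch of the kind of request the Incident Creation API expects, built with the standard library and never sent. The header names and body fields reflect PagerDuty's public REST API v2 documentation as I understand it, not the go-pagerduty source; the token, email address, and service ID are placeholders, and newIncidentRequest is a hypothetical helper. Note the account-level API token, the From header identifying a PagerDuty user, and the plain-string details field.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// incidentRequest mirrors the REST API v2 body for POST /incidents.
type incidentRequest struct {
	Incident incident `json:"incident"`
}

type incident struct {
	Type    string       `json:"type"` // always "incident"
	Title   string       `json:"title"`
	Service reference    `json:"service"`
	Body    incidentBody `json:"body"`
}

type reference struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

// incidentBody carries only a plain string, unlike the arbitrary JSON
// accepted by the Events APIs.
type incidentBody struct {
	Type    string `json:"type"`
	Details string `json:"details"`
}

// newIncidentRequest builds, but does not send, a synchronous
// incident-creation request. All argument values are placeholders.
func newIncidentRequest(token, fromEmail, serviceID, title, details string) (*http.Request, error) {
	body, err := json.Marshal(incidentRequest{Incident: incident{
		Type:    "incident",
		Title:   title,
		Service: reference{ID: serviceID, Type: "service_reference"},
		Body:    incidentBody{Type: "incident_body", Details: details},
	}})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "https://api.pagerduty.com/incidents", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Account-level REST API token plus the email of a PagerDuty user.
	req.Header.Set("Authorization", "Token token="+token)
	req.Header.Set("From", fromEmail)
	req.Header.Set("Accept", "application/vnd.pagerduty+json;version=2")
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newIncidentRequest("PLACEHOLDER_TOKEN", "oncall@example.com", "PSERVICE1",
		"Rollout guestbook is aborted", "Rollout guestbook is aborted.")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String()) // POST https://api.pagerduty.com/incidents
}
```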
Git Blame:
The changeset introducing this code was authored by @RaviHari and merged by @pasha-codefresh. Further discussion is needed to determine whether the original author and approver agree with the problem statement and the recommended solution.
Recommendation
Add
- Replace the removed PagerDuty Incident Creation API support with the asynchronous Events API v2 that PagerDuty recommends.
The Events API v2 is a highly reliable, highly available asynchronous API that ingests machine events from monitoring tools and other systems like code repositories, observability platforms, automated workflow tools, and configuration management systems.
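A minimal Go sketch of an Events API v2 trigger event, again built with the standard library and not sent. The endpoint, the routing_key/event_action/dedup_key fields, and the four accepted severities follow PagerDuty's Events API v2 documentation as I understand it; the routing key and dedup key are placeholders, and newTriggerRequest is a hypothetical helper, not part of notifications-engine. Unlike the incident request, authentication comes from the service's integration (routing) key in the body, and PagerDuty replies with 202 Accepted and a dedup key rather than a full Incident object.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// eventPayload is the "payload" object of an Events API v2 alert event.
type eventPayload struct {
	Summary       string                 `json:"summary"`
	Source        string                 `json:"source"`
	Severity      string                 `json:"severity"`
	CustomDetails map[string]interface{} `json:"custom_details,omitempty"`
}

// alertEvent is the top-level body sent to /v2/enqueue.
type alertEvent struct {
	RoutingKey  string       `json:"routing_key"`  // integration key of the PagerDuty service
	EventAction string       `json:"event_action"` // "trigger", "acknowledge" or "resolve"
	DedupKey    string       `json:"dedup_key,omitempty"`
	Payload     eventPayload `json:"payload"`
}

// validSeverities are the severities the Events API v2 documents.
var validSeverities = map[string]bool{"critical": true, "error": true, "warning": true, "info": true}

// newTriggerRequest is a hypothetical helper that builds, but does not send,
// an asynchronous "trigger" event.
func newTriggerRequest(routingKey, dedupKey string, p eventPayload) (*http.Request, error) {
	if !validSeverities[p.Severity] {
		return nil, fmt.Errorf("severity %q is not accepted by the Events API v2", p.Severity)
	}
	body, err := json.Marshal(alertEvent{RoutingKey: routingKey, EventAction: "trigger", DedupKey: dedupKey, Payload: p})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "https://events.pagerduty.com/v2/enqueue", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// No API token or From header: the routing key in the body identifies the service.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newTriggerRequest("PLACEHOLDER_ROUTING_KEY", "rollout-aborted/guestbook", eventPayload{
		Summary:  "Rollout guestbook is aborted.",
		Source:   "argocd",
		Severity: "critical",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String()) // POST https://events.pagerduty.com/v2/enqueue
}
```

Reusing a stable dedup key (for example, one per application and trigger) lets repeated triggers collapse into a single open alert, and the same key can later be sent with event_action "resolve". The severity check also rejects values such as "high", which is used in the ConfigMap later in this thread.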
Remove
- Remove support for PagerDuty Incident Creation API for subscriptions to machine generated events such as Rollout Aborted.
Alternative & Additional Solutioning
- Copy or adapt the Incident Creation API integration into an ArgoProj repository that offers a user interface (UI), and write UI logic so that a human-initiated action calls CreateIncidentWithContext, as the API intends.
- Expand adoption of the PagerDuty Events API v2 to support the asynchronous Change Events API in addition to the Alert Events API.
Hey @josephmcasey, good catch! Do you want to contribute this fix, or would you prefer that somebody else take it? I think the suggestion to move to another API is a good one.
@pasha-codefresh, I can definitely add this functionality if you don't mind guiding the contribution:
- Are there any contribution documents for this repository that I could reference?
- Any preference on what is done with the Incident Creation API?
Argo CD Notifications is part of Argo CD today, so please read https://argo-cd.readthedocs.io/en/latest/developer-guide/code-contributions/
and then point the notifications-engine dependency in Argo CD's go.mod at your local folder (for example, with a replace directive).
In my opinion, using the PagerDuty Events API v2 is a good option for solving this problem.
Apologies for the delay. I've had a doozy of a time trying to set up Argo CD on Ubuntu 21.04 and on Gitpod. I put in a PR fixing a bug in the Gitpod Dockerfile. Next, I will try to set up the virtualized environment on the latest macOS.
I ran into the same issue and didn't see any movement on a PR, so I went ahead and took a shot at refactoring the integration to use the Events API v2 instead: https://github.com/argoproj/notifications-engine/compare/master...EricTendian:notifications-engine:master
It's not quite ready for a PR yet (I need to do some more testing), but I wanted to update folks here. Right now this is a breaking change, since it will not be compatible with the old PagerDuty integration configuration, so if that's a problem, we can discuss how backwards compatibility should work.
I'm not sure what I am missing here: I am using the latest Helm chart (5.36.3), but I am still getting:
Failed to process: service type 'pagerdutyv2' is not supported
I am using the configuration below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # pagerduty tokens
  service.pagerdutyv2: |
    serviceKeys:
      service1: $pagerduty_service1
  # defines templates to send notifications
  template.app-sync-status-template: |
    message: "Application {{.app.metadata.name}} sync status is {{.app.status.sync.status}}"
    pagerdutyv2:
      summary: "Application {{.app.metadata.name}} sync status is {{.app.status.sync.status}}"
      severity: "high"
      source: "Argocd Management Dev | {{.app.metadata.name}}"
      url: "{{.context.argocdUrl}}/applications/argocd/{{.app.metadata.name}}"
  template.app-health-status-template: |
    message: "Application {{.app.metadata.name}} health status is {{.app.status.health.status}}"
    pagerdutyv2:
      summary: "Application {{.app.metadata.name}} health status is {{.app.status.health.status}}"
      severity: "high"
      source: "Argocd Management Dev | {{.app.metadata.name}}"
      url: "{{.context.argocdUrl}}/applications/argocd/{{.app.metadata.name}}"
  template.app-operation-status-template: |
    message: "Application {{.app.metadata.name}} operation status is {{.app.status.operation.status}}"
    pagerdutyv2:
      summary: "Application {{.app.metadata.name}} operation status is {{.app.status.operation.status}}"
      severity: "high"
      source: "Argocd Management Dev | {{.app.metadata.name}}"
      url: "{{.context.argocdUrl}}/applications/argocd/{{.app.metadata.name}}"
  # define the triggers when the templates will be sent
  trigger.on-application-fails: |
    - when: app.status.sync.status in ['Unknown', 'OutOfSync']
      send: [app-sync-status-template]
      oncePer: app.status.sync.revision
    - when: app.status.health.status in ['Suspended', 'Missing', 'Degraded', 'Unknown']
      send: [app-health-status-template]
      oncePer: app.status.sync.revision
    - when: app.status.operationState.phase in ['Error', 'Failed']
      send: [app-operation-status-template]
      oncePer: app.status.sync.revision
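To check what the pagerdutyv2 fields in these templates would render to, the sketch below evaluates the same {{...}} expressions with Go's text/template against a hand-built, hypothetical Application map. It only approximates the notifications engine, which may provide extra template functions or handle missing keys differently; here missingkey=error is set deliberately so that a mistyped field path fails loudly instead of rendering "<no value>". renderField and sampleVars are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderField evaluates one template string against the given variables.
// missingkey=error makes an unknown field path fail instead of printing "<no value>".
func renderField(tmpl string, vars map[string]interface{}) (string, error) {
	t, err := template.New("field").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// sampleVars is a hypothetical, hand-built stand-in for the engine's template variables.
func sampleVars() map[string]interface{} {
	return map[string]interface{}{
		"app": map[string]interface{}{
			"metadata": map[string]interface{}{"name": "guestbook"},
			"status": map[string]interface{}{
				"sync":   map[string]interface{}{"status": "OutOfSync"},
				"health": map[string]interface{}{"status": "Degraded"},
			},
		},
		"context": map[string]interface{}{"argocdUrl": "https://argocd.example.com"},
	}
}

func main() {
	for _, tmpl := range []string{
		"Application {{.app.metadata.name}} sync status is {{.app.status.sync.status}}",
		"{{.context.argocdUrl}}/applications/argocd/{{.app.metadata.name}}",
	} {
		out, err := renderField(tmpl, sampleVars())
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```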
Project YAML:
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: project-1
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  annotations:
    notifications.argoproj.io/subscribe.on-application-fails.pagerdutyv2: 'service1'
...
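The subscription annotation packs the trigger name, the service type, and the recipient into one key/value pair. A small illustrative Go parser (hypothetical parseSubscription, not the engine's own code) shows how the annotation in this project breaks down; it assumes multiple recipients are separated by semicolons. For pagerdutyv2, the recipient appears to be matched against a key under serviceKeys in argocd-notifications-cm (service1 here).

```go
package main

import (
	"fmt"
	"strings"
)

const subscribePrefix = "notifications.argoproj.io/subscribe."

// subscription holds the parts of one subscribe annotation (hypothetical type).
type subscription struct {
	Trigger    string
	Service    string
	Recipients []string
}

// parseSubscription is an illustrative parser for
// "notifications.argoproj.io/subscribe.<trigger>.<service>: <recipient>[;<recipient>...]".
// The service type is taken as the last dot-separated segment of the key.
func parseSubscription(key, value string) (subscription, error) {
	if !strings.HasPrefix(key, subscribePrefix) {
		return subscription{}, fmt.Errorf("not a subscribe annotation: %q", key)
	}
	rest := strings.TrimPrefix(key, subscribePrefix)
	i := strings.LastIndex(rest, ".")
	if i <= 0 || i == len(rest)-1 {
		return subscription{}, fmt.Errorf("expected <trigger>.<service>, got %q", rest)
	}
	var recipients []string
	for _, r := range strings.Split(value, ";") {
		if r = strings.TrimSpace(r); r != "" {
			recipients = append(recipients, r)
		}
	}
	return subscription{Trigger: rest[:i], Service: rest[i+1:], Recipients: recipients}, nil
}

func main() {
	s, err := parseSubscription("notifications.argoproj.io/subscribe.on-application-fails.pagerdutyv2", "service1")
	if err != nil {
		panic(err)
	}
	// Prints: trigger=on-application-fails service=pagerdutyv2 recipients=[service1]
	fmt.Printf("trigger=%s service=%s recipients=%v\n", s.Trigger, s.Service, s.Recipients)
}
```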
After some debugging, I found that Argo CD's go.mod still points to an older version of notifications-engine.
I've raised issue 14127 to use the latest revision:
https://github.com/argoproj/argo-cd/pull/14175/files
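The "not supported" error is consistent with the service type being looked up in a table compiled into the Argo CD binary: if the bundled notifications-engine predates pagerdutyv2, that name is simply not registered, whatever the ConfigMap says. The Go sketch below models this with a hypothetical registry map that mirrors the error wording; Notifier, factory, and newService are illustrative names, not notifications-engine types.

```go
package main

import "fmt"

// Notifier is a hypothetical stand-in for a notification service implementation.
type Notifier interface{ Name() string }

type pagerDutyV1 struct{}

func (pagerDutyV1) Name() string { return "pagerduty" }

type pagerDutyV2 struct{}

func (pagerDutyV2) Name() string { return "pagerdutyv2" }

// factory constructs a Notifier; a registry maps service type names to factories.
type factory func() Notifier

// newService looks up a service type the way a compiled-in registry would:
// names missing from the table are rejected regardless of the ConfigMap contents.
func newService(registry map[string]factory, serviceType string) (Notifier, error) {
	f, ok := registry[serviceType]
	if !ok {
		return nil, fmt.Errorf("service type '%s' is not supported", serviceType)
	}
	return f(), nil
}

func main() {
	// Older engine build: only the Incident Creation based service is registered.
	older := map[string]factory{"pagerduty": func() Notifier { return pagerDutyV1{} }}
	// Newer engine build: pagerdutyv2 is registered as well.
	newer := map[string]factory{
		"pagerduty":   func() Notifier { return pagerDutyV1{} },
		"pagerdutyv2": func() Notifier { return pagerDutyV2{} },
	}

	if _, err := newService(older, "pagerdutyv2"); err != nil {
		fmt.Println(err) // service type 'pagerdutyv2' is not supported
	}
	if n, err := newService(newer, "pagerdutyv2"); err == nil {
		fmt.Println(n.Name()) // pagerdutyv2
	}
}
```

This matches the debugging result above: what matters is the notifications-engine revision the Argo CD image was built with, not the Helm chart version alone.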