alertmanager
Feature request: Adding metadata to incoming alerts
Motivation
Hi, we use AM heavily and have a lot of teams, each managing multiple services of their own running in Kubernetes.
To make things easier, we often have some generic alerts across all the services (not up, pod is pending, image pull back off, high error rate). We tend to add a playbook link to the alerts, and it's also handy to have a link to the documentation of the affected service, its repository, etc. Particularly with such generic alerts, the default playbook would not have any info specific to the affected service, such as what the impact is.
Problem
Currently, we would have to generate the Go template somehow with a handful of ifs, or define the alert separately for each service.
Suggestion
Allow AM to add metadata to incoming alerts based on static (or even dynamic) configuration. It would simply check if the alert matches the selector and if so, add the configured metadata to it. This metadata would be mainly annotations but possibly also labels? If the evaluation happened before going to the routing tree, it would be possible also to use the additional labels for the routing.
Static
The first thing that comes to mind is a new type of "rules", such as metadata rules, similar to inhibition rules.
additional_metadata_rule:
  - matches:
      - foo =~ bar
    overwrite_existing: false # How to handle collisions
    annotations:
      docs: http://foo.bar/docs
    labels: # Allow even adding labels too?
      team: bar
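To make the intended semantics concrete, here is a minimal Go sketch of how such a rule could be evaluated. It is only an illustration of the proposal, not existing Alertmanager code: the types MetadataRule, Matcher and Alert and the functions NewMatcher and Enrich are hypothetical names, and the matcher is simplified to a fully anchored regex on a single label.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is a hypothetical, simplified selector: the value of Label must
// fully match Regex (the equivalent of `foo =~ bar` in the proposal).
type Matcher struct {
	Label string
	Regex *regexp.Regexp
}

// NewMatcher anchors the pattern so that the whole label value must match.
func NewMatcher(label, pattern string) Matcher {
	return Matcher{Label: label, Regex: regexp.MustCompile("^(?:" + pattern + ")$")}
}

// MetadataRule mirrors one entry of the proposed additional_metadata_rule.
type MetadataRule struct {
	Matchers          []Matcher
	OverwriteExisting bool
	Annotations       map[string]string
	Labels            map[string]string
}

// Alert is a minimal stand-in for an incoming alert.
type Alert struct {
	Labels      map[string]string
	Annotations map[string]string
}

// matches reports whether all matchers of the rule match the alert.
// A missing label is treated as an empty value.
func (r MetadataRule) matches(a *Alert) bool {
	for _, m := range r.Matchers {
		if !m.Regex.MatchString(a.Labels[m.Label]) {
			return false
		}
	}
	return true
}

// merge copies src into dst; existing keys are kept unless overwrite is set.
func merge(dst, src map[string]string, overwrite bool) {
	for k, v := range src {
		if _, exists := dst[k]; exists && !overwrite {
			continue
		}
		dst[k] = v
	}
}

// Enrich applies every matching rule, in order, to the alert in place.
func Enrich(a *Alert, rules []MetadataRule) {
	if a.Labels == nil {
		a.Labels = map[string]string{}
	}
	if a.Annotations == nil {
		a.Annotations = map[string]string{}
	}
	for _, r := range rules {
		if !r.matches(a) {
			continue
		}
		merge(a.Annotations, r.Annotations, r.OverwriteExisting)
		merge(a.Labels, r.Labels, r.OverwriteExisting)
	}
}

func main() {
	rules := []MetadataRule{{
		Matchers:          []Matcher{NewMatcher("foo", "bar")},
		OverwriteExisting: false,
		Annotations:       map[string]string{"docs": "http://foo.bar/docs"},
		Labels:            map[string]string{"team": "bar"},
	}}
	alert := &Alert{Labels: map[string]string{"foo": "bar"}}
	Enrich(alert, rules)
	fmt.Println(alert.Annotations, alert.Labels)
}
```

If this ran before the routing tree, the added team label would be available for routing, as suggested above.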
Dynamic
Even more interesting could be the possibility of loading such metadata rules from some remote catalog of applications/services. Unfortunately, I'm not aware of any standardized form of such a thing (possibly the Backstage Service Catalog), but it could be interesting for future development.
Concerns
- How to deal with collisions (as suggested, could be configurable if it should overwrite or not)
- If labels are added to the alert, a user could try to search for the alert in the ALERTS metric using those labels and would fail; it could be unclear where the labels came from.
- Changing an added label while the alert is still active could lead to inconsistencies (I'm not sure how Alertmanager handles the "grouping" of incoming alerts when labels are added or changed).
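The last concern can be illustrated with a small Go sketch. It assumes that an alert's identity is derived from its full label set, as Prometheus-style fingerprints are; the identity function below is a simplified, hypothetical stand-in and not Alertmanager's actual algorithm. Under that assumption, adding or changing a label on an active alert yields a different identity, so it would look like a new alert.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// identity hashes the sorted label set. It is a simplified stand-in for a
// label-based fingerprint, not Alertmanager's real implementation.
func identity(labels map[string]string) uint64 {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort first

	h := fnv.New64a()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0xff}) // separator between name and value
		h.Write([]byte(labels[k]))
		h.Write([]byte{0xff}) // separator between pairs
	}
	return h.Sum64()
}

func main() {
	before := map[string]string{"alertname": "PodPending", "service": "foo"}
	after := map[string]string{"alertname": "PodPending", "service": "foo", "team": "bar"}

	// Compare the identity of the same alert before and after a label is added.
	fmt.Println(identity(before) == identity(after))
}
```

Annotations do not take part in this kind of identity, which is one reason limiting the feature to annotations would be the less risky variant.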
Alternatives
- Doing this in the Go template of the notification text
  - Generating the Go template would be really hard to maintain and read
- Adding the metadata in the alert receiver (PagerDuty, OpsGenie, ...)
  - Would need to add this functionality to each one of the target integrations
  - In some cases the notification does not go through any other system (Email, Slack, API ...)
- Doing this even sooner, in Prometheus before sending the alert out, the same way alert relabeling does.
The canonical way to solve this is to use the external labels in Prometheus.
To use external labels for adding metadata such as a link to the application documentation, I'd need a separate Prometheus instance for each app. That does not sound viable, since we have over a hundred microservices :/
I also have the same request: it is essentially a relabel feature at the Alertmanager level.