Include receiver name in notification metrics
Currently, Alertmanager exposes metrics covering various aspects of receiver behavior:
https://github.com/prometheus/alertmanager/blob/f30aef2c6990b9e8fd481f94bb5bd6293ddf589f/notify/notify.go#L256-L281
These are broken down by an integration label, which indicates the type of receiver (pagerduty, webhook, etc.).
The docs recommend extending Alertmanager through the webhook receiver (https://prometheus.io/docs/operating/integrations/#alertmanager-webhook-receiver) rather than adding new integrations. I understand why, since it limits the maintenance burden on the main repo, but it means that many receivers end up sharing the webhook integration.
Because the receiver metrics only go down to the integration level, it's impossible to tease different webhooks apart. For example, we have various webhooks that go to JIRA, Chat, and other internal systems. With just the one integration label, if one of those webhook receivers falls over, the metrics alone can't tell us which one it was.
I would like to propose a couple of solutions for this, to allow more accurate alerting on receiver issues:
a) include the receiver name as a secondary label on the above metrics
b) include the hostname/port of the webhook, either as another label or by replacing/extending the existing integration label
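To make the two options concrete, here is a rough sketch of how the extra labels would separate webhook failures. It is only an illustration under assumptions, not Alertmanager's actual code: the `labels` struct, the `failures` map (standing in for a labelled counter), the `target` label name, and the hostnames are all made up for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// labels is a hypothetical label set for a notification metric.
// "integration" is the label that exists today; "receiver" corresponds to
// option (a) and "target" (the webhook host:port) to option (b).
type labels struct {
	integration string
	receiver    string
	target      string
}

// failures stands in for a labelled failure counter. It is a plain map so
// that the sketch needs nothing outside the standard library.
type failures map[labels]int

// webhookTarget returns the host:port part of a webhook URL, or "" if the
// URL cannot be parsed.
func webhookTarget(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	return u.Host
}

// recordFailure counts one failed notification under the full label set.
func (f failures) recordFailure(integration, receiver, rawURL string) {
	f[labels{
		integration: integration,
		receiver:    receiver,
		target:      webhookTarget(rawURL),
	}]++
}

// byIntegration collapses the counts down to the integration label only,
// which is the resolution available today.
func (f failures) byIntegration() map[string]int {
	out := map[string]int{}
	for l, n := range f {
		out[l.integration] += n
	}
	return out
}

func main() {
	f := failures{}
	// Made-up receivers and URLs, purely for illustration.
	f.recordFailure("webhook", "jira", "http://jira-bridge.internal:8080/alerts")
	f.recordFailure("webhook", "jira", "http://jira-bridge.internal:8080/alerts")
	f.recordFailure("webhook", "chat", "http://chat-bridge.internal:9000/hook")

	// Today's view: every webhook failure lands in a single series.
	fmt.Println("by integration:", f.byIntegration())

	// With the extra labels, each receiver gets its own series.
	for l, n := range f {
		fmt.Printf("integration=%s receiver=%s target=%s failures=%d\n",
			l.integration, l.receiver, l.target, n)
	}
}
```

Grouping by integration alone reports all three failures under `webhook`, while the per-receiver series show that two of them came from the `jira` receiver. Option (a) keeps the label values bounded by the number of configured receivers, whereas option (b) ties them to whatever URLs are configured.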
Also: I recognise that this would be a breaking change should we decide to go down this path. I'm not sure how to handle that (or even whether we need to, considering Alertmanager is still technically 0.x).
I would also be keen on this and happy to contribute a PR.
@felipesere I have a patch at https://github.com/sinkingpoint/alertmanager/tree/sinkingpoint/receiver_name_metrics that I'm testing internally. I'll open a PR if there's interest.
How has that patch been working for you? 😄
It's been working well for us. I'll PR it today
@sinkingpoint I see the MR is still open. Are you still happy to see it through to completion?
@alimehrabikoshki I've just updated the PR