
Include receiver name in notification metrics

Open sinkingpoint opened this issue 3 years ago • 7 comments

Currently, Alertmanager exposes metrics covering various details of receiver behavior:

https://github.com/prometheus/alertmanager/blob/f30aef2c6990b9e8fd481f94bb5bd6293ddf589f/notify/notify.go#L256-L281

These are broken down by an integration label, which indicates the type of receiver (pagerduty, webhook, etc.).

The docs recommend extending Alertmanager through the webhook receiver (https://prometheus.io/docs/operating/integrations/#alertmanager-webhook-receiver) rather than adding new integrations. I understand why, since it limits the maintenance burden on the main repo, but it means that a lot of receivers end up as the webhook integration.

Because the receiver metrics have such limited resolution, it's impossible to tell different webhooks apart. For example, we have various webhooks that go to JIRA, Chat, and other internal systems. With all of them reported under the one integration, if one of those webhook receivers falls over, the metrics alone can't tell us which one it was.

I would like to propose a couple of solutions for this:

a) include the receiver name as a secondary label on the above metrics
b) include the hostname/port of the webhook, either as another label or by replacing/extending the existing integration label

Either option would allow more accurate alerting on receiver issues.

sinkingpoint • Jul 19 '22 22:07

Also, I recognise that this would be a breaking change should we decide to go down this path. I'm not sure how to handle that (or even whether we need to, considering Alertmanager is still technically 0.x).

sinkingpoint • Jul 19 '22 22:07

I would also be keen on this and happy to contribute a PR.

felipesere • Jul 21 '22 09:07

@felipesere I have a patch at https://github.com/sinkingpoint/alertmanager/tree/sinkingpoint/receiver_name_metrics that I'm testing internally, and I'll open a PR if there's interest.

sinkingpoint • Jul 21 '22 10:07

How has that patch been working for you? 😄

felipesere • Aug 14 '22 09:08

It's been working well for us. I'll PR it today.

sinkingpoint • Aug 22 '22 01:08

@sinkingpoint I see the MR is still open, are you still happy to see that through to completion?

alimehrabikoshki • Jun 08 '23 17:06

@alimehrabikoshki I've just updated the PR

sinkingpoint • Jun 22 '23 07:06