Feature Request: Allow send_resolved to be disabled per alert for PagerDuty Integration

Open robertlich opened this issue 7 months ago • 1 comments

I think it would be nice to be able to add a label such as disable_resolve to an alert so that it will not auto-resolve, even when it is sent to a PagerDuty receiver that has send_resolved enabled. It would be pretty simple to document, and it appears to be a feature a couple of people need. The mild inconvenience we encounter today is duplicating every team's receiver config so that each team has both an auto-resolving and a non-auto-resolving PagerDuty receiver.
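For illustration, the duplication described above might look roughly like the following Alertmanager config. This is a sketch only: the receiver names, the `team` and `auto_resolve` labels, and the routing key placeholder are assumptions and do not come from this thread.

```yaml
route:
  receiver: team-a-pagerduty
  routes:
    # Alerts that must stay open in PagerDuty are split off to a second receiver.
    # The auto_resolve label is a made-up example set on the alerting rule.
    - matchers:
        - team="a"
        - auto_resolve="false"
      receiver: team-a-pagerduty-no-resolve
    - matchers:
        - team="a"
      receiver: team-a-pagerduty

receivers:
  - name: team-a-pagerduty
    pagerduty_configs:
      - routing_key: "<team-a-routing-key>"
        send_resolved: true
  # Same destination, duplicated only to flip send_resolved.
  - name: team-a-pagerduty-no-resolve
    pagerduty_configs:
      - routing_key: "<team-a-routing-key>"
        send_resolved: false
```

With this layout, every additional team needs two receivers and two routes, and the extra route also splits that team's alerts into separate groups.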

There is a similar request with some discussion here: https://github.com/prometheus/alertmanager/issues/4033

I don't think the send_resolved field needs to be templateable, I think it makes sense as static per receiver to reduce complexity. And I'm not sure every integration needs this feature so it would be nice to scope it to the pagerduty integration.

There is some discussion in the above thread about the strength of Alertmanager being its grouping. I would argue that while that is true, alerts are mostly grouped on traits of how the alerting service functions (the teams supporting it, the related pieces of infrastructure, other things that are technically connected to it), and breaking that grouping in two just to change how the alert behaves with an external integration seems unnecessary. Routing by destination (different receivers for different teams, etc.), by contrast, makes sense to me to control at the routing step.

I'm not very familiar with the codebase, so I don't know if this is a technically complex request. If it is totally feature breaking to implement this for just the PagerDuty integration, I apologize. This is also pretty small fries.

robertlich avatar May 22 '25 17:05 robertlich

+1 for this feature request, would be very useful

JordanMolone avatar May 22 '25 17:05 JordanMolone

Hey, just want to quickly check if I've got this right.

So right now, send_resolved is an all-or-nothing setting for a PagerDuty receiver. But you want to stop some critical alerts from auto-resolving without having to create a whole separate receiver for them, which is a pain.

The proposal is to use a label on the alert itself, like disable_resolve: true, to tell Alertmanager "don't auto-resolve this one," even if the receiver is set to send resolved notifications.
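As a sketch of what that could look like on the Prometheus side, the label would be attached in the alerting rule. Note that disable_resolve is the hypothetical label proposed in this issue; Alertmanager does not act on it today, and the rule name and expression below are placeholders.

```yaml
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
          # Hypothetical label from this proposal. It has no effect in
          # current Alertmanager; label values must be strings.
          disable_resolve: "true"
```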

Am I getting it right?

pehlicd avatar Jul 25 '25 09:07 pehlicd

Yes - thank you for the succinct summary!

robertlich avatar Jul 25 '25 15:07 robertlich

I can see how it would solve the immediate problem of duplicating receiver configurations.

However, I'm slightly concerned about the architectural implications on a couple of points:

  1. Design Philosophy: Does allowing an alert's label to override a receiver's static configuration blur the lines between routing logic and alert metadata? Alertmanager's strength is its predictable routing and notification pipeline.

  2. Complexity Creep: Could this feature open the door to more requests for per-alert overrides for other receiver settings, making configurations harder to debug and reason about in the long run?

I worry this might be trading a clear (though sometimes verbose) configuration for a more implicit and potentially fragile one.

pehlicd avatar Jul 28 '25 12:07 pehlicd

+1 for this feature request despite your worries; it would be very useful

haininghu avatar Aug 05 '25 18:08 haininghu

> Design Philosophy: Does allowing an alert's label to override a receiver's static configuration blur the lines between routing logic and alert metadata? Alertmanager's strength is its predictable routing and notification pipeline.

We can already control the severity of the alert in PagerDuty (and I think similarly in Opsgenie) via templated strings that pull from the alert labels. When I frame this in terms of modifying the 3rd-party alert behavior via labels, it feels like something we are already doing. That's why I posted the issue: I was surprised it wasn't configurable, based on what I had already seen. I don't think this affects the routing logic exactly. No destination would change based on this, only how the individual alerts are resolved from Alertmanager.
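A rough sketch of the existing behavior being referred to: the severity field of a PagerDuty receiver is a templated string, so it can read from alert labels, while send_resolved is a plain boolean. The receiver name, routing key placeholder, and the `severity` label are assumptions for illustration.

```yaml
receivers:
  - name: team-a-pagerduty
    pagerduty_configs:
      - routing_key: "<team-a-routing-key>"
        # Templated: uses the group's common severity label, falling back to "error".
        severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity }}{{ else }}error{{ end }}'
        # Static boolean: applies to every alert sent through this receiver.
        send_resolved: true
```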

> Complexity Creep: Could this feature open the door to more requests for per-alert overrides for other receiver settings, making configurations harder to debug and reason about in the long run?

Probably; I see the very real concern here. Looking across the other receivers, I see many implement their own send_resolved, some with different defaults. I think you're right not to want to open a per-alert override waterfall. But it seems like such a common use case to cover, and not necessarily something to make a new receiver for. If there were a way to add this feature uniformly across all the 3rd-party receivers, I think that would be the only way to implement it.

> I worry this might be trading a clear (though sometimes verbose) configuration for a more implicit and potentially fragile one.

I think I would make the trade in this scenario, but you guys have the experience. Having already considered this via templating in the other issue and now here, I'm gonna consider the horse dead and close the issue. Thanks for your time!

robertlich avatar Aug 05 '25 20:08 robertlich