feat(notify): make mute stage honor send_resolved
Make mute stage honor send_resolved in receivers. This fixes cases where an alert was already sent to a receiver before being silenced.
Sending resolved notifications for a now silenced alert helps update the receiver's status.
For example, an open PagerDuty incident will be closed.
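As a rough illustration of the intended behavior (not the code in this PR), the sketch below filters a batch of alerts the way a mute stage might. The `Alert` type and `filterMuted` function are hypothetical names made up for this example, and the sketch assumes the alert was already notified before it was silenced; it does not check that.

```go
package main

import "fmt"

// Alert is a hypothetical, simplified alert; it is not Alertmanager's type.
type Alert struct {
	Name     string
	Resolved bool
}

// filterMuted drops muted alerts, except resolved ones when the receiver
// has send_resolved enabled. muted stands in for the silence lookup.
func filterMuted(alerts []Alert, muted func(Alert) bool, sendResolved bool) []Alert {
	var out []Alert
	for _, a := range alerts {
		if muted(a) && !(a.Resolved && sendResolved) {
			continue // muted, and no resolved notification is wanted
		}
		out = append(out, a)
	}
	return out
}

func main() {
	alerts := []Alert{
		{Name: "DiskFull", Resolved: true},
		{Name: "HighLatency", Resolved: false},
	}
	silenced := func(Alert) bool { return true } // assume everything is silenced

	fmt.Println(len(filterMuted(alerts, silenced, true)))  // 1: resolved DiskFull passes
	fmt.Println(len(filterMuted(alerts, silenced, false))) // 0: everything stays muted
}
```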
Related #226
Nice. I think the only real issue is that this is a reasonably impactful behavior change, so we should probably put it behind a flag so as not to surprise users.
Thanks for doing all this work. If possible, I think we should explore making this behavior a first class citizen of silences instead of having silences inherit send_resolved from downstream integrations.
The main argument for this is that it keeps silences predictable.
When we inherit behavior from downstream integrations, the same silence behaves differently in different aggregation groups, because its behavior depends on which integration is used.
This is likely to be very confusing for operators: some silenced alerts will send resolved notifications while others will not, and the same silenced alert may even send a resolved notification in one aggregation group but not in another.
By making this a property of the silence instead we can make it an explicit choice of the silence author, and it becomes auditable via the history of expired silences.
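To make the suggestion concrete, here is a minimal sketch of what a per-silence setting could look like. The `Silence` type, the `NotifyResolved` field, and `passesSilence` are all assumed names for illustration; none of them exist in Alertmanager today.

```go
package main

import "fmt"

// Silence is a hypothetical, simplified silence. NotifyResolved is an
// assumed field name, not an existing Alertmanager option.
type Silence struct {
	ID             string
	NotifyResolved bool
}

// passesSilence reports whether an alert matched by s should still be
// notified. The receiver's configuration is deliberately not an input,
// so the outcome is identical in every aggregation group.
func passesSilence(s Silence, resolved bool) bool {
	return resolved && s.NotifyResolved
}

func main() {
	quiet := Silence{ID: "s1"}
	closing := Silence{ID: "s2", NotifyResolved: true}

	fmt.Println(passesSilence(quiet, true))    // false
	fmt.Println(passesSilence(closing, true))  // true
	fmt.Println(passesSilence(closing, false)) // false: firing alerts stay silenced
}
```

Because the choice is stored on the silence itself, it would show up alongside the rest of the silence when reviewing expired silences.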
What do you think?
Sounds good to me, let me try and add this.
@Spaceman1701 Yup, PagerDuty incidents are exactly the reason I raised this issue in the past.
@SuperQ 👍 This is why my preference is for Alertmanager's contract to be "For integrations with send_resolved: true, if we sent a notification, we will send a resolved notification."
I think if that's a goal, it may be worth a wider discussion in an issue, as we would also need to address resolved notifications for these additional cases:
- Muted by an inhibition rule. Example: you have a warning alert that fired first, then a critical alert. The critical alert inhibits the warning. Both resolved at the same time, but only the critical sends a resolved notification.
- Muted by an active/mute time window. The alert fires outside the window, and then resolves within the window.
- Alertmanager crashes/restarts just before a long group_interval flushes (i.e. multi-hour group intervals).
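For discussion purposes, the contract could be sketched as a decision that depends only on whether a firing notification was delivered, not on why the alert is currently muted. `Tracker` and its methods are hypothetical names for this example. The in-memory map would not survive the crash/restart case above, which would need persisted state.

```go
package main

import "fmt"

// sentKey identifies a firing notification delivered to one receiver.
type sentKey struct{ receiver, fingerprint string }

// Tracker is a hypothetical in-memory record of delivered firing
// notifications. It is lost on restart, so it does not cover that case.
type Tracker struct{ sent map[sentKey]bool }

// NewTracker returns an empty Tracker.
func NewTracker() *Tracker { return &Tracker{sent: map[sentKey]bool{}} }

// MarkFiringSent records that receiver was notified about the alert.
func (t *Tracker) MarkFiringSent(receiver, fingerprint string) {
	t.sent[sentKey{receiver, fingerprint}] = true
}

// ShouldSendResolved applies the contract: with send_resolved enabled, a
// resolved notification follows every firing notification that was sent.
// The mute reason (silence, inhibition, time window) is not consulted.
func (t *Tracker) ShouldSendResolved(receiver, fingerprint string, sendResolved bool) bool {
	return sendResolved && t.sent[sentKey{receiver, fingerprint}]
}

func main() {
	t := NewTracker()
	t.MarkFiringSent("pagerduty", "warning-abc")

	fmt.Println(t.ShouldSendResolved("pagerduty", "warning-abc", true)) // true
	fmt.Println(t.ShouldSendResolved("slack", "warning-abc", true))     // false: never notified
}
```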
I can see that being a wider engineering effort that you and @siavashs could lead if you would like to!