alertmanager

Show Related Expired Silences on Alerts page

Open comicmuse opened this issue 7 years ago • 4 comments

Show silences that would have affected a firing alert were they not expired. Perhaps in the Info box to avoid cluttering the main screen.

comicmuse avatar Aug 03 '18 12:08 comicmuse

@comicmuse could you add some more details on the particular use-case of this feature?

mxinden avatar Aug 04 '18 15:08 mxinden

I can chime in on this as we implement something similar on an internal dashboard that pulls in data via the AlertManager API.

On our internal dashboard we have a small badge next to an alert name that indicates that this alarm was previously silenced but is now back in alarm after the silence expired.

Hovering over the badge will give the most applicable Silence message, and clicking the badge will take you to the expired Silence.

It's pretty helpful to see if an issue is taking longer to resolve than we originally thought and/or to track down who originally silenced the alarm and why. I think the only "gotcha" is when there are multiple expired silences that would match for the alarm - we just display the first match we found.
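
The lookup described above could be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code. It assumes silences are plain dicts shaped like the objects returned by the Alertmanager v2 API (`status.state`, `matchers` with `name`, `value`, `isRegex`, and optional `isEqual`). The function names are hypothetical.

```python
import re


def matcher_matches(matcher, labels):
    """Return True if one silence matcher accepts the alert's labels."""
    # A label the alert does not carry is compared as the empty string.
    value = labels.get(matcher["name"], "")
    if matcher.get("isRegex", False):
        # Regex matchers are treated as fully anchored.
        hit = re.fullmatch(matcher["value"], value) is not None
    else:
        hit = value == matcher["value"]
    # isEqual=False marks a negative matcher; assume True when absent.
    return hit if matcher.get("isEqual", True) else not hit


def expired_silences_for(alert_labels, silences):
    """All expired silences whose matchers would all match the alert."""
    return [
        s
        for s in silences
        if s["status"]["state"] == "expired"
        and all(matcher_matches(m, alert_labels) for m in s["matchers"])
    ]


def first_expired_silence(alert_labels, silences):
    """First matching expired silence, or None (the 'first match' approach)."""
    matches = expired_silences_for(alert_labels, silences)
    return matches[0] if matches else None
```

Returning the first hit mirrors the "gotcha" mentioned above. Sorting the matches by `endsAt` and taking the most recent one would be one alternative when several expired silences apply.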

poblahblahblah avatar Aug 04 '18 15:08 poblahblahblah

Yeah. @poblahblahblah has pretty much covered it from a usability point of view. As a specific example: if we are running out of allocatable IP addresses and need to procure an additional subnet, the request goes into a bureaucratic black hole. So we take the alert, raise a tracking ticket, and silence it for a period (let's say a week). When it appears again, it would be helpful to have an indication, other than the start time, that this alert has been silenced before. Then we don't have to re-investigate; we just need to check that the request is progressing and extend the silence.

No need to track all silences ever - just the ones that are currently displayed on the Expired Silences page should cover the use-case.

comicmuse avatar Aug 04 '18 20:08 comicmuse

I'll add a similar, related, but distinct use case. You have an alarm that is flapping: it is triggered, then almost immediately gets automatically resolved. You are trying to create a Silence to prevent it from alarming again. It would help to be able to select a resolved alarm and use it as the basis for a new Silence. I add this here because it might affect how you need to think about both Expired Silences and Resolved Alarms.
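
As a rough sketch of what "select a resolved alarm to match a new Silence" might involve, the snippet below builds a silence payload from an alert's label set. It assumes the payload shape accepted by the Alertmanager v2 silences endpoint (`matchers`, `startsAt`, `endsAt`, `createdBy`, `comment`). The helper name and its parameters are hypothetical, and no request is actually sent.

```python
from datetime import datetime, timedelta, timezone


def silence_from_alert(labels, created_by, comment,
                       duration=timedelta(hours=1), now=None):
    """Build a silence payload with one equality matcher per alert label."""
    start = now or datetime.now(timezone.utc)
    return {
        # Exact-match every label so only this specific alert is silenced.
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in sorted(labels.items())
        ],
        "startsAt": start.isoformat(),
        "endsAt": (start + duration).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
```

Matching on every label is the most conservative choice. A UI would probably let the user drop some matchers before submitting, so that the Silence also covers sibling alerts.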

solongtony avatar Aug 14 '20 13:08 solongtony