Show dispatched notifications in web UI
For auditing purposes it would be nice to have an API or log file to monitor when alerts are triggered, and from a setup and debugging perspective it would be very helpful if there was a view in the web UI showing a live stream of notification actions as they happen.
Agreed that both would be neat. Since we likely have to add some data retention, for auditing a simple log is probably best too. A consistent history can then be built from those by another tool.
A log of some form sounds best to me for all the reporting use cases.
Being able to go back through the last few notifications in the UI for debugging would be handy too.
A few questions on this:
- Would there be a flag to turn this on / off when starting alertmanager (e.g. `-notification-log.enabled`)?
  - What would it default to?
- Would the directory the file is written to be configurable for the user (like `-storage.local.path` is in prometheus)?
  - What would it default to?
- Would the log just show when a notification was dispatched for an alert or would it also show when the alert closed (I feel like it could also be useful for auditing purposes to know how long the alert was firing for)?
- Any thoughts / concerns about the size of the log file? (e.g. putting a date in the file name or implementing some form of log rotation).
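To make the questions above a bit more concrete, here is a rough sketch of what a date-stamped log could look like: one JSON line per event, with both firing and resolved events recorded so the firing duration can be worked out later. The function names, the record fields, and the `notifications-YYYY-MM-DD.log` naming are all hypothetical, not anything Alertmanager does today.

```python
import json
import os
from datetime import datetime, timezone


def log_path(directory, when):
    """Return the per-day log file path (hypothetical naming scheme)."""
    return os.path.join(directory, "notifications-%s.log" % when.strftime("%Y-%m-%d"))


def append_event(directory, group_key, status, when=None):
    """Append one notification event as a JSON line; return the file written to."""
    when = when or datetime.now(timezone.utc)
    # Hypothetical record layout: timestamp, grouping key, firing/resolved status.
    record = {"ts": when.isoformat(), "groupKey": group_key, "status": status}
    path = log_path(directory, when)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return path
```

Putting the date in the file name means old files can be deleted or archived by an external tool, without the writer needing any rotation logic of its own.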
Red Hat has a strong interest in this (including allocating engineering time), specifically in being able to get all recent notifications within a certain timestamp range with full labels and annotations over an API.
Would the existing mesh-replicated notification log be a good place to add this extra detail (currently it only stores the grouping key and firing/resolved status), or would it have to be a separate datastore?
@fabxc @stuartnelson3
I'm wary of adding such a broad function; we don't want arbitrary amounts of data stored in the Alertmanager. This information should be available for debugging, but for any non-trivial amount of time it should probably hang off a webhook or logging mechanism rather than being part of the AM.
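For illustration, the webhook route could look roughly like the sketch below: a small receiver that appends every alert it is sent to a JSON-lines file, so long-term history lives outside the AM. It assumes the webhook body is JSON with a `groupKey` and an `alerts` list whose entries carry `status`, `labels`, `annotations`, `startsAt` and `endsAt`; check the webhook documentation for the exact payload of your version. `flatten` and `make_handler` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler


def flatten(payload):
    """Turn one webhook payload into one record per alert."""
    return [
        {
            "groupKey": payload.get("groupKey"),
            "status": alert.get("status"),
            "labels": alert.get("labels", {}),
            "annotations": alert.get("annotations", {}),
            "startsAt": alert.get("startsAt"),
            "endsAt": alert.get("endsAt"),
        }
        for alert in payload.get("alerts", [])
    ]


def make_handler(log_file):
    """Build a request handler class that appends records to log_file."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            with open(log_file, "a", encoding="utf-8") as f:
                for record in flatten(payload):
                    f.write(json.dumps(record, sort_keys=True) + "\n")
            self.send_response(200)
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet; real code would log requests

    return Handler
```

It can be served with the standard library, e.g. `HTTPServer(("", 9095), make_handler("notifications.log")).serve_forever()`, where the port and file name are arbitrary choices.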
@brian-brazil Ok, just checked. The requirement would be for ~weeks of data, or at least days. Probably a job for the webhook then?
Yeah, I'm thinking hours might be acceptable within the AM.
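As a sketch of what "hours within the AM" might mean in practice: a bounded in-memory buffer that drops anything older than a retention window and can answer a timestamp-range query for the UI or a debugging API. This is only an illustration; `RecentNotifications`, the six-hour default, and the record shape are hypothetical, and it assumes records are added in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class RecentNotifications:
    """Keep only notifications newer than a fixed retention window."""

    def __init__(self, retention=timedelta(hours=6)):
        self.retention = retention
        self._items = deque()  # (timestamp, record) pairs, oldest first

    def add(self, record, when):
        """Store a record and expire anything older than the window."""
        self._items.append((when, record))
        cutoff = when - self.retention
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def between(self, start, end):
        """Return records with start <= timestamp <= end, oldest first."""
        return [record for ts, record in self._items if start <= ts <= end]
```

Because expiry happens on every `add`, memory use is bounded by the notification rate times the retention window, which is the property the hours-only limit is after.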