Allow users to request a consistent ordering / sorting of grouped alerts
Alertmanager currently groups alerts without a consistent ordering, which makes large groups hard to follow.
The user should be able to specify a sort order on some label or annotation, which would give a consistent and locally meaningful ordering. This could be done with a Go sort function in the template, or perhaps with an Alertmanager configuration entry.
I propose that, at a minimum, the Alerts in the template structure should come with some ordering on them, like we have done elsewhere.
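To make the template-function option concrete, here is a minimal sketch using plain `text/template`. The `sortByLabel` function and the simplified `Alert` type are hypothetical names made up for illustration; Alertmanager does not provide them, and its real template data is richer than this.

```go
package main

import (
	"os"
	"sort"
	"text/template"
)

// Alert is a simplified, hypothetical stand-in for the alert data a
// notification template receives; only labels are modelled here.
type Alert struct {
	Labels map[string]string
}

// sortByLabel is a hypothetical template function. It returns a copy of
// alerts ordered lexicographically by the value of the named label; alerts
// with equal values keep their relative order.
func sortByLabel(name string, alerts []Alert) []Alert {
	out := make([]Alert, len(alerts))
	copy(out, alerts)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Labels[name] < out[j].Labels[name]
	})
	return out
}

// The template sorts at render time, so the input slice may be in any order.
const tmplText = `{{ range sortByLabel "instance" .Alerts }}{{ .Labels.instance }}
{{ end }}`

func main() {
	funcs := template.FuncMap{"sortByLabel": sortByLabel}
	tmpl := template.Must(template.New("msg").Funcs(funcs).Parse(tmplText))

	data := struct{ Alerts []Alert }{
		Alerts: []Alert{
			{Labels: map[string]string{"instance": "web-2"}},
			{Labels: map[string]string{"instance": "db-1"}},
			{Labels: map[string]string{"instance": "web-1"}},
		},
	}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
}
```

Returning a sorted copy rather than sorting in place keeps the function free of side effects, so calling it twice in one template with different labels would not interfere.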
Are you referring to the /api/v1/alerts/groups endpoint? We're currently sorting by internal identifier in /api/v1/alerts (https://github.com/prometheus/alertmanager/blob/master/api/api.go#L405-L407).
This is about notification templates.
I don't mind writing some code if it will help move this forward. For me it would be enough to have a function or Alertmanager option that simply sorts alerts lexicographically by label or annotation.
I think we should start with a consistent ordering on the Alerts we provide to notification templates. That way users get something okay without having to do extra work.
For a default ordering, how about lexicographic order by (alert.annotations.summary, alert.creation_time)? That's the simplest thing I can see that could be generally useful. Or maybe even look for a special sort_key annotation that users can define?
You can't presume that any particular annotation exists, nor that creation times are stable. I'd suggest working entirely off alert labels. This should not be configurable at this level.
OK, what alert labels would make for a sensible default sort key? (job, instance)?
You'll need to use all of them, otherwise it won't be consistent. Moving job and instance to the front is probably wise.
Looking at dispatch/dispatch.go, there seem to be two ways to go: either modify aggrGroup to always have a sorted list of Alert structs instead of a map, or sort the alertsSlice when alerts are flushed. The alertsSlice sort seems easier to implement.
Does that make sense or am I misunderstanding your intention or the code?
Yes, the slice would be the one to sort somewhere along that codepath.
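As a rough sketch of the ordering discussed above (compare on every label, with job and instance moved to the front, and sort the slice at flush time), something like the following could work. The `Alert` type and all function names here are simplified stand-ins invented for illustration, not the actual types or code in dispatch/dispatch.go.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a simplified, hypothetical stand-in for Alertmanager's alert type.
type Alert struct {
	Labels map[string]string
}

// priorityLabels are compared before all other labels.
var priorityLabels = []string{"job", "instance"}

// comparisonOrder returns the label names to compare for a pair of label
// sets: the priority labels first, then every other name in sorted order.
func comparisonOrder(a, b map[string]string) []string {
	seen := make(map[string]bool)
	order := make([]string, 0, len(a)+len(b)+len(priorityLabels))
	for _, name := range priorityLabels {
		seen[name] = true
		order = append(order, name)
	}
	var rest []string
	for _, labels := range []map[string]string{a, b} {
		for name := range labels {
			if !seen[name] {
				seen[name] = true
				rest = append(rest, name)
			}
		}
	}
	sort.Strings(rest)
	return append(order, rest...)
}

// labelsLess reports whether label set a sorts before b. A missing label
// compares as the empty string, so the result depends only on the labels.
func labelsLess(a, b map[string]string) bool {
	for _, name := range comparisonOrder(a, b) {
		if a[name] != b[name] {
			return a[name] < b[name]
		}
	}
	return false
}

// sortAlerts orders the slice in place, using every label as part of the key.
func sortAlerts(alerts []Alert) {
	sort.Slice(alerts, func(i, j int) bool {
		return labelsLess(alerts[i].Labels, alerts[j].Labels)
	})
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "HighLatency", "job": "web", "instance": "b:9090"}},
		{Labels: map[string]string{"alertname": "HighLatency", "job": "db", "instance": "a:9090"}},
		{Labels: map[string]string{"alertname": "DiskFull", "job": "web", "instance": "b:9090"}},
	}
	sortAlerts(alerts)
	for _, a := range alerts {
		fmt.Println(a.Labels["job"], a.Labels["instance"], a.Labels["alertname"])
	}
}
```

Because every label takes part in the comparison, two alerts only compare as equal when their label sets are identical, which is what makes the order independent of map iteration order.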
The proposed change is in https://github.com/prometheus/alertmanager/pull/1234, but if Alertmanager is going to allow arbitrary user-defined sorting in the future, then the LabelSet.Before() method should probably be extended to take a list of label names, in which case the change becomes trivial. I didn't propose that API extension because I don't know if it's right, and I wanted to keep the scope as small as possible.
@brian-brazil, the request was for the user to be able to request a specific sort order for alerts, and I thought #1234 was just the first step. Does closing this request mean it won't be done, or that it will be implemented elsewhere?
Ah, I'd missed that. I'm personally hoping we can avoid having to implement that.
I'm curious about the way the API returns the result. Is there any sorting logic?
I have a suggestion: couldn't you just order by the labels in the sequence they appear in the group_by list?
e.g. group_by: ['severity', 'alertname'] leads to first grouping by severity and then by alertname
group_by: ['alertname', 'severity'] first groups by alertname and then by severity
We have a use case for this where we're generating metrics per domain, writing alerts to detect various kinds of traffic anomalies, then grouping notifications "per-anomaly" to produce a list of domains with each type of anomaly. This can be dozens of items and we would prefer to get this list of domains sorted when formatting it.
My proposal, e.g. for our case, would be something like:
group_sort_key_labels: [domain]
and if none is set, skipping the sort to keep whatever the current order would be. Notably, we don't need dynamic sorting, multiple sorts, or to do any kind of sorting in the template, we just want the alerts slice sorted in a particular order for the template.
If there's no resistance to this approach I would like to prepare a PR for it.
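As a rough illustration of the behaviour I have in mind, here is a minimal sketch. The `Alert` type and `sortAlerts` function are simplified, hypothetical names rather than existing Alertmanager code, and the `keys` argument stands for the value of the proposed group_sort_key_labels option.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a simplified, hypothetical stand-in for Alertmanager's alert type.
type Alert struct {
	Labels map[string]string
}

// sortAlerts orders alerts in place by the values of the given label names,
// compared in the order listed. With no keys it returns immediately, leaving
// the slice in whatever order it already had.
func sortAlerts(alerts []Alert, keys []string) {
	if len(keys) == 0 {
		return
	}
	// A stable sort keeps the existing relative order of alerts whose
	// values are equal for every key.
	sort.SliceStable(alerts, func(i, j int) bool {
		for _, k := range keys {
			vi, vj := alerts[i].Labels[k], alerts[j].Labels[k]
			if vi != vj {
				return vi < vj
			}
		}
		return false
	})
}

func main() {
	alerts := []Alert{
		{Labels: map[string]string{"alertname": "TrafficDrop", "domain": "example.org"}},
		{Labels: map[string]string{"alertname": "TrafficDrop", "domain": "example.com"}},
		{Labels: map[string]string{"alertname": "TrafficDrop", "domain": "example.net"}},
	}
	// Equivalent to configuring group_sort_key_labels: [domain].
	sortAlerts(alerts, []string{"domain"})
	for _, a := range alerts {
		fmt.Println(a.Labels["domain"])
	}
}
```

The sort would run once on the alerts slice before it is handed to the template, so templates themselves would not need any sorting logic.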