
group_by: [alertname, alertstate] doesn't work as it should

Open Drugoy opened this issue 5 years ago • 27 comments

What did you do? I used

group_by:
  - alertname
  - alertstate

config and got multiple alerts (both firing and resolved) mixed together in a single notification.

What did you expect to see? Separate notifications: one for firing alerts and the other for resolved ones.

What did you see instead? Under which circumstances? A single notification with multiple alerts (both firing and resolved) mixed together.

Environment Yes.

  • System information:

Linux 3.10.0-1127.el7.x86_64 x86_64

  • Alertmanager version:

alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03ebfe21009c546e4d1e9b92c371d67c021d) build user: root@dee35927357f build date: 20200617-08:54:02 go version: go1.14.4

  • Prometheus version:

prometheus, version 2.19.2 (branch: HEAD, revision: c448ada63d83002e9c1d2c9f84e09f55a61f0ff7) build user: root@dd72efe1549d build date: 20200626-09:02:20 go version: go1.14.4

  • Alertmanager configuration file:
global:
  resolve_timeout: 5m
  smtp_from: [email protected]
  smtp_smarthost: 10.0.0.2:25
  smtp_require_tls: false

route:
  group_by:
    - alertname
    - alertstate
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 3h
  receiver: topguys

receivers:
  - name: topguys
    email_configs:
      - to: [email protected]
        send_resolved: true

Drugoy avatar Jul 29 '20 12:07 Drugoy

I have come across the same issue and am waiting for an answer T_T

chaishiqi avatar Aug 04 '20 11:08 chaishiqi

Alertstate is not a label sent to alertmanager. Alertmanager will always send resolved and active alerts in the same notification.

roidelapluie avatar Aug 04 '20 11:08 roidelapluie

Alertstate is not a label sent to alertmanager. Alertmanager will always send resolved and active alerts in the same notification.

Got it. Thanks for your answer. Will resolved and firing alerts be separated in the future?

chaishiqi avatar Aug 04 '20 11:08 chaishiqi

The goal of alertmanager is to send as few notifications as possible; so that seems unlikely.

You can use group_by: ['...'] to disable grouping entirely, if you wish.

roidelapluie avatar Aug 04 '20 12:08 roidelapluie

That goal needs to be re-evaluated then. I'd see the main goal for alertmanager to send alerts that are maximally useful to the recipients and comfortable to work with. Using no grouping at all is even less comfortable.

Is it due to difficulty of implementing alertstate label or is it more of a political decision not to implement it?

Drugoy avatar Aug 04 '20 12:08 Drugoy

What the alertmanager is doing is sending firing alerts first, then resolved alerts, all grouped.

If we split by alert state and I receive an email with resolved alerts, I do not know whether everything is resolved or only part is resolved and the other notification was lost (because the receiver could not handle both at the same time, or the notification was rejected for some reason).

It is also unclear how that should behave with the different timers we have now.

roidelapluie avatar Aug 04 '20 12:08 roidelapluie

or only part is resolved and the other notification is lost (because the receiver could not handle both at the same time, or for some reason the notification was rejected).

It's not alertmanager's job to think of whether the notification was lost along the way and this problem has nothing to do with splitting alerts by state: it would cause you as much harm (if not more) if your notification with grouped alerts got lost.

if I receive a email with resolved alerts, I do not know if it is all resolved or only part is resolved

That problem already exists today: I get a bunch of notifications, each of them with grouped alerts. How would I know if there are any firing alerts left? (I think I should file a separate ticket for that feature request.) I think a good solution would be to introduce a counter of alerts that are still firing, so that 'resolved' notifications could look like "5/7 alerts resolved".

Drugoy avatar Aug 05 '20 08:08 Drugoy

I got a bunch of notifications. Each of them has grouped alerts. How would I know if there any firing alerts left?

If the first notification is resolved, there are no firing alerts left.

You can create the counter in templates.
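
As a rough illustration of the template-counter idea, here is a minimal Go sketch. Alertmanager notification templates are Go templates and expose `.Alerts.Firing` and `.Alerts.Resolved`; the `Alert`, `Alerts`, and `Data` types and the `renderSummary` helper below are simplified stand-ins written for this example, not Alertmanager's own types. The counts only cover the alerts contained in the notification's group.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Alert is a simplified stand-in for one alert as seen by a template.
type Alert struct {
	Status string
	Labels map[string]string
}

// Alerts mimics the Firing/Resolved helpers available on .Alerts.
type Alerts []Alert

// Firing returns the alerts whose status is "firing".
func (as Alerts) Firing() []Alert { return as.withStatus("firing") }

// Resolved returns the alerts whose status is "resolved".
func (as Alerts) Resolved() []Alert { return as.withStatus("resolved") }

func (as Alerts) withStatus(status string) []Alert {
	var out []Alert
	for _, a := range as {
		if a.Status == status {
			out = append(out, a)
		}
	}
	return out
}

// Data is a hypothetical, trimmed-down template context.
type Data struct {
	Alerts Alerts
}

// summaryTmpl renders a counter such as "5/7 alerts resolved".
const summaryTmpl = `{{ .Alerts.Resolved | len }}/{{ .Alerts | len }} alerts resolved, {{ .Alerts.Firing | len }} still firing`

// renderSummary executes the template against the given data.
func renderSummary(d Data) (string, error) {
	t, err := template.New("summary").Parse(summaryTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	var alerts Alerts
	for i := 0; i < 5; i++ {
		alerts = append(alerts, Alert{Status: "resolved"})
	}
	for i := 0; i < 2; i++ {
		alerts = append(alerts, Alert{Status: "firing"})
	}
	out, err := renderSummary(Data{Alerts: alerts})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "5/7 alerts resolved, 2 still firing"
}
```

In a real deployment only the template string would go into a notification template file; the Go scaffolding here exists just to make the expression testable outside Alertmanager.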

roidelapluie avatar Aug 24 '20 22:08 roidelapluie

Alertstate is not a label sent to alertmanager. Alertmanager will always send resolved and active alerts in the same notification.

Hold up, but https://prometheus.io/docs/alerting/latest/configuration/#webhook_config says the alert structure is:

{
  "version": "4",
  "groupKey": <string>,              // key identifying the group of alerts (e.g. to deduplicate)
  "truncatedAlerts": <int>,          // how many alerts have been truncated due to "max_alerts"
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,           // backlink to the Alertmanager.
  "alerts": [
    {
      "status": "<resolved|firing>",
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>",
      "generatorURL": <string>       // identifies the entity that caused the alert
    },
    ...
  ]
}

why not just move status into labels?

Drugoy avatar Oct 08 '20 12:10 Drugoy

Hi, any update on this? Can the alertmanager developers leave these decisions to the users? That is what configuration is there for.

metadataengine avatar Nov 30 '21 08:11 metadataengine

Late to the party, but I'm using this config for our OpenShift alertmanager pipeline:

route:
  group_by:
    - alertname
    - severity
    - status

Hope this helps someone trying to google for this.

OlGe404 avatar Feb 16 '23 12:02 OlGe404

For @OlGe404: I'm not sure I follow what grouping by status gives you in this context. The status itself needs to be sent from Prometheus and this is not something we do.

For everything else on the thread:

While I understand the words behind the use case, I'm not sure I follow what kind of real-world applications exist to get separate firing and resolve notifications.

The goal of the alertmanager is to be as efficient as possible in the way of sending notifications. A part of that is due to how groups work. However, indeed there is currently no way to divide firing and resolve alerts within a group in the case where you'd like to get them separate.

Before we delve into solutions, could you elaborate on what is your use case for the separation to begin with?

gotjosh avatar Feb 16 '23 14:02 gotjosh

I'm not sure if this fits the topic, but my personal use case is that not being able to route based on status means that "resolved" notifications are not sent to routes which have time-based active/mute rules.

Example sequence:

  1. alert fires at 5:59PM and is received by office-hours receiver
  2. routes switch to the out-of-hours receiver
  3. alert resolves at 6:01PM and the resolution is sent to the out-of-hours receiver
  4. the office-hours receiver is never told that the issue is now resolved

I would like to solve this by defining a high-priority route saying that all with alertstate=resolved should go to the office-hours receiver (with continue: true).

While this relates to routing, rather than grouping, I feel like the desire for defining matchers based on the alertstate label has significant overlap with this issue.

wpalmer avatar Apr 14 '23 13:04 wpalmer

@wpalmer I feel like your use case is similar to #226

simonpasquier avatar Jun 02 '23 12:06 simonpasquier

This issue's subject is "sketchy" at best, but the underlying need is quite reasonable in high alert turnover scenarios.

If each webhook contains up to 10 alerts with mixed states, receivers have to retain the states of alerts they have already handled, because within group_interval one webhook can contain both already "handled" firing alerts and new, previously "unhandled" resolved alerts.

kautkata avatar Feb 11 '25 07:02 kautkata

Before we delve into solutions, could you elaborate on what is your use case for the separation to begin with?

Hello @gotjosh , we have 10 large k8s clusters. We have an alerting rule: KubePodNotReady, so there are often many similar alerts, differing only in the pod name, namespace, and the cluster they belong to. Previously, these alerts would send 100 (hypothetically) separate notifications to our communication tool. Now, we are using group_by alertname, clusterName, clusterEnv, so alerts in a group are sent only once. However, even after template processing, a single message still includes both resolved and firing alerts, which is obviously not very intuitive.

Now, we want to separate resolved and firing alerts into different messages. This way, we can consolidate 100 messages into 2 messages instead of 1.

DesireWithin avatar Apr 08 '25 11:04 DesireWithin

It's not possible, also please don't ask unrelated questions on GitHub issues. You can get help in the Google Group https://groups.google.com/g/prometheus-users.

grobinson-grafana avatar Apr 10 '25 21:04 grobinson-grafana

It's not possible, also please don't ask unrelated questions on GitHub issues. You can get help in the Google Group https://groups.google.com/g/prometheus-users.

I don't think this is an irrelevant question; I'm just describing a use case.

Whether Alertmanager believes this should be Prometheus's job (to pass an alert state label), or whether one of Alertmanager's policies is to send as few notifications as possible, the result is that this issue is not being resolved. But I believe there are many community users encountering this kind of problem, and most of them don't have an official solution.

Personally, I solved it by writing a simple Go program to separate these alerts.
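
For readers looking for a starting point, here is a minimal sketch of what such a splitter could look like. It is not the program mentioned above; it only assumes the webhook payload shape quoted earlier in this thread, keeps a subset of its fields, and the names `Payload`, `Alert`, and `splitByStatus` are made up for this example. A real proxy would also need an HTTP handler to receive the webhook and forward each part.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Alert holds the per-alert fields of the webhook payload used here.
type Alert struct {
	Status string            `json:"status"`
	Labels map[string]string `json:"labels"`
}

// Payload is a trimmed-down version of the webhook payload.
type Payload struct {
	GroupKey    string            `json:"groupKey"`
	Status      string            `json:"status"`
	Receiver    string            `json:"receiver"`
	GroupLabels map[string]string `json:"groupLabels"`
	Alerts      []Alert           `json:"alerts"`
}

// splitByStatus returns one payload per status present in p,
// firing first, then resolved. Empty parts are skipped.
func splitByStatus(p Payload) []Payload {
	byStatus := map[string][]Alert{}
	for _, a := range p.Alerts {
		byStatus[a.Status] = append(byStatus[a.Status], a)
	}
	var parts []Payload
	for _, status := range []string{"firing", "resolved"} {
		alerts := byStatus[status]
		if len(alerts) == 0 {
			continue
		}
		part := p // copy the group-level fields
		part.Status = status
		part.Alerts = alerts
		parts = append(parts, part)
	}
	return parts
}

func main() {
	// Hypothetical payload with mixed states, as described in this thread.
	raw := []byte(`{
		"groupKey": "example",
		"status": "firing",
		"receiver": "topguys",
		"groupLabels": {"alertname": "KubePodNotReady"},
		"alerts": [
			{"status": "firing", "labels": {"pod": "a"}},
			{"status": "resolved", "labels": {"pod": "b"}},
			{"status": "firing", "labels": {"pod": "c"}}
		]
	}`)
	var p Payload
	if err := json.Unmarshal(raw, &p); err != nil {
		panic(err)
	}
	for _, part := range splitByStatus(p) {
		fmt.Printf("%s: %d alert(s)\n", part.Status, len(part.Alerts))
	}
}
```

Note that splitting downstream like this does not change Alertmanager's own grouping or timers; it only reshapes what the final recipient sees.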

DesireWithin avatar Apr 15 '25 08:04 DesireWithin

@DesireWithin no offense, but the story you shared didn't contribute anything valuable to this ticket: we all have our stories of how this or another bug brings us some discomfort or problems, but there's simply no point in sharing such bitter stories without also sharing some workaround, really.

You claim that you wrote a Go program to separate alerts - is it open source? Could you share it with us? We sure would love to see at least some workaround solution for the problem until (and if!) it gets properly fixed.

Drugoy avatar Apr 21 '25 14:04 Drugoy

My original intention was to push this issue forward by providing richer user scenarios, giving decision-makers more reason to reassess this small feature. After all, this issue hasn't seen substantial progress in 5 years, and there's a kind/more-info-needed tag.

I've pushed my code to my personal GitHub. I'm not a professional developer, and while it may not fit your use case, this feature is really quite small. You can use AI to translate the README and comments into your native language.

DesireWithin avatar Apr 23 '25 02:04 DesireWithin

I get the intention, but GitHub doesn't work like a forum where any new post in an older topic makes it rise to the top. If the issue has been ignored for almost 5 years, chances are the maintainers don't intend to ever fix it...

Drugoy avatar Apr 23 '25 18:04 Drugoy

@Drugoy I have the same problem. I started using alertmanager recently and I am surprised that I cannot receive a separate notification for resolved alerts when they are grouped. I have opened a new issue: #4721. So we can try to convince them now.

djordjelakicevic-ds avatar Nov 20 '25 14:11 djordjelakicevic-ds

So on a technical level, quite a lot would need to be changed in how the Alertmanager works for this feature to work correctly.

Alertmanager puts alerts into groups, and the Alertmanager sends notifications for groups.

This bit is really important so I want to emphasize it again:

Alertmanager sends notifications for groups.

So what does that mean for grouping by status?

It means that to group by status, alerts must be able to move between groups as their status changes in real time. This is just not possible today.

Here is an example:

  1. Alert is fired at 10:00 UTC with an endsAt of 10:30 UTC.

  2. Alert is still firing at 10:29 UTC.

  3. Alert reaches the endsAt time of 10:30 UTC, but it's in the wrong group. It's in the "active" group and not the "resolved" group, so it needs to be removed before the group flushes, otherwise the notification for the active group will include resolved alerts.

It also means that the notification log, which is a replicated log that prevents duplicate notifications in high availability deployments, needs to be redesigned to take into consideration that resolved alerts now move groups. Why? Because if it doesn't, and the alert fires again, it won't send a notification until the next repeat interval, which is incorrect.

In summary, what this means is that two very important components of Alertmanager need to be redesigned to make this feature possible.
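
To make the point above concrete, here is a deliberately simplified Go model. It is not Alertmanager's implementation; `Alert`, `groupKey`, `status`, and `at` are hypothetical names for this sketch. The assumption it illustrates is that a group is derived from labels only, while firing versus resolved is derived from the endsAt time relative to now, so the status can flip while the group stays the same.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Alert is a simplified alert: labels plus an end time.
type Alert struct {
	Labels map[string]string
	EndsAt time.Time
}

// groupKey builds a key from the group_by label names only.
// A label the alert does not carry contributes an empty value.
func groupKey(a Alert, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		parts = append(parts, name+"="+a.Labels[name])
	}
	return strings.Join(parts, ",")
}

// status is computed from EndsAt at evaluation time, not stored as a label.
func status(a Alert, now time.Time) string {
	if !a.EndsAt.IsZero() && !a.EndsAt.After(now) {
		return "resolved"
	}
	return "firing"
}

// at returns a fixed example day at the given UTC hour and minute.
func at(hour, minute int) time.Time {
	return time.Date(2025, time.November, 20, hour, minute, 0, 0, time.UTC)
}

func main() {
	a := Alert{
		Labels: map[string]string{"alertname": "KubePodNotReady"},
		EndsAt: at(10, 30),
	}
	groupBy := []string{"alertname", "alertstate"}

	for _, now := range []time.Time{at(10, 29), at(10, 30)} {
		// The key is identical at both times; only the status changes.
		fmt.Printf("%s key=%q status=%s\n",
			now.Format("15:04"), groupKey(a, groupBy), status(a, now))
	}
}
```

In this model the `alertstate` entry in the key is always empty because no such label is sent, which matches the behaviour reported at the top of this issue: alerts end up grouped by `alertname` alone, regardless of state.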

grobinson-grafana avatar Nov 20 '25 15:11 grobinson-grafana

@grobinson-grafana So the final answer is that we will not get grouping by status or state and alertmanager will not be able to send separated messages for firing and resolved in grouped alerts?

If that is true then we can finally close this issue and work with what we have.

djordjelakicevic-ds avatar Nov 20 '25 17:11 djordjelakicevic-ds

He just explained what needs to be changed in order to get the requested feature. It doesn't mean it's not possible. It doesn't mean he is against such changes. You are free to contribute with a Merge Request, if you are able to (I know I'm not).

Drugoy avatar Nov 20 '25 18:11 Drugoy

@grobinson-grafana So the final answer is that we will not get grouping by status or state and alertmanager will not be able to send separated messages for firing and resolved in grouped alerts?

An integration can choose to send different messages for firing and resolved notifications if it wishes, but what is not possible is sending firing notifications to one integration and resolved notifications to another integration without quite a considerable redesign of the Alertmanager in my opinion. The same also applies to grouping by state because it is not static, the state of an alert changes over time within the Alertmanager.

grobinson-grafana avatar Nov 21 '25 00:11 grobinson-grafana

@Drugoy I am not able to either. And I am not sure anyone is, given that you opened this issue 5 years ago :D

@grobinson-grafana Thanks for the clear explanation.

djordjelakicevic-ds avatar Nov 21 '25 07:11 djordjelakicevic-ds