alertmanager
Duplicate alerts in UI
Hello,
Since I added multiple routes (with `continue: true`), I see my alerts twice in the Alertmanager UI (version 0.20.0).
I found old bug reports, but if I understand correctly, this should already have been fixed.
Edit: If I remove `continue: true`, the alerts are not duplicated.
```yaml
global:
  resolve_timeout: 5m
  smtp_from: '[email protected]'
  smtp_smarthost: 'smtpintern.example.com:25'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email.all'
  routes:
  - receiver: 'email.all'
    continue: true
  - receiver: 'web.Hangout'
receivers:
- name: 'web.Hangout'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=Prometheusalerts'
- name: 'email.all'
  email_configs:
  - to: '[email protected]'
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
```
I agree it's confusing, but at the same time it's somewhat expected: with `continue: true`, each alert matches both child routes, so it ends up in two groups, one per receiver. The UI could be improved to indicate that the groups belong to different receivers.
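To make the "two groups" explanation concrete, here is a minimal Python sketch of how a routing tree with `continue` resolves an alert. It is a simplification, not Alertmanager's code: it assumes equality-only `match` matchers, and the `Route`/`resolve`/`group_labels` names and the `HighLoad` alert are hypothetical. As we understand the dispatcher, an alert descends depth-first into matching child routes, stops at the first matching child that lacks `continue: true`, and falls back to the parent's receiver when no child matches.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Hypothetical, simplified model of an Alertmanager route."""
    receiver: str
    match: dict = field(default_factory=dict)  # equality matchers only
    continue_: bool = False                    # `continue` is a Python keyword
    routes: list = field(default_factory=list)

    def matches(self, labels: dict) -> bool:
        return all(labels.get(k) == v for k, v in self.match.items())


def resolve(route: Route, labels: dict) -> list:
    """Return the receivers whose routes match `labels`, depth-first."""
    if not route.matches(labels):
        return []
    receivers = []
    for child in route.routes:
        matched = resolve(child, labels)
        receivers.extend(matched)
        # Stop at the first matching child unless it has continue: true.
        if matched and not child.continue_:
            break
    # No child matched: this route handles the alert itself.
    return receivers or [route.receiver]


def group_labels(labels: dict, group_by: list) -> dict:
    """Labels that identify the alert's group (what the UI displays)."""
    return {k: labels[k] for k in group_by if k in labels}


# Routing tree from the first config in this issue.
root = Route("email.all", routes=[
    Route("email.all", continue_=True),
    Route("web.Hangout"),
])
alert = {"alertname": "HighLoad", "instance": "srv1"}  # hypothetical alert

for receiver in resolve(root, alert):
    # Same group labels, different receiver: two identical-looking groups.
    print(receiver, group_labels(alert, ["alertname"]))
```

Dropping `continue_=True` from the first child makes `resolve` return only `['email.all']`, which matches the observation that the duplicates disappear; note that `web.Hangout` then never receives anything.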
Side note: instead of using `continue: true`, you can configure multiple integrations within one receiver, like this:
```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'receiverA'
receivers:
- name: 'receiverA'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=Prometheusalerts'
  email_configs:
  - to: '[email protected]'
```
I could indeed use that, but I have now expanded my route to the following, and I guess grouping the receivers like that would not work:
```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email.all'
  routes:
  - receiver: 'email.all'
    continue: true
  - receiver: 'web.HangoutPRD'
    match:
      environment: 'PRD'
  - receiver: 'web.Hangout'
```
You can still avoid `continue: true`, and avoid duplicating configuration, by using YAML anchors:
```yaml
route:
  receiver: 'notPRD'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  routes:
  - receiver: 'PRD'
    match:
      environment: 'PRD'
receivers:
- name: 'PRD'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=PRD'
  email_configs: &email-all
  - to: '[email protected]'
- name: 'notPRD'
  webhook_configs:
  - url: 'http://localhost:6000/create?room_name=Prometheusalerts'
  email_configs: *email-all
```
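One way to check that this layout avoids duplicate groups is to model it. The Python sketch below is a simplified, hypothetical model (plain dicts, equality matching only, a placeholder email address, and made-up `route`/`integrations` helpers): the alias `*email-all` refers to the same YAML node as `&email-all`, modelled here as one shared list, and every alert matches exactly one route, so it lands in one group that still notifies both integrations.

```python
# Hypothetical stand-in for the `&email-all` node; the address is a placeholder.
EMAIL_ALL = [{"to": "team@example.com"}]

RECEIVERS = {
    "PRD": {
        "webhook_configs": [{"url": "http://localhost:6000/create?room_name=PRD"}],
        "email_configs": EMAIL_ALL,  # &email-all
    },
    "notPRD": {
        "webhook_configs": [{"url": "http://localhost:6000/create?room_name=Prometheusalerts"}],
        "email_configs": EMAIL_ALL,  # *email-all: alias to the same node
    },
}


def route(labels: dict) -> list:
    """Receivers for an alert: the single child route if it matches, else the root."""
    if labels.get("environment") == "PRD":
        return ["PRD"]
    return ["notPRD"]


def integrations(receiver: str) -> list:
    """Integration types a receiver notifies (its non-empty *_configs keys)."""
    return sorted(k for k, v in RECEIVERS[receiver].items() if v)


for env in ["PRD", "DEV"]:  # "DEV" is a hypothetical non-PRD environment
    receivers = route({"alertname": "HighLoad", "environment": env})
    print(env, receivers, [integrations(r) for r in receivers])
```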
Thanks
Little remark: it seems that the `.` in `email.all` is making the parser unhappy.
@bigon right, example updated :)
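The quoted receiver name `'email.all'` parses fine in the first config, so the complaint most likely concerns an anchor written as `&email.all`. Assuming Alertmanager's YAML library (go-yaml) only accepts ASCII letters, digits, `_` and `-` in anchor and alias names, which is consistent with `&email-all` working in the updated example, a quick pre-check could look like this (the helper name is hypothetical):

```python
import re

# Assumption: go-yaml scans anchor/alias names as runs of [A-Za-z0-9_-];
# any other character right after the name (such as '.') is a parse error.
ANCHOR_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_anchor_name(name: str) -> bool:
    """Hypothetical helper: True if `name` should be accepted as &name / *name."""
    return ANCHOR_NAME.fullmatch(name) is not None


for name in ["email-all", "email_all", "email.all"]:
    print(name, is_safe_anchor_name(name))
```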
As my colleague @joe-elliott just ran into the same issue, I thought a bit about it:
It is confusing that the UI doesn't mention the receiver anywhere. You just see two completely identical groups because of that. Even the UI itself is confused by it: If I click the Info button on one group, both groups expand.
Perhaps it would be good to show the receiver somewhere, or to deduplicate completely identical groups.
> Perhaps it would be good to show the receiver somewhere.

That was my idea too.
I discovered the other day that there is a drop-down menu (next to the "Silenced" checkbox) to limit which receivers are shown; this is definitely not obvious.
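Outside the UI, a similar per-receiver view is available from the v2 API: `GET /api/v2/alerts/groups` accepts a `receiver` query parameter, which is a regular expression, and each returned group carries a `receiver` object. The sketch below assumes Alertmanager listens on `http://localhost:9093` (the default port); the helper names are hypothetical. `re.escape` keeps the `.` in `email.all` from matching any character.

```python
import json
import re
import urllib.request
from urllib.parse import urlencode


def alert_groups_url(base_url: str, receiver_regex: str) -> str:
    """URL of the v2 alert-groups endpoint filtered by a receiver regex."""
    query = urlencode({"receiver": receiver_regex})
    return f"{base_url.rstrip('/')}/api/v2/alerts/groups?{query}"


def fetch_groups(base_url: str, receiver_name: str) -> list:
    """Fetch the groups for one receiver name (needs a running Alertmanager)."""
    url = alert_groups_url(base_url, re.escape(receiver_name))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


url = alert_groups_url("http://localhost:9093", re.escape("email.all"))
print(url)
# Against a live instance, one might list each group's receiver and labels:
# for g in fetch_groups("http://localhost:9093", "email.all"):
#     print(g["receiver"]["name"], g["labels"])
```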
We just hit this issue while attempting to upgrade. The duplication did not happen with the version we are on now (0.12.0) with the "all" receiver selected. I agree that this is very confusing, and I think the groups should be merged when the "all" receiver is selected.
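The suggestions to show the receiver, or to merge identical groups when all receivers are selected, can be sketched client-side. This is a hypothetical illustration, not how the Alertmanager UI works: each group is a plain dict loosely modelled on the v2 API's alert-group objects (`labels`, `receiver`), with alerts reduced to made-up fingerprint strings, and groups with identical labels and alerts are folded into one entry that lists every receiver.

```python
def merge_identical_groups(groups: list) -> list:
    """Fold groups with the same labels and alerts into one, collecting receivers.

    Each group is a hypothetical dict:
    {"labels": {...}, "receiver": {"name": str}, "alerts": [fingerprint, ...]}
    """
    merged = {}  # dicts keep insertion order, so output follows input order
    for group in groups:
        key = (
            tuple(sorted(group["labels"].items())),
            tuple(sorted(group["alerts"])),
        )
        entry = merged.setdefault(
            key, {"labels": group["labels"], "alerts": group["alerts"], "receivers": []}
        )
        entry["receivers"].append(group["receiver"]["name"])
    return list(merged.values())


groups = [
    {"labels": {"alertname": "HighLoad"}, "receiver": {"name": "email.all"}, "alerts": ["a1"]},
    {"labels": {"alertname": "HighLoad"}, "receiver": {"name": "web.Hangout"}, "alerts": ["a1"]},
]
for g in merge_identical_groups(groups):
    print(g["labels"], "->", ", ".join(g["receivers"]))
```

Keeping the list of receivers, rather than silently dropping the duplicate group, preserves the fact that the alert is delivered through two routes.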
I'll add to the chorus: I just spent way too much time figuring this out. Additionally, the "Receiver" drop-down doesn't scroll down to show all of the receivers, but you can type and it will autocomplete.
I recently implemented the routing technique @roidelapluie described in https://promcon.io/2019-munich/talks/improved-alerting-with-prometheus-and-alertmanager/, and this issue led to a lot of head scratching.
I understand why now, but it's not obvious.
We just ran into the same issue. It is really, really not obvious why there is a duplicate in this case. There should definitely be a hint about the receiver in the UI.
I can also confirm this is a problem for us. As others have said it is not obvious why this is the case. Looking forward to a fix!
I am fixing this in #3289