alertmanager
Alert grouping test with amtool
What did you do?
We are now actively using amtool config routes test and find it extremely useful, but we recently realized that we should also check that alert grouping is what we expect.
For example, this is what we check now:
% amtool config routes test --config.file alertmanager.yaml --tree \
--verify.receivers wire-team-opsgenie 'team=wire'
Matching routes:
.
└── default-route
    └── {team=~"^(?:^(wire)$)$"} receiver: wire-team-opsgenie
wire-team-opsgenie
It would be useful if we could pass something like:
% amtool config routes test --config.file alertmanager.yaml \
--tree --verify.receivers wire-team-opsgenie \
--verify.grouping=env,cluster,priority 'team=wire'
Matching routes:
.
└── default-route
    └── {team=~"^(?:^(wire)$)$"} receiver: wire-team-opsgenie
wire-team-opsgenie, grouping: [env,cluster,priority]
I'm not sure I follow the usefulness of this. In your example where you include the grouping, what changed?
The main reason is routing with custom subroutes. For example, I have something like:
- receiver: wire-team-opsgenie
  group_by:
  - env
  - cluster
  - priority
  match_re:
    team: ^(wire)$
  routes:
  - receiver: wire-team-opsgenie
    group_by:
    - alertname
    - cve
    - cluster
    match:
      alert_topic: security
  - receiver: wire-team-opsgenie
    group_by:
    - alertname
    - service
    - project
    - team
    match:
      alertname: QuotaCanBeReached
As you can see, each alert is sent to the same receiver but with different grouping. After Opsgenie we create a Jira issue, and alert grouping is the key to knowing whether we have already had the same incident. So instead of opening a new Jira issue, we can append to the existing one. That is why it is crucial to check that grouping is correct when changing Alertmanager configs.
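To make the subroute point concrete, here is a minimal Go sketch of how the effective grouping depends on which subroute matches. It does not use Alertmanager's own code: the Route type, Resolve, and wireRoute are hypothetical names invented for illustration, and the matching is simplified (first matching child wins, no continue support, a route with no group_by inherits its parent's).

```go
package main

import (
	"fmt"
	"regexp"
)

// Route is a simplified, hypothetical stand-in for an Alertmanager route.
type Route struct {
	Receiver string
	GroupBy  []string          // empty means "inherit from the parent"
	Match    map[string]string // exact label matches
	MatchRE  map[string]string // anchored regex label matches
	Routes   []*Route
}

// matches reports whether every matcher on the route accepts the label set.
func (r *Route) matches(labels map[string]string) bool {
	for name, want := range r.Match {
		if labels[name] != want {
			return false
		}
	}
	for name, pattern := range r.MatchRE {
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil || !re.MatchString(labels[name]) {
			return false
		}
	}
	return true
}

// Resolve returns the receiver and effective group_by of the deepest
// matching route. Simplification: the first matching child wins.
func Resolve(r *Route, labels map[string]string, inherited []string) (string, []string, bool) {
	if !r.matches(labels) {
		return "", nil, false
	}
	groupBy := r.GroupBy
	if len(groupBy) == 0 {
		groupBy = inherited
	}
	for _, child := range r.Routes {
		if recv, gb, ok := Resolve(child, labels, groupBy); ok {
			return recv, gb, true
		}
	}
	return r.Receiver, groupBy, true
}

// wireRoute mirrors the route from the configuration above.
func wireRoute() *Route {
	return &Route{
		Receiver: "wire-team-opsgenie",
		GroupBy:  []string{"env", "cluster", "priority"},
		MatchRE:  map[string]string{"team": "^(wire)$"},
		Routes: []*Route{
			{
				Receiver: "wire-team-opsgenie",
				GroupBy:  []string{"alertname", "cve", "cluster"},
				Match:    map[string]string{"alert_topic": "security"},
			},
			{
				Receiver: "wire-team-opsgenie",
				GroupBy:  []string{"alertname", "service", "project", "team"},
				Match:    map[string]string{"alertname": "QuotaCanBeReached"},
			},
		},
	}
}

func main() {
	for _, labels := range []map[string]string{
		{"team": "wire"},
		{"team": "wire", "alert_topic": "security"},
		{"team": "wire", "alertname": "QuotaCanBeReached"},
	} {
		recv, groupBy, ok := Resolve(wireRoute(), labels, nil)
		fmt.Println(labels, ok, recv, groupBy)
	}
}
```

All three label sets resolve to the same receiver, so a receiver-only check cannot tell them apart. Only the returned grouping differs.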
I propose two things:
- Show the receiver grouping when displaying the routing tree, maybe here: https://github.com/prometheus/alertmanager/blob/main/cli/routing.go#L89
{team=~"^(?:^(wire)$)$"} receiver: wire-team-opsgenie
wire-team-opsgenie, *grouping: [env,cluster,priority]*
- Add a new key, verify.grouping, that checks whether the receiver got the expected grouping. Maybe something like --verify.grouping[0]=[alertname,cve,cluster] will do the trick.
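As a rough illustration of what such a check could do, here is a small Go sketch that compares an expected grouping string with the grouping a route actually resolved to. This is not existing amtool code: parseGrouping and verifyGrouping are hypothetical helpers, and comparing the label names as a set (order ignored) is my assumption about the desired semantics.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseGrouping splits a value such as "env,cluster,priority" into a sorted
// list of unique label names. Hypothetical helper, not part of amtool.
func parseGrouping(s string) []string {
	seen := map[string]bool{}
	var out []string
	for _, part := range strings.Split(s, ",") {
		name := strings.TrimSpace(part)
		if name == "" || seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

// verifyGrouping returns an error when the expected and actual groupings
// differ as sets. Ignoring order is an assumption of this sketch.
func verifyGrouping(expected string, actual []string) error {
	want := strings.Join(parseGrouping(expected), ",")
	got := strings.Join(parseGrouping(strings.Join(actual, ",")), ",")
	if want != got {
		return fmt.Errorf("grouping mismatch: expected [%s], got [%s]", want, got)
	}
	return nil
}

func main() {
	// Same labels in a different order: treated as equal.
	fmt.Println(verifyGrouping("env,cluster,priority", []string{"priority", "env", "cluster"}))
	// Different labels: reported as a mismatch.
	fmt.Println(verifyGrouping("alertname,cve,cluster", []string{"env", "cluster", "priority"}))
}
```

A non-nil error here would correspond to amtool exiting with a failure, the same way a receiver mismatch does with --verify.receivers.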