alertmanager
Change the deduplication ID for the SNS receiver
Change the deduplication ID to use groupKey + now from the context. now is generated at https://github.com/prometheus/alertmanager/blob/main/dispatch/dispatch.go#L442 and should be different for each flush.
This change should fix the following cases:
- Users who set repeat_interval to less than 5m have their messages deduplicated by SNS, even though they want to receive messages at an interval shorter than 5m.
- Users are unable to receive the message when an alert is resolved within 5m.
I suspect that this will break alertmanager HA functionality, as Now will not be unique across the cluster.
I'm not sure this would be the case: the dedup stage (where we look at the notification log) happens before the retry stage (where we execute the Notify function). By the time we get to use now, we've already determined that we need to notify.
https://github.com/prometheus/alertmanager/blob/a38c5b8f1d780ce042a53a217af8c56316ed3071/notify/notify.go#L359-L365
In principle, the change seems safe. WDYT @roidelapluie?
@qinxx108 is there any chance you can provide us with a test account? I feel like to review/approve this change we'd need to test it against SNS.
@roidelapluie @gotjosh Based on our previous discussion, I wonder if we have a chance to test this out? Thanks a lot for the help!
I've tested this by creating a receiver that has both a webhook and an SNS config, with a repeat interval of 1m.
For the webhook, I get new webhooks every minute, and for SNS I get a new message_id and sequence for each send. This was not the case when I tried it out without this change.
ts=2022-06-30T16:34:08.710Z caller=sns.go:94 level=debug integration=sns msg="SNS message successfully published" message_id=ea06867b-379c-5469-8c6c-dd4ce55e260a sequencenumber=10000000000000019000
ts=2022-06-30T16:35:38.416Z caller=sns.go:94 level=debug integration=sns msg="SNS message successfully published" message_id=06b00bb4-0eb2-5913-acc2-5083ddae2675 sequencenumber=10000000000000020000