alertmanager
duplicate repeated notifications
While developing some tooling to load test Alertmanager HA clusters, I believe I have found a problem with repeated notifications.
The tooling I've built captures the alerts it fires and the resulting notifications as events.
Alert-fired events start with ALERTS, followed by the timestamp of the event and the Alertmanager instance the alerts were fired against, and end with a list of alerts represented by their hashes.
Notification-received events start with NOTIFICATION, followed by the timestamp, the Alertmanager that sent the notification, the group key, a hash of all alerts in the notification, and finally a list of all alerts that are part of the notification.
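For anyone who wants to process these event lines, here is a minimal parsing sketch. It is not part of the tooling described here; the `Event` class and the function names are hypothetical. It assumes fields are separated by whitespace, that the group key itself contains no whitespace, and that every timestamp has a fractional part, all of which hold for the output shown in this issue.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    kind: str            # "ALERTS" or "NOTIFICATION"
    timestamp: datetime
    instance: str        # Alertmanager URL
    group_key: str       # empty for ALERTS events
    group_hash: str      # empty for ALERTS events
    alerts: list         # alert hashes


def parse_timestamp(raw: str) -> datetime:
    # The log has nanosecond precision, datetime only supports microseconds,
    # so the fractional part is truncated to six digits.
    head, frac = raw.rstrip("Z").split(".")
    parsed = datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def parse_event(line: str) -> Event:
    fields = line.split()
    kind, ts, instance = fields[0], parse_timestamp(fields[1]), fields[2]
    if kind == "ALERTS":
        return Event(kind, ts, instance, "", "", fields[3:])
    if kind == "NOTIFICATION":
        return Event(kind, ts, instance, fields[3], fields[4], fields[5:])
    raise ValueError(f"unknown event kind: {kind!r}")
```

Keeping the sending instance and the group key on every notification is what makes it possible to spot the same group being notified by more than one Alertmanager.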
The Alertmanager configuration used is:
global:
  resolve_timeout: 60m
route:
  group_by: ['__name__']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 10s
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://127.0.0.1:8080/notify'
    send_resolved: false
The "alerts" I'm sending are just label sets that I've parsed out of an anonymized Prometheus metrics output, primarily because the dataset already existed and is large. That is why I chose to group by __name__.
The test that I ran looks like this: a single worker sends one alert every second and switches to the next alert every 5 seconds. This runs for 1 minute, plus 10 seconds to capture the remaining notifications generated by the last alerts fired.
The result was the following (in order to be able to reference line numbers I've also created a gist):
ALERTS 2017-09-28T12:10:37.798194207Z http://localhost:9093/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:37.799183394Z http://localhost:9094/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:37.800069114Z http://localhost:9095/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:38.797270816Z http://localhost:9093/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:38.798020242Z http://localhost:9094/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:38.800534441Z http://localhost:9095/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:39.800521846Z http://localhost:9093/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:39.801290913Z http://localhost:9094/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:39.802074350Z http://localhost:9095/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:40.797564108Z http://localhost:9093/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:40.798459262Z http://localhost:9094/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:40.799239927Z http://localhost:9095/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:41.797429986Z http://localhost:9093/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:41.798258191Z http://localhost:9094/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:41.799001280Z http://localhost:9095/api/v1/alerts 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:42.797232675Z http://localhost:9093/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:42.797934746Z http://localhost:9094/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:42.798619634Z http://localhost:9095/api/v1/alerts 7e7f12e3cdafe518
NOTIFICATION 2017-09-28T12:10:42.800983970Z http://localhost.localdomain:9095 {}:{__name__="oiwwpnxbzvnglxqfmmgydouluripxyalq"} 1aed0cc371e7fbcf 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:43.797446311Z http://localhost:9093/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:43.798220343Z http://localhost:9094/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:43.798836356Z http://localhost:9095/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:44.798056203Z http://localhost:9093/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:44.799502208Z http://localhost:9094/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:44.800750221Z http://localhost:9095/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:45.798191810Z http://localhost:9093/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:45.799568964Z http://localhost:9094/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:45.800585195Z http://localhost:9095/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:46.797094850Z http://localhost:9093/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:46.797996715Z http://localhost:9094/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:46.798959840Z http://localhost:9095/api/v1/alerts 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:47.797074854Z http://localhost:9093/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:47.797671370Z http://localhost:9094/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:47.798175166Z http://localhost:9095/api/v1/alerts bd89880863e2a021
NOTIFICATION 2017-09-28T12:10:47.799082461Z http://localhost.localdomain:9095 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 7e7f12e3cdafe518 7e7f12e3cdafe518
ALERTS 2017-09-28T12:10:48.797726244Z http://localhost:9093/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:48.798648053Z http://localhost:9094/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:48.799661829Z http://localhost:9095/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:49.797847640Z http://localhost:9093/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:49.798974507Z http://localhost:9094/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:49.800244041Z http://localhost:9095/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:50.797584380Z http://localhost:9093/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:50.798515603Z http://localhost:9094/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:50.799385669Z http://localhost:9095/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:51.797519306Z http://localhost:9093/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:51.798570709Z http://localhost:9094/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:51.799592098Z http://localhost:9095/api/v1/alerts bd89880863e2a021
ALERTS 2017-09-28T12:10:52.798130921Z http://localhost:9093/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:52.799423295Z http://localhost:9094/api/v1/alerts c7244ae83ae3fea1
NOTIFICATION 2017-09-28T12:10:52.799699291Z http://localhost.localdomain:9095 {}:{__name__="viljjpdisdmychdciattjgryfsxgkinrxuwzkplzqvzydyod"} bd89880863e2a021 bd89880863e2a021
ALERTS 2017-09-28T12:10:52.800667527Z http://localhost:9095/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:53.797671303Z http://localhost:9093/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:53.798821237Z http://localhost:9094/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:53.799933295Z http://localhost:9095/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:54.797521231Z http://localhost:9093/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:54.798485334Z http://localhost:9094/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:54.799259402Z http://localhost:9095/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:55.798953325Z http://localhost:9093/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:55.800176411Z http://localhost:9094/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:55.801245167Z http://localhost:9095/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:56.797100473Z http://localhost:9093/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:56.797971844Z http://localhost:9094/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:56.798780060Z http://localhost:9095/api/v1/alerts c7244ae83ae3fea1
ALERTS 2017-09-28T12:10:57.798235230Z http://localhost:9093/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:10:57.799724609Z http://localhost:9094/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:10:57.801189828Z http://localhost:9095/api/v1/alerts 52fb295b6100a4e8
NOTIFICATION 2017-09-28T12:10:57.801620247Z http://localhost.localdomain:9095 {}:{__name__="oiwwpnxbzvnglxqfmmgydouluripxyalq"} 1aed0cc371e7fbcf 1aed0cc371e7fbcf
NOTIFICATION 2017-09-28T12:10:57.802131137Z http://localhost.localdomain:9095 {}:{__name__="effdxjecmjwlwywayerjlkbuuzqivrpucvqgqkwoqvnfgxvccl"} c7244ae83ae3fea1 c7244ae83ae3fea1
NOTIFICATION 2017-09-28T12:10:57.803794835Z http://localhost.localdomain:9093 {}:{__name__="oiwwpnxbzvnglxqfmmgydouluripxyalq"} 1aed0cc371e7fbcf 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:10:58.797602698Z http://localhost:9093/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:10:58.798599639Z http://localhost:9094/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:10:58.799491324Z http://localhost:9095/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:10:59.798163002Z http://localhost:9093/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:10:59.799861010Z http://localhost:9094/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:10:59.801677640Z http://localhost:9095/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:00.797094927Z http://localhost:9093/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:00.797959909Z http://localhost:9094/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:00.798928577Z http://localhost:9095/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:01.797341069Z http://localhost:9093/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:01.798137475Z http://localhost:9094/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:01.798849095Z http://localhost:9095/api/v1/alerts 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:02.797696805Z http://localhost:9093/api/v1/alerts 1af4ca728e342bc6
NOTIFICATION 2017-09-28T12:11:02.799762683Z http://localhost.localdomain:9093 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 7e7f12e3cdafe518 7e7f12e3cdafe518
ALERTS 2017-09-28T12:11:02.800370351Z http://localhost:9094/api/v1/alerts 1af4ca728e342bc6
NOTIFICATION 2017-09-28T12:11:02.801080811Z http://localhost.localdomain:9095 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 7e7f12e3cdafe518 7e7f12e3cdafe518
NOTIFICATION 2017-09-28T12:11:02.802088282Z http://localhost.localdomain:9095 {}:{__name__="zzlqunurqsnprexlidrmgppwemgbhzyigbfgqiyedzsueibqu"} 52fb295b6100a4e8 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:02.802431645Z http://localhost:9095/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:03.797531911Z http://localhost:9093/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:03.798497116Z http://localhost:9094/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:03.799472119Z http://localhost:9095/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:04.797435020Z http://localhost:9093/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:04.798329189Z http://localhost:9094/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:04.799134023Z http://localhost:9095/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:05.797141301Z http://localhost:9093/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:05.797893663Z http://localhost:9094/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:05.798607056Z http://localhost:9095/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:06.797640203Z http://localhost:9093/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:06.798508598Z http://localhost:9094/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:06.799346420Z http://localhost:9095/api/v1/alerts 1af4ca728e342bc6
ALERTS 2017-09-28T12:11:07.797450473Z http://localhost:9093/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:07.798437679Z http://localhost:9094/api/v1/alerts b00fd90c0a5af067
NOTIFICATION 2017-09-28T12:11:07.798913473Z http://localhost.localdomain:9093 {}:{__name__="viljjpdisdmychdciattjgryfsxgkinrxuwzkplzqvzydyod"} bd89880863e2a021 bd89880863e2a021
ALERTS 2017-09-28T12:11:07.799909084Z http://localhost:9095/api/v1/alerts b00fd90c0a5af067
NOTIFICATION 2017-09-28T12:11:07.800087283Z http://localhost.localdomain:9095 {}:{__name__="viljjpdisdmychdciattjgryfsxgkinrxuwzkplzqvzydyod"} bd89880863e2a021 bd89880863e2a021
NOTIFICATION 2017-09-28T12:11:07.800997761Z http://localhost.localdomain:9095 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 648bd891439bcede 7e7f12e3cdafe518 1af4ca728e342bc6
NOTIFICATION 2017-09-28T12:11:07.801192328Z http://localhost.localdomain:9093 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 648bd891439bcede 7e7f12e3cdafe518 1af4ca728e342bc6
NOTIFICATION 2017-09-28T12:11:07.805756040Z http://localhost.localdomain:9093 {}:{__name__="oiwwpnxbzvnglxqfmmgydouluripxyalq"} 1aed0cc371e7fbcf 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:11:08.798768526Z http://localhost:9093/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:08.800274909Z http://localhost:9094/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:08.801675108Z http://localhost:9095/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:09.797689818Z http://localhost:9093/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:09.798762672Z http://localhost:9094/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:09.799715877Z http://localhost:9095/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:10.797560597Z http://localhost:9093/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:10.798576137Z http://localhost:9094/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:10.800462299Z http://localhost:9095/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:11.797450524Z http://localhost:9093/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:11.798319907Z http://localhost:9094/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:11.799047679Z http://localhost:9095/api/v1/alerts b00fd90c0a5af067
ALERTS 2017-09-28T12:11:12.797160321Z http://localhost:9093/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:12.797774505Z http://localhost:9094/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:12.798617466Z http://localhost:9095/api/v1/alerts 8e4acded38d67e85
NOTIFICATION 2017-09-28T12:11:12.800176434Z http://localhost.localdomain:9095 {}:{__name__="njxugstzcglxwexppqfurzsxezpqvxjjded"} b00fd90c0a5af067 b00fd90c0a5af067
NOTIFICATION 2017-09-28T12:11:12.800200501Z http://localhost.localdomain:9093 {}:{__name__="effdxjecmjwlwywayerjlkbuuzqivrpucvqgqkwoqvnfgxvccl"} c7244ae83ae3fea1 c7244ae83ae3fea1
ALERTS 2017-09-28T12:11:13.798161225Z http://localhost:9093/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:13.799461912Z http://localhost:9094/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:13.800693146Z http://localhost:9095/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:14.797405230Z http://localhost:9093/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:14.798274577Z http://localhost:9094/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:14.799742374Z http://localhost:9095/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:15.797509504Z http://localhost:9093/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:15.798281623Z http://localhost:9094/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:15.799002244Z http://localhost:9095/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:16.797983761Z http://localhost:9093/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:16.799050168Z http://localhost:9094/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:16.800111599Z http://localhost:9095/api/v1/alerts 8e4acded38d67e85
ALERTS 2017-09-28T12:11:17.797050083Z http://localhost:9093/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:17.797719643Z http://localhost:9094/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:17.798317704Z http://localhost:9095/api/v1/alerts c1acb3495e10a2
NOTIFICATION 2017-09-28T12:11:17.799264901Z http://localhost.localdomain:9095 {}:{__name__="hnzpguwyghtdrzcqdzwad"} 8e4acded38d67e85 8e4acded38d67e85
NOTIFICATION 2017-09-28T12:11:17.799505922Z http://localhost.localdomain:9093 {}:{__name__="zzlqunurqsnprexlidrmgppwemgbhzyigbfgqiyedzsueibqu"} 52fb295b6100a4e8 52fb295b6100a4e8
NOTIFICATION 2017-09-28T12:11:17.802368745Z http://localhost.localdomain:9093 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 648bd891439bcede 7e7f12e3cdafe518 1af4ca728e342bc6
NOTIFICATION 2017-09-28T12:11:17.806628738Z http://localhost.localdomain:9093 {}:{__name__="oiwwpnxbzvnglxqfmmgydouluripxyalq"} 1aed0cc371e7fbcf 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:11:18.797415416Z http://localhost:9093/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:18.798154502Z http://localhost:9094/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:18.798871241Z http://localhost:9095/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:19.797547252Z http://localhost:9093/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:19.798858978Z http://localhost:9094/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:19.800230178Z http://localhost:9095/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:20.797779028Z http://localhost:9093/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:20.799083238Z http://localhost:9094/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:20.800423535Z http://localhost:9095/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:21.797995782Z http://localhost:9093/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:21.799232447Z http://localhost:9094/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:21.800423226Z http://localhost:9095/api/v1/alerts c1acb3495e10a2
ALERTS 2017-09-28T12:11:22.797141634Z http://localhost:9093/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:22.797942369Z http://localhost:9094/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:22.798616673Z http://localhost:9095/api/v1/alerts 4639f01bcd88a94a
NOTIFICATION 2017-09-28T12:11:22.798907736Z http://localhost.localdomain:9095 {}:{__name__="kvqnqvendriqjccoxlekdkgacndbsnovmart"} c1acb3495e10a2 c1acb3495e10a2
NOTIFICATION 2017-09-28T12:11:22.799878579Z http://localhost.localdomain:9094 {}:{__name__="viljjpdisdmychdciattjgryfsxgkinrxuwzkplzqvzydyod"} bd89880863e2a021 bd89880863e2a021
NOTIFICATION 2017-09-28T12:11:22.800155402Z http://localhost.localdomain:9095 {}:{__name__="viljjpdisdmychdciattjgryfsxgkinrxuwzkplzqvzydyod"} bd89880863e2a021 bd89880863e2a021
NOTIFICATION 2017-09-28T12:11:22.800226625Z http://localhost.localdomain:9093 {}:{__name__="viljjpdisdmychdciattjgryfsxgkinrxuwzkplzqvzydyod"} bd89880863e2a021 bd89880863e2a021
NOTIFICATION 2017-09-28T12:11:22.800984398Z http://localhost.localdomain:9095 {}:{__name__="njxugstzcglxwexppqfurzsxezpqvxjjded"} b00fd90c0a5af067 b00fd90c0a5af067
NOTIFICATION 2017-09-28T12:11:22.801429645Z http://localhost.localdomain:9093 {}:{__name__="effdxjecmjwlwywayerjlkbuuzqivrpucvqgqkwoqvnfgxvccl"} c7244ae83ae3fea1 c7244ae83ae3fea1
NOTIFICATION 2017-09-28T12:11:22.802082100Z http://localhost.localdomain:9095 {}:{__name__="effdxjecmjwlwywayerjlkbuuzqivrpucvqgqkwoqvnfgxvccl"} c7244ae83ae3fea1 c7244ae83ae3fea1
ALERTS 2017-09-28T12:11:23.797362116Z http://localhost:9093/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:23.798340936Z http://localhost:9094/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:23.799126613Z http://localhost:9095/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:24.797276171Z http://localhost:9093/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:24.798113499Z http://localhost:9094/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:24.799086892Z http://localhost:9095/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:25.797799754Z http://localhost:9093/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:25.799005231Z http://localhost:9094/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:25.800244586Z http://localhost:9095/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:26.797174189Z http://localhost:9093/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:26.797882146Z http://localhost:9094/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:26.798485464Z http://localhost:9095/api/v1/alerts 4639f01bcd88a94a
ALERTS 2017-09-28T12:11:27.798229822Z http://localhost:9093/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:27.800011256Z http://localhost:9094/api/v1/alerts a1bebc050a7fc94f
NOTIFICATION 2017-09-28T12:11:27.800128921Z http://localhost.localdomain:9095 {}:{__name__="mjwylwbywspwjuygvvlfzgqgkdgozcpfpvwqnwilfusr"} 4639f01bcd88a94a 4639f01bcd88a94a
NOTIFICATION 2017-09-28T12:11:27.800508342Z http://localhost.localdomain:9095 {}:{__name__="hnzpguwyghtdrzcqdzwad"} 8e4acded38d67e85 8e4acded38d67e85
NOTIFICATION 2017-09-28T12:11:27.801592165Z http://localhost.localdomain:9093 {}:{__name__="zzlqunurqsnprexlidrmgppwemgbhzyigbfgqiyedzsueibqu"} 52fb295b6100a4e8 52fb295b6100a4e8
ALERTS 2017-09-28T12:11:27.802008920Z http://localhost:9095/api/v1/alerts a1bebc050a7fc94f
NOTIFICATION 2017-09-28T12:11:27.803019601Z http://localhost.localdomain:9095 {}:{__name__="zzlqunurqsnprexlidrmgppwemgbhzyigbfgqiyedzsueibqu"} 52fb295b6100a4e8 52fb295b6100a4e8
NOTIFICATION 2017-09-28T12:11:27.803533788Z http://localhost.localdomain:9093 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 648bd891439bcede 7e7f12e3cdafe518 1af4ca728e342bc6
NOTIFICATION 2017-09-28T12:11:27.807676598Z http://localhost.localdomain:9093 {}:{__name__="oiwwpnxbzvnglxqfmmgydouluripxyalq"} 1aed0cc371e7fbcf 1aed0cc371e7fbcf
ALERTS 2017-09-28T12:11:28.797078190Z http://localhost:9093/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:28.797782525Z http://localhost:9094/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:28.798762787Z http://localhost:9095/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:29.797523607Z http://localhost:9093/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:29.798469251Z http://localhost:9094/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:29.799279044Z http://localhost:9095/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:30.797526669Z http://localhost:9093/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:30.798474976Z http://localhost:9094/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:30.799320611Z http://localhost:9095/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:31.797399035Z http://localhost:9093/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:31.798510986Z http://localhost:9094/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:31.799491676Z http://localhost:9095/api/v1/alerts a1bebc050a7fc94f
ALERTS 2017-09-28T12:11:32.797805946Z http://localhost:9093/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:32.799277870Z http://localhost:9094/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:32.800275768Z http://localhost:9095/api/v1/alerts f759b670b1c31096
NOTIFICATION 2017-09-28T12:11:32.802986545Z http://localhost.localdomain:9095 {}:{__name__="krzztrbrvnvemygzedveprkgyxplsbbznvrq"} a1bebc050a7fc94f a1bebc050a7fc94f
NOTIFICATION 2017-09-28T12:11:32.802998020Z http://localhost.localdomain:9095 {}:{__name__="effdxjecmjwlwywayerjlkbuuzqivrpucvqgqkwoqvnfgxvccl"} 307dfc988b20ee37 c7244ae83ae3fea1 f759b670b1c31096
ALERTS 2017-09-28T12:11:33.797532933Z http://localhost:9093/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:33.798550985Z http://localhost:9094/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:33.799445253Z http://localhost:9095/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:34.797136900Z http://localhost:9093/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:34.798176352Z http://localhost:9094/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:34.799152471Z http://localhost:9095/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:35.798143531Z http://localhost:9093/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:35.799774171Z http://localhost:9094/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:35.801246510Z http://localhost:9095/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:36.797239527Z http://localhost:9093/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:36.798015926Z http://localhost:9094/api/v1/alerts f759b670b1c31096
ALERTS 2017-09-28T12:11:36.798760116Z http://localhost:9095/api/v1/alerts f759b670b1c31096
NOTIFICATION 2017-09-28T12:11:37.798437168Z http://localhost.localdomain:9093 {}:{__name__="kvqnqvendriqjccoxlekdkgacndbsnovmart"} c1acb3495e10a2 c1acb3495e10a2
NOTIFICATION 2017-09-28T12:11:37.799387696Z http://localhost.localdomain:9093 {}:{__name__="njxugstzcglxwexppqfurzsxezpqvxjjded"} b00fd90c0a5af067 b00fd90c0a5af067
NOTIFICATION 2017-09-28T12:11:37.800270884Z http://localhost.localdomain:9095 {}:{__name__="viljjpdisdmychdciattjgryfsxgkinrxuwzkplzqvzydyod"} bd89880863e2a021 bd89880863e2a021
NOTIFICATION 2017-09-28T12:11:37.804813077Z http://localhost.localdomain:9093 {}:{__name__="vxezaawdsdwcvvuvryyabvkvbgdqlcqstgddkefmpdrjp"} 648bd891439bcede 7e7f12e3cdafe518 1af4ca728e342bc6
NOTIFICATION 2017-09-28T12:11:37.808742736Z http://localhost.localdomain:9093 {}:{__name__="oiwwpnxbzvnglxqfmmgydouluripxyalq"} 1aed0cc371e7fbcf 1aed0cc371e7fbcf
NOTIFICATION 2017-09-28T12:11:42.798819444Z http://localhost.localdomain:9093 {}:{__name__="mjwylwbywspwjuygvvlfzgqgkdgozcpfpvwqnwilfusr"} 4639f01bcd88a94a 4639f01bcd88a94a
NOTIFICATION 2017-09-28T12:11:42.799761007Z http://localhost.localdomain:9093 {}:{__name__="hnzpguwyghtdrzcqdzwad"} 8e4acded38d67e85 8e4acded38d67e85
NOTIFICATION 2017-09-28T12:11:42.800567231Z http://localhost.localdomain:9095 {}:{__name__="hnzpguwyghtdrzcqdzwad"} 8e4acded38d67e85 8e4acded38d67e85
NOTIFICATION 2017-09-28T12:11:42.801940804Z http://localhost.localdomain:9094 {}:{__name__="zzlqunurqsnprexlidrmgppwemgbhzyigbfgqiyedzsueibqu"} 52fb295b6100a4e8 52fb295b6100a4e8
We can see that the first notification for each group goes out only once, as expected. However, when those notifications are repeated, for example on lines 67 to 69 of the gist, they are sent simultaneously by two different Alertmanagers. This happens multiple times throughout the test, and sometimes all three Alertmanager instances send the notification.
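One way to find these cases mechanically is sketched below. This is only an illustration, not the tooling itself: `find_duplicates` and the tuple layout `(seconds, instance, group_key, group_hash)` are assumptions, and the one-second window is an arbitrary choice. It clusters notifications with identical content that were sent close together and reports the clusters that involve more than one instance.

```python
from collections import defaultdict


def find_duplicates(notifications, window=1.0):
    """Return clusters of identical notifications sent by several instances.

    notifications: iterable of (seconds, instance, group_key, group_hash).
    window: maximum gap in seconds between consecutive sends in one cluster.
    """
    by_content = defaultdict(list)
    for seconds, instance, group_key, group_hash in notifications:
        by_content[(group_key, group_hash)].append((seconds, instance))

    def multiple_senders(cluster):
        return len({instance for _, instance in cluster}) > 1

    duplicates = []
    for content, sends in by_content.items():
        sends.sort()
        cluster = [sends[0]]
        for send in sends[1:]:
            if send[0] - cluster[-1][0] <= window:
                cluster.append(send)
                continue
            # Gap too large: close the current cluster and start a new one.
            if multiple_senders(cluster):
                duplicates.append((content, cluster))
            cluster = [send]
        if multiple_senders(cluster):
            duplicates.append((content, cluster))
    return duplicates
```

Repeats sent by a single instance one `repeat_interval` apart end up in separate clusters, so only simultaneous sends from different instances are reported.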
I have yet to validate my suspicion in the code. However, I believe this happens because, for the initial "group wait", each Alertmanager instance waits an additional amount of time equal to its position in the mesh times 5 seconds, but there is no such behavior for repeated notifications. The instances then race over who manages to send the notification and gossip it first so that the others can de-duplicate, which results in the notifications sometimes being de-duplicated and sometimes not.
Let me know what you think and whether anything I said doesn't make any sense :slightly_smiling_face:.
I'll be sharing the tooling I'm building at some point, but it's very early stage and constantly changing right now, but I'm hoping it can become a useful tool to simulate scenarios for HA clusters.
@stuartnelson3 @fabxc (feel free to tag anyone who might be interested as well)
Thanks for working on this!
Without having looked at any code to check, your initial hypothesis for repeated notifications not waiting (position*5sec) does sound like it could be the culprit.
What is your priority currently between further refining the tooling and trying to solve this (and probably other) bugs? Are you focused on finishing the tooling and then addressing the bugs, or looking at the bugs as they're discovered?
Regarding the tooling, I think it'll go hand in hand with the tests we are developing and the bugs we find. The tooling can be shared as is; I just feel that at this point it's more confusing than helpful given how quickly it is changing (I literally wrote the first lines of code a few days ago). I should be able to alternate between developing the tooling and fixing the bugs discovered with it, since the tooling probably needs to get better in order to find the more complex bugs.
@brancz Think this is related to my last comment on the issue https://github.com/prometheus/alertmanager/issues/964.
Prometheus will call Alertmanager for the same issue a couple of times. Suppose one call is made: the AMs sync it and send only one notification. But Prometheus keeps calling the AMs a few more times, and because you configured repeat_interval to a very small value, they start resending those notifications.
Does that make any sense?
The same thing happens when the repeat_interval is larger. The problem is that once the first notification was sent out, all Alertmanager instances will wait until last notification time + repeat_interval, which will make them race against each other regardless of the value of repeat_interval.
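To make the race concrete, here is a deliberately simplified model, not Alertmanager's actual code. The function names and the `gossip_delay` parameter are assumptions for illustration: each instance decides on its own whether `repeat_interval` has elapsed since the last send it knows about, and a peer's send only becomes visible after the gossip delay.

```python
def should_send(now, last_sent_known, repeat_interval):
    """An instance repeats once repeat_interval has passed since the last send it knows of."""
    return now - last_sent_known >= repeat_interval


def senders(wake_times, last_sent, repeat_interval, gossip_delay):
    """Return the instances that end up sending the repeated notification.

    wake_times: {instance: time at which it evaluates the group}
    last_sent: time of the previous notification, known to all instances
    gossip_delay: time until a send becomes visible to the other instances
    """
    sent = []  # (time, instance) of repeats sent so far
    for instance, now in sorted(wake_times.items(), key=lambda item: item[1]):
        # Only sends that were gossiped before `now` can be taken into account.
        visible = [t for t, _ in sent if t + gossip_delay <= now]
        known = max([last_sent] + visible)
        if should_send(now, known, repeat_interval):
            sent.append((now, instance))
    return [instance for _, instance in sent]
```

With two instances waking 2ms apart and a 50ms gossip delay, both send; if the second wakes after the gossip has arrived, only the first does. The absolute value of `repeat_interval` does not matter, only how close together the instances wake up.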
I should soon be able to dig into this issue further.
@brancz A single Alertmanager without high availability also has the same problem. Alertmanager version: 0.8.0, resolve_timeout: 30m.
With repeat_interval: 2m the resolved message is sent 6 to 7 times; with repeat_interval: 7m it is sent about 3 times.
@brancz Could the "duplicate repeated notifications" be caused by Prometheus continuously sending the alert to Alertmanager for 15 minutes? If so, would any notification send interval shorter than 15 minutes produce duplicates? And is the send interval group_interval + repeat_interval? https://github.com/prometheus/prometheus/blob/master/rules/alerting.go#L151-L156
The value is also hardcoded. It would be better to make it configurable, or to change the 15 minutes to 5 or 10 minutes; 15 minutes is too long in production.
I also have the same problem with HA: 2 Prometheus + 2 AM.
I have 3 receivers: Jira over webhook, Slack and Pushover.
Alertmanager config for both instances is:
global:
  resolve_timeout: 5m
route:
  receiver: jira
  group_by:
  - severity
  routes:
  - receiver: jira
    match:
      severity: Lowest
    continue: true
  - receiver: jira
    match:
      severity: Low
    continue: true
  - receiver: slack
    match:
      severity: Low
    continue: true
  - receiver: jira
    match:
      severity: High
    continue: true
  - receiver: slack
    match:
      severity: High
    continue: true
  - receiver: jira
    match:
      severity: Highest
    continue: true
  - receiver: slack
    match:
      severity: Highest
    continue: true
  - receiver: push
    match:
      severity: Highest
    continue: true
  group_wait: 1m
  group_interval: 15m
  repeat_interval: 1h
receivers:
- name: slack
  slack_configs:
  - send_resolved: false
    api_url: <secret>
    channel: '#prometheus'
    username: '{{ template "slack.default.username" . }}'
    color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
    title: '{{ .CommonAnnotations.SUMMARY }}'
    title_link: '{{ template "slack.default.titlelink" . }}'
    pretext: '{{ template "slack.default.pretext" . }}'
    text: '{{ .CommonAnnotations.DESCRIPTION }}'
    fallback: '{{ template "slack.default.fallback" . }}'
    icon_emoji: '{{ template "slack.default.iconemoji" . }}'
    icon_url: '{{ template "slack.default.iconurl" . }}'
- name: push
  pushover_configs:
  - send_resolved: true
    user_key: <secret>
    token: <secret>
    title: '{{ template "pushover.default.title" . }}'
    message: '{{ template "pushover.default.message" . }}'
    url: '{{ template "pushover.default.url" . }}'
    priority: '{{ if eq .Status "firing" }}2{{ else }}0{{ end }}'
    retry: 1m0s
    expire: 1h0m0s
- name: jira
  webhook_configs:
  - send_resolved: false
    url: http://127.0.0.1:8000
templates:
- /etc/alertmanager/default.tmpl
Pushover and Slack work fine and don't send duplicates. The problem is with the webhook, which is a POST request: when AM sends a notification, both AMs send many POST requests to the webhook:
AM1:
127.0.0.1 - - [27/Oct/2017 13:38:42] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [27/Oct/2017 13:38:45] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [27/Oct/2017 13:38:47] "POST / HTTP/1.1" 200 -
....
AM2:
127.0.0.1 - - [27/Oct/2017 13:38:37] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [27/Oct/2017 13:38:39] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [27/Oct/2017 13:38:42] "POST / HTTP/1.1" 200 -
....
EDIT:
I tried disabling HA and configuring only one Prometheus and one Alertmanager, but the problem is the same!
I've played a bit with ambench and indeed I see the same issue using the master branch. Digging further into the code (and with the help of additional traces), I can explain what happens.
Assume 3 AlertManager instances where am1, am2 and am3 have positions 0, 1 and 2 in the cluster respectively. I've set the --cluster.peerTimeout value to 5s, meaning that before actually flushing an alert group, am2 waits an extra 5s and am3 an extra 10s (this is the WaitStage).
The AM configuration is:
global:
  resolve_timeout: 60m
route:
  group_by: ['__name__']
  group_wait: 5s
  group_interval: 20s
  repeat_interval: 40s
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://127.0.0.1:8080/notify'
    send_resolved: false
1. t=-5s, an alert fires, creating a new alert group on all AlertManagers.
2. t=0s, am1, am2 and am3 flush the notification. am2 and am3 wait for 5s and 10s respectively before effectively sending the notification.
3. t=0s, am1 sends the notification and waits for another 20s (group_interval).
4. t=0s++, the alert group gets another alert from ambench. Since the flush call hasn't finished on am2 and am3, both reset the next timer to zero.
5. t=5s, am2 sees that am1 has sent the notification. Since the timer was reset in step 4, it resets the timer for another 20s (group_interval) and flushes the group.
6. t=10s, am3 sees that am1 has sent the notification. Since the timer was reset in step 4, it resets the timer for another 20s (group_interval) and flushes the group.
7. t=20s, am1 wakes up, resets the timer for another 20s and flushes the group.
8. t=25s, am2 wakes up, resets the timer for another 20s and flushes the group (including a 5s wait).
9. t=30s, am3 wakes up, resets the timer for another 20s and flushes the group (including a 10s wait).
At t=40s, am1 flushes the notification and sends it to the receiver because repeat_interval is over. Almost at the same time, am3 exits from its 10s WaitStage and it will race with am1 to send the notification too.
The same timeline but with a drawing:
The bold vertical bars are the flush calls and the green boxes represent the WaitStage intervals.
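The arithmetic of this timeline can be reproduced with a small model. This is a sketch of the sequence described in this comment only, not of Alertmanager's implementation; the function names are hypothetical, and it assumes the first notification goes out at t=0 and that nothing else is sent before repeat_interval elapses.

```python
def check_times(position, peer_timeout, group_interval, horizon):
    """Times at which an instance leaves its WaitStage and decides whether to send."""
    wait = position * peer_timeout
    # Every instance starts its first flush at t=0. For position > 0 a new
    # alert arrives while that flush is still waiting (the t=0s++ step), so
    # the next flush starts as soon as the first one returns, at t=wait.
    # Position 0 has no wait, so its next flush is a full group_interval later.
    starts = [0]
    t = wait if wait > 0 else group_interval
    while t <= horizon:
        starts.append(t)
        t += group_interval
    return [start + wait for start in starts]


def first_eligible_check(position, peer_timeout, group_interval, repeat_interval):
    """First decision point at or after repeat_interval has elapsed since t=0."""
    horizon = repeat_interval + group_interval
    times = check_times(position, peer_timeout, group_interval, horizon)
    return min(t for t in times if t >= repeat_interval)


def racing_instances(positions, peer_timeout, group_interval, repeat_interval):
    """Instances that reach their first eligible decision point at the same time."""
    first = {
        name: first_eligible_check(pos, peer_timeout, group_interval, repeat_interval)
        for name, pos in positions.items()
    }
    earliest = min(first.values())
    return sorted(name for name, t in first.items() if t == earliest)


racing = racing_instances({"am1": 0, "am2": 1, "am3": 2}, 5, 20, 40)
```

With the values used here (5s peer timeout, 20s group_interval, 40s repeat_interval), am1 and am3 both reach a decision point at t=40s while am2 next decides at t=50s, which matches the race between am1 and am3 described above.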
I've got a couple of ideas to reduce the likelihood of this scenario, but they probably won't cover all the cases.
I had the same problem, but it occurs only when the configuration file is reloaded at the time the alert is sent.