consul-alerts
Spurious Notifications
I recently created a Notification Profile to direct serfHealth alerts to email, and since then I have been getting a lot of "System is HEALTHY" emails without the corresponding "System is CRITICAL" emails beforehand.
I enabled debug logging a few days ago and an excerpt is below. You can see that the node was critical at one point but never triggered an alert. Once it became stable, an email was sent after 90 seconds. We've never had this problem with PagerDuty, presumably because it wouldn't have an incident to resolve.
INFO[143330] Registering new health check: node=ip-10-0-204-253, service=, check=Serf Health Status, status=critical
INFO[143350] ip-10-0-204-253::Serf Health Status is pending status change from to critical for 20.343651277s.
INFO[143360] ip-10-0-204-253::Serf Health Status is now pending status change from to passing.
INFO[143380] ip-10-0-204-253::Serf Health Status is now pending status change from to critical.
INFO[143400] ip-10-0-204-253::Serf Health Status is pending status change from to critical for 19.755810602s.
INFO[143419] ip-10-0-204-253::Serf Health Status is now pending status change from to passing.
INFO[143439] ip-10-0-204-253::Serf Health Status is pending status change from to passing for 19.893641664s.
INFO[143459] ip-10-0-204-253::Serf Health Status is pending status change from to passing for 39.648083805s.
INFO[143480] ip-10-0-204-253::Serf Health Status is pending status change from to passing for 1m0.650671765s.
INFO[143501] ip-10-0-204-253::Serf Health Status is pending status change from to passing for 1m21.307562045s.
INFO[143521] ip-10-0-204-253::Serf Health Status has changed status from to passing.
INFO[143543] Getting profile for node: ip-10-0-204-253 service: check: serfHealth
I see the same behaviour, and I think it comes from the change-threshold setting:
https://github.com/AcalephStorage/consul-alerts#health-checks
Your node is critical for only about 20 seconds at a time, so the critical state never reaches the threshold. Once it is passing, it stays passing long enough to cross the 60-second threshold, so a notification is sent.
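To make that reasoning concrete, here is a rough sketch of how a change threshold can swallow a short critical flap but still confirm the later passing state. This is not consul-alerts' actual code: the `debounce` function and its logic are my own simplified model, the timestamps are read off the log excerpt above, and the threshold values (60 and 90 seconds) are assumptions taken from the numbers mentioned in this thread.

```python
def debounce(observations, threshold):
    """Hypothetical model of a change threshold (not the real implementation).

    observations: list of (timestamp_seconds, status) in time order.
    threshold: seconds a new status must persist before it is confirmed.
    Returns a list of (timestamp, status) for each confirmed change,
    i.e. each point where a notification would be sent in this model.
    """
    confirmed = None      # last status that was actually confirmed
    pending = None        # status currently waiting to be confirmed
    pending_since = None  # timestamp when the pending status was first seen
    notifications = []

    for ts, status in observations:
        if status == confirmed:
            # Back to the confirmed status: drop any pending change.
            pending, pending_since = None, None
            continue
        if status != pending:
            # A different status appeared: restart the timer.
            pending, pending_since = status, ts
            continue
        if ts - pending_since >= threshold:
            # The pending status outlived the threshold: confirm and notify.
            confirmed = status
            pending, pending_since = None, None
            notifications.append((ts, status))

    return notifications


# Timestamps and statuses taken from the log excerpt in the report.
observations = [
    (143330, "critical"),
    (143350, "critical"),
    (143360, "passing"),
    (143380, "critical"),
    (143400, "critical"),
    (143419, "passing"),
    (143439, "passing"),
    (143459, "passing"),
    (143480, "passing"),
    (143501, "passing"),
    (143521, "passing"),
]

print(debounce(observations, 90))
print(debounce(observations, 60))
```

In this model each critical stretch lasts only about 20 seconds before the status flips, so the timer keeps resetting and no critical notification is ever produced. Only the final passing stretch lasts long enough, which would explain a "HEALTHY" email with no "CRITICAL" email before it.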