public-roadmap Alert Enhancement: x failures in y minutes

Alert Enhancement: x failures in y minutes

Open rmsral opened this issue 3 years ago • 3 comments

Just wondering if you would consider implementing an alert rule for "x failures in y minutes"? My reason here is that our applications are load balanced between web servers, so it might fail once on a bad server, but the next check is OK because it hits a different server.

We'd like to see if there's an option for "X failures in Y minutes" so that we can filter out the random failures (alert fatigue) and only get alerted when there are frequent failures.

Thanks!

Apr 26 '21 23:04 rmsral

@rmsral thanks for contributing. If I understand you correctly, I think we already have that feature. Have a look at your alert settings.

Or let me know if this is not sufficient.

Apr 27 '21 14:04 tnolet

Good morning Tim. Unfortunately, the check you speak of resets the count as soon as a check returns a healthy result (for instance, if 1 server is bad inside a farm of servers, a check might hit the bad server 1 in every 3 times).

I'm hoping for a feature that returns say, 5 failures within a 5 minute period, even if there's some healthy results in between (correlating with some healthy servers vs 1 bad server in a load balanced environment).
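To make the difference concrete, here is a minimal sketch of the two behaviours described above. It is not how the product implements alerting; the class names `ConsecutiveFailureRule` and `WindowedFailureRule`, and their parameters, are hypothetical and exist only for illustration.

```python
from collections import deque
from typing import Deque


class ConsecutiveFailureRule:
    """Fires after `threshold` failures in a row; any healthy result resets the count."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.count = 0

    def record(self, ok: bool) -> bool:
        # A single healthy result wipes out all earlier failures.
        self.count = 0 if ok else self.count + 1
        return self.count >= self.threshold


class WindowedFailureRule:
    """Fires when at least `max_failures` failures fall inside the last `window_s` seconds."""

    def __init__(self, max_failures: int, window_s: float) -> None:
        self.max_failures = max_failures
        self.window_s = window_s
        self.failures: Deque[float] = deque()  # timestamps of recent failures

    def record(self, ok: bool, now: float) -> bool:
        if not ok:
            self.failures.append(now)
        # Drop failures that have aged out of the window; healthy results
        # in between do not reset anything.
        while self.failures and now - self.failures[0] >= self.window_s:
            self.failures.popleft()
        return len(self.failures) >= self.max_failures
```

With a check every 30 seconds where every third run hits the bad server, a consecutive rule with a threshold of 2 never fires, while a windowed rule of 3 failures in 300 seconds fires on the third failure.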

Hope that makes sense!

Apr 28 '21 01:04 rmsral

Hi @rmsral, I understand the request now. At this point we do not have this feature, as you noticed, but I think the request is fair. In my mind, this feature would trigger some form of degradation, like we have right now for API checks when things are slow, but not broken. This case is similar: the service is degraded or "flapping", to use the good old Nagios term.
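As a rough sketch of what such a degraded state could look like, the function below classifies a window of recent check results into three states. This is only an illustration under assumed semantics: the function name `classify`, the thresholds `fail_after` and `degrade_after`, and the state names are all hypothetical, not an existing API.

```python
from typing import Sequence


def classify(results: Sequence[bool], fail_after: int, degrade_after: int) -> str:
    """Classify recent check results (oldest first, True = healthy).

    "failing":  the last `fail_after` results are all failures.
    "degraded": at least `degrade_after` failures anywhere in the window,
                even with healthy results in between (flapping).
    "passing":  anything else.
    """
    # Count the unbroken run of failures at the end of the window.
    trailing = 0
    for ok in reversed(results):
        if ok:
            break
        trailing += 1
    if trailing >= fail_after:
        return "failing"

    total_failures = sum(1 for ok in results if not ok)
    if total_failures >= degrade_after:
        return "degraded"
    return "passing"
```

The window itself (for example, all results from the last Y minutes) would be selected by the caller, so the same function could serve either a time-based or a count-based window.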

I cannot promise we will have this feature soon, but I want to take a closer look at it to determine if we have the basic data (which I think we do) to trigger such a state.

May 03 '21 09:05 tnolet
