
Amendment in Alert Frequency when a check is failed

Open aliasgerkw opened this issue 4 years ago • 8 comments

💡 For general support requests and bug reports, please go to checklyhq.com/support

Is your feature request related to a problem? Please describe. Currently, if a check runs every 5 minutes and something goes wrong at 7:35 PM, I have to wait until 7:40 PM for the next result, even if the downtime lasted only a few seconds. This produces incorrect values for the 30-day availability ratio and other metrics.

Describe the solution you'd like When a check fails, there should be a way to retry it in the background every second or every 10 seconds (configurable). Once a retry succeeds, the check should raise a PASS alert and the timer should reset to that time. For example, say a FAILURE event occurs at 7:35:50 PM, the retry interval is 5 seconds, and the API is down for 35 seconds. A PASS alert should then be raised at 7:36:25 PM, and the next regular run, 5 minutes later, should be at 7:41:25 PM rather than at 7:40:50 PM.
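
The timing in this example can be worked through in a few lines. The following is only a sketch of the requested behaviour, not how Checkly schedules checks: the function names are made up for illustration, the outage is assumed to be measured from the failing run, and a retry that fires exactly at the recovery instant is assumed to pass.

```python
from datetime import datetime, timedelta


def first_passing_retry(failed_at, outage, retry_every):
    """Return the time of the first retry that lands at or after recovery."""
    recovered_at = failed_at + outage
    retry_at = failed_at + retry_every
    while retry_at < recovered_at:
        retry_at += retry_every
    return retry_at


def next_regular_run(passed_at, frequency):
    """Restart the regular schedule from the passing retry."""
    return passed_at + frequency


# 7:35:50 PM failure; the calendar date is arbitrary.
failed_at = datetime(2020, 10, 28, 19, 35, 50)
passed_at = first_passing_retry(
    failed_at, outage=timedelta(seconds=35), retry_every=timedelta(seconds=5)
)
next_run = next_regular_run(passed_at, frequency=timedelta(minutes=5))

print(passed_at.time(), next_run.time())  # 19:36:25 19:41:25
```

With those inputs the passing retry lands at 7:36:25 PM, 35 seconds after the failure, and the 5-minute schedule restarts from there.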

Describe alternatives you've considered None as of now. We count the whole 5 minutes as downtime, as an approximation.
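
To see how much this approximation shifts the 30-day ratio, here is a rough calculation for a single incident. The 35-second outage is borrowed from the example earlier in this issue, and the formula (uptime divided by the window) is an assumption about how the ratio is computed, not something confirmed in this thread.

```python
WINDOW_SECONDS = 30 * 24 * 60 * 60  # 30-day reporting window


def availability_percent(downtime_seconds, window_seconds=WINDOW_SECONDS):
    """Share of the window that counted as up, as a percentage."""
    return 100.0 * (1 - downtime_seconds / window_seconds)


actual = availability_percent(35)        # the outage really lasted 35 s
reported = availability_percent(5 * 60)  # the whole 5-minute interval counted as down

print(f"actual:   {actual:.4f}%")
print(f"reported: {reported:.4f}%")
```

Each incident of this kind is booked as 300 seconds of downtime instead of 35, so the gap grows with every short outage in the window.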


aliasgerkw avatar Oct 28 '20 14:10 aliasgerkw

Hey @aliasgerkw, this is a very obvious challenge that we are aware of. What you are proposing is what I / we call "smart retries". We already have retries in place that double-check on failure, within a matter of seconds. After one retry, we stop and alert the FAILURE.

Your suggestion is great and we have thought this through. You run into many fantastically interesting issues when you implement such a scheme, so for now we opt for "clear and simple to understand", but I expect that we will expand our retry / double-check feature in the future.
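
For readers unfamiliar with the pattern, the double-check described above (retry once after a few seconds, then alert) could look roughly like this. It is a generic sketch, not Checkly's implementation; `run_with_double_check`, its parameters, and the 5-second default are all hypothetical.

```python
import time
from typing import Callable


def run_with_double_check(
    check: Callable[[], bool],
    alert: Callable[[str], None],
    retry_delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a check, retry once on failure, and alert only if the retry fails too."""
    if check():
        return True
    sleep(retry_delay_seconds)  # short pause before the single retry
    if check():
        return True  # the first failure was transient, so no alert is sent
    alert("FAILURE")
    return False


# Usage: a check that fails once and then recovers raises no alert.
attempts = iter([False, True])
alerts = []
ok = run_with_double_check(lambda: next(attempts), alerts.append, sleep=lambda _: None)
print(ok, alerts)  # True []
```

Stopping after one retry keeps the behaviour easy to reason about: a check produces at most one alert per scheduled run.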

tnolet avatar Oct 29 '20 21:10 tnolet

Correct, @tnolet. We are making use of smart retries, and that is helping us with false alarms. I just want more retry configuration to get exact downtime metrics.

aliasgerkw avatar Oct 30 '20 05:10 aliasgerkw

@tnolet I want to add one more thing here. If a check fails or is degraded in the Mumbai location, the subsequent check should also be triggered from Mumbai, to make sure it's not an issue with that particular geography. Is that possible?
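
The selection rule being asked for is small enough to sketch. This assumes a scheduler that otherwise picks any configured location at random; `pick_location` and the location labels are hypothetical and not part of any Checkly API.

```python
import random
from typing import Optional, Sequence


def pick_location(
    locations: Sequence[str], last_failed_location: Optional[str] = None
) -> str:
    """Choose where the next run executes.

    After a failed or degraded run, stay in the same location so the
    follow-up exercises the same path; otherwise pick any configured one.
    """
    if last_failed_location is not None and last_failed_location in locations:
        return last_failed_location
    return random.choice(locations)


locations = ["mumbai", "frankfurt", "n-virginia"]  # hypothetical labels
print(pick_location(locations, last_failed_location="mumbai"))  # mumbai
```

Pinning the follow-up to the failing location makes it possible to tell a regional problem from a global one, since a random location could pass while Mumbai is still failing.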

aliasgerkw avatar Nov 12 '20 11:11 aliasgerkw

I want to +1 this. We have a test that we only run every 60 minutes, so this issue is even bigger in our case. Even if the problem is fixed within minutes, it takes 60 minutes before the test is back up again. It would be really nice to be able to configure a different test frequency for when the test is down than for when it is up.

tobiasf avatar Nov 08 '21 11:11 tobiasf

+1 on this as well. One simple way to allow this to be set up would be to expose the check ID in the arguments sent to the teardown script. Then you could write a teardown script that hits the Checkly API and alters the frequency.

Another way that you CAN do this right now is with a webhook alert. Same idea: the webhook would hit the API and update the frequency. That's just not as nice, as you need some OTHER service ready to respond. Having this all in Checkly would be ideal.
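
A rough sketch of what that other service might do when the webhook arrives is shown below. Several things here are assumptions to verify against the Checkly API documentation: the base URL and `PUT /checks/{id}` route, the `X-Checkly-Account` header, the `ALERT_FAILURE` alert type, and that a body containing only `frequency` is accepted. The payload fields `check_id` and `alert_type` are hypothetical and depend on how the webhook body is configured. The code only builds the request and does not send it.

```python
import json
import urllib.request

API_BASE = "https://api.checklyhq.com/v1"  # assumed base URL
NORMAL_FREQUENCY_MIN = 60   # regular interval while the check passes
DEGRADED_FREQUENCY_MIN = 1  # tighter interval while the check is failing


def frequency_for(alert_type: str) -> int:
    """Map an alert type from the webhook payload to a frequency in minutes."""
    if alert_type == "ALERT_FAILURE":  # assumed alert type name
        return DEGRADED_FREQUENCY_MIN
    return NORMAL_FREQUENCY_MIN


def build_update_request(
    payload: dict, api_key: str, account_id: str
) -> urllib.request.Request:
    """Build (but do not send) the request that changes the check's frequency."""
    body = json.dumps({"frequency": frequency_for(payload["alert_type"])}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/checks/{payload['check_id']}",  # hypothetical payload field
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Checkly-Account": account_id,  # assumed header name
            "Content-Type": "application/json",
        },
    )


req = build_update_request(
    {"check_id": "abc-123", "alert_type": "ALERT_FAILURE"},
    api_key="YOUR_API_KEY",
    account_id="YOUR_ACCOUNT_ID",
)
print(req.get_method(), req.full_url, req.data)
```

Passing `req` to `urllib.request.urlopen` would send it. A recovery alert maps back to the normal frequency, so the same handler covers both directions.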

infn8 avatar Dec 09 '21 19:12 infn8

Hi all, we created a "tracking" ticket to work on this problem. We will ship a solution in this area, just no super hard date yet, but it's on our public roadmap. https://github.com/checkly/public-roadmap/issues/208

tnolet avatar Apr 26 '22 10:04 tnolet

> @tnolet I want to add one more thing here. If a check fails or is degraded in the Mumbai location, the subsequent check should also be triggered from Mumbai, to make sure it's not an issue with that particular geography. Is that possible?

@tnolet is this doable?

aliasgerkw avatar Apr 26 '22 10:04 aliasgerkw

@aliasgerkw that is absolutely a case we will consider. Let me add that to the tracking ticket. We have not specced this out yet, so this is valuable. Thanks.

tnolet avatar Apr 26 '22 10:04 tnolet