
Use the original baseline for % based alerts

Open • rachrwang opened this issue 11 months ago • 1 comment

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

Today, %-based metric alerts use the prior interval as the baseline. This means an alert can auto-resolve while the regression is still live: as soon as the value stops changing relative to the prior interval, the alert resolves.

For example, with a metric alert on a 15-minute interval, the alert auto-resolves 15 minutes after firing if the value hasn't changed since the prior interval. A more accurate baseline is the original interval that triggered the alert.
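To make the problem concrete, here is a simplified, hypothetical model of the current behavior (not Sentry's actual code): each interval is compared only to the interval before it, so a count that jumps and then stays elevated fires once and resolves on the next evaluation. The function names and the "fire above threshold, otherwise resolve" rule are assumptions for illustration.

```python
from typing import List


def pct_change(current: float, baseline: float) -> float:
    """Percent change of `current` relative to `baseline`."""
    return (current - baseline) / baseline * 100.0


def states_with_prior_baseline(counts: List[float], threshold_pct: float) -> List[str]:
    """Alert state for each interval after the first, comparing it to the interval before it.

    Simplified model (assumption): fire when the change exceeds threshold_pct,
    resolve as soon as it no longer does.
    """
    states = []
    for prev, curr in zip(counts, counts[1:]):
        firing = pct_change(curr, prev) > threshold_pct
        states.append("firing" if firing else "resolved")
    return states


# Counts jump 200 -> 250 and then stay elevated.
print(states_with_prior_baseline([200, 250, 250, 250], threshold_pct=20))
# ['firing', 'resolved', 'resolved']
```

The regression is still present in the last two intervals, but because the baseline moved up to 250, the change reads as 0% and the alert resolves.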

Expected Result

The baseline for the metric alert is based on the original data that triggered the alert.

Actual Result

Recommended fix:

  • Use the % comparison to trigger the alert
  • Use the % comparison to derive a number-based threshold
  • For example, let's say my trigger is a 20% change compared to the same time one week earlier. If on Monday 3/1 at 9am my alert triggers because the count went from 200 to 250 week over week, then on 3/8, if the count is still 250, the alert should not auto-resolve. Instead, we'd check against the original value of 200.
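The two recommended steps can be sketched as follows, using a 20% trigger and a week-over-week count that rises from 200 to 250 (enough to clear the trigger). This is a minimal sketch under stated assumptions: `FiredAlert` and `maybe_fire` are hypothetical names, not Sentry APIs, and "resolve once the count is at or below the fixed threshold" is an assumed resolution rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FiredAlert:
    """Hypothetical record of what was known when the alert fired."""

    baseline: float       # comparison value at trigger time, e.g. 200
    threshold_pct: float  # percent-change trigger, e.g. 20

    @property
    def absolute_threshold(self) -> float:
        """The % trigger converted into a fixed count, based on the original baseline."""
        return self.baseline + self.baseline * self.threshold_pct / 100.0

    def should_resolve(self, current: float) -> bool:
        """Resolve only once the count is back at or below the fixed threshold."""
        return current <= self.absolute_threshold


def maybe_fire(baseline: float, current: float, threshold_pct: float) -> Optional[FiredAlert]:
    """Step 1: use the % comparison to decide whether to trigger."""
    change_pct = (current - baseline) / baseline * 100.0
    if change_pct > threshold_pct:
        # Step 2: keep the original baseline so later checks use a fixed count.
        return FiredAlert(baseline=baseline, threshold_pct=threshold_pct)
    return None


alert = maybe_fire(baseline=200, current=250, threshold_pct=20)  # 3/1: 200 -> 250
if alert is not None:
    print(alert.absolute_threshold)   # 240.0
    print(alert.should_resolve(250))  # False: still above 240 on 3/8
    print(alert.should_resolve(230))  # True
```

Unlike the prior-interval comparison, a count that stays at 250 keeps the alert open, because it is measured against 240 (derived from the original 200) rather than against last week's 250.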

Dan's notes:

  • "resolve_threshold is already an AlertRule-level property, so we add override_resolve_threshold or something to the Incident, and when we fill in a value, we use that instead of the alert rule's threshold."
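Dan's suggestion amounts to a simple precedence rule: an incident-level override, when set, wins over the rule-level resolve threshold. The sketch below uses simplified, hypothetical dataclasses named after the note; they are stand-ins, not Sentry's real models, and `override_resolve_threshold` is the tentative field name from the note. Whether the override is stored as a count or a percentage is left open here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    # Simplified stand-in (assumption); the real model has many more fields.
    resolve_threshold: Optional[float]


@dataclass
class Incident:
    alert_rule: AlertRule
    # Proposed, hypothetical field: filled in when the incident fires.
    override_resolve_threshold: Optional[float] = None


def effective_resolve_threshold(incident: Incident) -> Optional[float]:
    """Prefer the incident's override; fall back to the rule-level threshold."""
    if incident.override_resolve_threshold is not None:
        return incident.override_resolve_threshold
    return incident.alert_rule.resolve_threshold


rule = AlertRule(resolve_threshold=10.0)
print(effective_resolve_threshold(Incident(alert_rule=rule)))  # 10.0
print(effective_resolve_threshold(Incident(alert_rule=rule, override_resolve_threshold=240.0)))  # 240.0
```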

Places in code to change:

  • https://github.com/getsentry/sentry/blob/e1e99b1ce77904a3970c2f180a72ba10724edf8a/src/sentry/incidents/subscription_processor.py#L147-L178
  • https://github.com/getsentry/sentry/blob/f93d3584763a8d24e038b6885b0c3ca45e59ea63/src/sentry/incidents/logic.py#L124-L133

Product Area

Alerts

Link

No response

DSN

No response

Version

No response

rachrwang • Mar 06 '24 22:03

Discussed the solution with Dan, and we think it'll be a little better to use the incident's trigger datetime and make the comparison relative to when the incident triggered, rather than storing the triggered threshold values. This way we can more easily support the same feature for warning / error thresholds, and we won't have to migrate the database or store additional data.

The main logic can be summarized as:

For example, if the comparison interval is one hour, the time window is 30 minutes, and an alert fires at 05/09 5:00pm, then:

  • the comparison window is always:
    • start: 05/09 4:00pm + the minute of the current hour
    • end: start + time_window
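One literal reading of the window rule can be sketched as follows. The function and parameter names are hypothetical, the year 2024 is assumed for the 05/09 example, and interpreting "minute of current hour" as the minute of the evaluation time (`now.minute`) is an assumption. The key point is that the anchor is the incident's trigger time minus the comparison interval, not "now minus the comparison interval".

```python
from datetime import datetime, timedelta
from typing import Tuple


def comparison_window(
    triggered_at: datetime,
    now: datetime,
    comparison_delta: timedelta,
    time_window: timedelta,
) -> Tuple[datetime, datetime]:
    """Comparison window anchored to when the incident triggered (hypothetical helper)."""
    # Anchor on the trigger time: 05/09 5:00pm - 1 hour -> 05/09 4:00pm.
    anchor = triggered_at - comparison_delta
    # Assumption: "minute of current hour" means the minute of the evaluation time.
    start = anchor + timedelta(minutes=now.minute)
    end = start + time_window
    return start, end


start, end = comparison_window(
    triggered_at=datetime(2024, 5, 9, 17, 0),
    now=datetime(2024, 5, 9, 18, 17),
    comparison_delta=timedelta(hours=1),
    time_window=timedelta(minutes=30),
)
print(start, end)  # 2024-05-09 16:17:00 2024-05-09 16:47:00
```

Because the anchor stays at 4:00pm however long the incident stays open, the baseline does not drift forward the way a "now minus one hour" comparison would.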

saponifi3d • May 09 '24 23:05