Alert on "users experiencing errors" not triggering, despite being over warning and error thresholds

Open arifken opened this issue 1 year ago • 3 comments

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

I haven't been able to reproduce this error since it happened, but I have the broken alert still in my account if that would be helpful.

The alert settings are: When: Users experiencing errors is above 10 in 5 minutes Then: Send a Slack notification to

The query defined is

count_unique(user)
(event.type:[error, default]) AND (level:fatal release.stage:adopted) over 5 minutes

Expected Result

I should be getting Slack messages whenever the warning or error thresholds are exceeded

Actual Result

I'm not getting any messages, and the alert remains in a "resolved" state even though the time chart shows the counts above the threshold.

Product Area

Alerts

Link

No response

DSN

No response

Version

No response

arifken avatar Aug 03 '24 19:08 arifken

Auto-routing to @getsentry/product-owners-alerts for triage ⏲️

getsantry[bot] avatar Aug 03 '24 19:08 getsantry[bot]

I think the issue here is the time period of 5 minutes paired with the release.stage tag. release.stage is a rolling tag that's evaluated every hour from the last six hours of data. It might be that having release.stage:adopted doesn’t work with a 5 minute interval because at the time of evaluation, the release that’s experiencing errors isn’t adopted yet, so I'd recommend changing that to 1 hour.
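The hypothesis above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Sentry's implementation: the event tuples, the `users_over_window` helper, the release names, and the "stage snapshot" sets are all hypothetical. It shows how a filter on a stage that is refreshed infrequently could return zero users for a 5-minute window even when the unfiltered count is well above the threshold of 10.

```python
from datetime import datetime, timedelta

# Hypothetical data: 15 distinct users hit errors on release 2.0.0
# within the last few minutes. This is not Sentry's event schema.
NOW = datetime(2024, 8, 3, 19, 0)
EVENTS = [
    (NOW - timedelta(seconds=10 * i), f"user-{i}", "2.0.0")
    for i in range(1, 16)
]


def users_over_window(events, now, window, adopted_releases=None):
    """Count distinct users in the window, optionally keeping only
    events whose release is in the (assumed) adopted-stage snapshot."""
    start = now - window
    users = set()
    for ts, user, release in events:
        if not (start <= ts <= now):
            continue
        if adopted_releases is not None and release not in adopted_releases:
            continue
        users.add(user)
    return len(users)


THRESHOLD = 10
WINDOW = timedelta(minutes=5)

# Assumption: the stage snapshot lags, so 2.0.0 is not yet marked adopted.
stale_snapshot = {"1.9.0"}
# Assumption: a later snapshot includes 2.0.0.
fresh_snapshot = {"1.9.0", "2.0.0"}

unfiltered = users_over_window(EVENTS, NOW, WINDOW)
filtered_stale = users_over_window(EVENTS, NOW, WINDOW, stale_snapshot)
filtered_fresh = users_over_window(EVENTS, NOW, WINDOW, fresh_snapshot)

print("unfiltered over threshold:", unfiltered > THRESHOLD)
print("stale snapshot over threshold:", filtered_stale > THRESHOLD)
print("fresh snapshot over threshold:", filtered_fresh > THRESHOLD)
```

In this model the alert would stay quiet for as long as the snapshot lags behind the release that is actually producing errors, which matches the symptom described in the issue.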

We can do a better job here, but with regards to it showing that the alert is resolved, that's how all alerts that haven't fired yet are shown.

ceorourke avatar Aug 06 '24 19:08 ceorourke

Ah, OK. But as soon as that release is adopted, it should alert, right? Or is it that release.stage:adopted is always incompatible with a <1h interval, regardless of how much traffic has shifted to that version?

arifken avatar Aug 10 '24 02:08 arifken

Routing to @getsentry/product-owners-releases for triage ⏲️

getsantry[bot] avatar Aug 14 '24 19:08 getsantry[bot]

Routing to @getsentry/product-owners-issues for triage ⏲️

getsantry[bot] avatar Aug 14 '24 19:08 getsantry[bot]

Routing to @getsentry/product-owners-alerts for triage ⏲️

getsantry[bot] avatar Aug 14 '24 19:08 getsantry[bot]

@arifken - I checked with the team and release.stage:adopted is currently not supported as a query term for metric alerts. We are making a series of changes that would support this in the future. Thank you for filing this ticket!

rachrwang avatar Aug 14 '24 19:08 rachrwang

@rachrwang Hi, do we know when it will be possible to include release.stage:adopted in a metric alert? I'm trying to track our crash-free rates for the latest 3 releases. I was thinking that the latest 3 releases could be considered adopted, or that a release could count as adopted once its user count is significant.

This would ensure that we don't have to keep manually updating the release filter for every release.

Thanks

Lucienzera avatar May 21 '25 08:05 Lucienzera

Hi @Lucienzera,

Thanks for the feature request—would you mind making a new ticket for it?

mifu67 avatar May 21 '25 20:05 mifu67