Slack notification delay after a new issue arrives in Sentry
Self-Hosted Version
24.4.1
CPU Architecture
x86_64
Docker Version
20.10.24
Docker Compose Version
2.23.3
Steps to Reproduce
- Create an alert that sends a Slack notification when a new issue is created.
- Send an issue via sentry-cli (an example command is shown below).
- Sentry receives the issue but does not mark it as a new issue for 5 minutes.
- After 5 minutes, the issue is marked as new and sent to Slack.
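For reference, a minimal way to send such a test event from the command line might look like the sketch below; the DSN and message are placeholders, not values from this report.

```shell
# Placeholder DSN for the self-hosted project (replace with your own).
export SENTRY_DSN="https://<public_key>@sentry.example.com/<project_id>"

# Send a simple test event; a brand-new issue should be created for it,
# which is what the "new issue" alert condition is expected to match.
sentry-cli send-event -m "Test event for the new-issue Slack alert"
```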
Expected Result
- Create an alert that sends a Slack notification when a new issue is created.
- Send an issue via sentry-cli.
- Sentry receives the issue, marks it as a new issue, and sends it to Slack.
Actual Result
This is a regression of https://github.com/getsentry/self-hosted/issues/2034. The issue showed up after upgrading from 24.3.0 to 24.4.0 and persists in 24.4.1.
I also checked for the same cause, Kafka LAG on post-process events (the inspection command is shown below the output):
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
post-process-forwarder generic-events 0 478291 478291 0 rdkafka-b5bbee3d-650c-465f-aeb5-b1b2b78eb9a9 /172.18.0.43 rdkafka
post-process-forwarder events 0 9288607 9288608 1 rdkafka-a7f67d85-627b-4fae-ba46-c6e03a7bb3d4 /172.18.0.51 rdkafka
post-process-forwarder transactions 0 186570290 186570290 0 rdkafka-482098cd-d1ab-46fd-8ee6-b9ae3dcf3aea /172.18.0.42 rdkafka
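For reference, consumer lag output like the above can be produced from inside the deployment with something along these lines; the service name `kafka` and the bootstrap address are assumptions based on the default self-hosted docker-compose setup.

```shell
# Describe the post-process-forwarder consumer group to see per-topic lag.
docker compose exec kafka kafka-consumer-groups \
  --bootstrap-server localhost:9092 \
  --describe --group post-process-forwarder
```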
Event ID
No response
Some lag between receiving an event and sending a notification to Slack is expected; it is unlikely that this process will be in real time. I have a few questions:
- Was the Slack notification sent in real time before?
- Is the Slack notification sent only when a new event is ingested after the issue is sent via sentry-cli?
We have two Sentry instances; the second one runs version 24.2.0 and works without any issue. A new issue sent to that Sentry is immediately marked as new, which triggers the alert and sends the issue to Slack.
- Yes, before version 24.4.0 it was working as expected.
- The Slack notification should be sent once a new issue is created in Sentry. Unfortunately, every issue created in Sentry is marked as ongoing, so it is not sent to Slack. Once the ongoing mark switches to new, the Slack notification appears.
Got it, so you're saying that new issues coming into Sentry appear as Ongoing instead of New, and therefore they're not triggering alert rules?
Assigning to @getsentry/support for routing ⏲️
Routing to @getsentry/product-owners-alerts for triage ⏲️
Routing to @getsentry/product-owners-issues for triage ⏲️
Do we have any update?
What does the alert rule being triggered look like? What do the alert settings look like for that project?
The default delay is 5 minutes, as we attempt to digest the messages into a single message; this matters less for Slack but does cut down on noise for email.
Hello @scttcper, our rule looks like this:
What does the issue alert look like?
You can see that the new issue has the right time and showed up OK, but it is marked as Ongoing.
It is marked as New after 5 minutes.
I have not been able to reproduce this issue. Are you able to share the alert itself (e.g. the when/if conditions and actions like sending to a Slack channel)?
@leeandher I think the main thing here is that when issues come in for the first time, they're marked as Ongoing for some reason, which is only changed to New when new events are ingested by Kafka. For self-hosted specifically, it's a larger issue since events are ingested at a far lower rate than in SaaS.
That's true. When the issue comes in, it is marked as Ongoing. The Ongoing status will be changed once another issue comes in.
I sent an issue via sentry-cli to show you what is going on (a reproduction sketch follows at the end):
- The issue comes in and is marked as Ongoing.
I sent a new issue:
- Both issues are marked as Ongoing.
After a few seconds:
- The oldest issue changes status from Ongoing to New -> Sentry sends the issue to Slack.
An issue that comes into Sentry should be immediately marked as New, which will trigger the alert and send the message to Slack.
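To make that sequence easy to replay, here is a rough sketch; the DSN, the messages, and the 30-second pause are illustrative placeholders, not values from this report.

```shell
# Placeholder DSN for the self-hosted project.
export SENTRY_DSN="https://<public_key>@sentry.example.com/<project_id>"

# First event: the resulting issue shows up as Ongoing.
sentry-cli send-event -m "repro: first issue"

# Second event a little later: both issues still show as Ongoing.
sleep 30
sentry-cli send-event -m "repro: second issue"

# Shortly afterwards the oldest issue flips to New, and only then does the Slack alert fire.
```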
Thanks for the additional context! I'll be taking a look at this again tomorrow morning to reproduce it and look into why they're being incorrectly categorized.
Alright, I can confirm where the issue is coming from (thanks @snigdhas), but we'll have to be a little precise with how we fix it, as there are a few downstream effects that will cause issues if we simply update this line.
We're going to factor this issue into a larger scope fix to address some of these status inconsistencies, but I'm not able to provide a concrete timeline on when to expect a fix. Thanks again for all the context and for reporting this!
Hello, do we have an update? Thanks
Hi @simPod, thanks for checking in. We're actively working on the fix for this as part of #75454. The fix should be rolled out by the end of the month.