Notification deduplication for Unified Alerting

Open justinbwood opened this issue 1 year ago • 50 comments

Per the AWS Managed Grafana docs on migrating classic alerts to Grafana alerting, multiple notifications are sent when using Grafana-managed alerts.

I would like to see Grafana's high availability alerting enabled so that notifications are properly deduplicated, as it's a bit frustrating to receive Slack notifications in triplicate when using Unified Alerting.

Thanks!

justinbwood avatar May 11 '23 17:05 justinbwood

I would also like to see this. Getting three messages per alert is really annoying. I filed an issue over at Grafana, but it seems like there's something wrong with the Amazon Managed Grafana config.

https://github.com/grafana/grafana/issues/68652

atze234 avatar May 17 '23 20:05 atze234

I've also been running into this issue. Opened a support ticket w/ AWS and the result was basically reflecting the doc that was linked in this comment. It seems like really bad UX to spam out alerts like this... I'd be interested to hear what workarounds others have used; I'm in the process of migrating over to managing the alerts using an external alert manager, Prometheus AlertManager, instead. Would be nice to be able to provision the alert rules in Grafana though!

bradlet avatar May 30 '23 17:05 bradlet

As a workaround I'm using a DynamoDB table and a message hash in my Lambda that parses the SNS messages. Like here:

https://gist.github.com/atze234/60dbef2991e08aba93b875c73578cf41

I also set this in the delivery_policy so that there is enough time to write to the DB.

    "defaultThrottlePolicy": {
      "maxReceivesPerSecond": 1
    },
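The hash-based deduplication described above could look roughly like the sketch below. It is not the code from the gist: `InMemoryStore`, `message_hash`, `handler`, and the `forward` callback are hypothetical names, and the in-memory store only stands in for the DynamoDB table (where the same check would presumably be a conditional write that fails if the hash already exists). It also assumes the duplicate deliveries carry identical `Subject` and `Message` fields.

```python
import hashlib
import json
import time


class InMemoryStore:
    """Stand-in for a DynamoDB table keyed by message hash (assumption)."""

    def __init__(self):
        self._expiry = {}

    def put_if_absent(self, key, ttl_seconds, now=None):
        """Record key; return True only if it was absent or already expired."""
        now = time.time() if now is None else now
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True


def message_hash(sns_message):
    """Hash only the fields assumed to be identical across duplicate deliveries."""
    stable = {
        "Subject": sns_message.get("Subject"),
        "Message": sns_message.get("Message"),
    }
    payload = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def handler(event, store, forward, ttl_seconds=300, now=None):
    """Forward each distinct SNS message at most once per TTL window."""
    forwarded = 0
    for record in event.get("Records", []):
        sns_message = record["Sns"]
        if store.put_if_absent(message_hash(sns_message), ttl_seconds, now=now):
            forward(sns_message)
            forwarded += 1
    return forwarded


if __name__ == "__main__":
    sent = []
    store = InMemoryStore()
    event = {"Records": [{"Sns": {"Subject": "[FIRING:1] HighCPU",
                                  "Message": "cpu > 90%"}}]}
    # Simulate the same notification arriving three times.
    for _ in range(3):
        handler(event, store, sent.append, now=1000.0)
    print(len(sent))  # prints 1
```

The TTL keeps the table from suppressing a later, legitimate re-fire of the same alert; 300 seconds is an arbitrary placeholder and would need to match the actual repeat interval.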

atze234 avatar Jun 02 '23 13:06 atze234

This really is needed, since the "Classic" alerting is supposedly going away soon. It makes using Slack or PagerDuty impossible when monitoring large workloads, especially since classic alerts do not allow for template variables.

RphCos avatar Jul 28 '23 17:07 RphCos

+1

brc avatar Aug 03 '23 01:08 brc

Is there any ETA for this, please?

chr2che avatar Aug 08 '23 10:08 chr2che

Spoke to AWS team about this today. They gave an "estimate" of Q1 2024 with possibility it might be as late as Q3 2024. According to them it's not a high priority issue for them and there are other issues they need to work on before that happens.

My biggest issue with it is that with Grafana managed service - alerting is advertised as a service feature.

I guess paying customers don't get a working feature until AWS deems it worth fixing...

andrzej-mega avatar Aug 23 '23 14:08 andrzej-mega

We are also experiencing this issue. This is a primary feature of the service, and it is extremely disappointing that Amazon doesn't prioritize primary features of its products. We have waited for 1.5 years for Amazon to make 9.4 available in AMG so that we could use the alerting that is part of 9.4. Alerting is the only feature of 9.4 that we needed. It was, and still is, the biggest reason to upgrade to 9.4. Now we might further delay upgrading until as late as Q3 2024, making it more than 2.5 years.

The purpose of the above rant is to add my vote to the priority of this issue.

kevdonde avatar Nov 21 '23 16:11 kevdonde

+1

webertrlz avatar Feb 29 '24 09:02 webertrlz

@VermaPriyanka do we have any updates on this and when should we expect a fix? This is really important to us!

michael-ortiz avatar Mar 01 '24 15:03 michael-ortiz

FYI @VermaPriyanka this is a showstopper for us. We considered various solutions for providing an observability service to our engineering teams and settled on Managed Grafana expecting it to Just Work. Now after a significant investment of resources to get set up and put processes in place, we've hit this bug which renders the service unfit for use. Alerting is core functionality and we cannot expect other teams to accept all of their alerts appearing 3x in Slack!

We would really appreciate a fix for this ASAP or at the very least an ETA on a fix and a standard workaround until the fix arrives.

amorphic avatar Mar 19 '24 02:03 amorphic

workaround while we're waiting https://github.com/flashbots/prometheus-sns-lambda-slack

sukoneck avatar Mar 19 '24 03:03 sukoneck

Thank you all for the patience and for sharing workarounds. We understand that this is an important issue to solve and are working towards the same.

VermaPriyanka avatar Mar 19 '24 15:03 VermaPriyanka

+1

avpjanm avatar Mar 27 '24 07:03 avpjanm

AWS released Grafana 10.4 yesterday, and it's still an issue.

Strangely, this was their response to the alerting in HA issue.

https://docs.aws.amazon.com/grafana/latest/userguide/v10-alerting-explore-high-availability.html

magnowest avatar May 16 '24 12:05 magnowest

Yeah, this is the WORST bug. I am not even sure how they can release with this issue. It's been a year now, and we are still stuck on the old legacy alerts because of this. That documentation almost suggests they won't fix this and that it's working as they designed it.

lorelei-rupp-imprivata avatar May 16 '24 12:05 lorelei-rupp-imprivata

Thank you for voicing this concern. We are working towards a fix for the duplicate notifications issue in version 10. The description here explains the current workings of Grafana alerting, which implies rules are evaluated per HA instance. We are working towards solving this in 2 steps - focusing on solving the duplicate notifications first and to eliminate duplicate evaluations in the long term. We understand this has been a long wait, and are working towards releasing a fix soon.

VermaPriyanka avatar May 16 '24 14:05 VermaPriyanka

Facing the same issue. Do you have any workarounds for slack?

ff-pjha avatar May 17 '24 12:05 ff-pjha

How fast can we get a fix for this? We are currently setting up alerting and it's a real pain to receive all alerts 3x...

Diondk avatar Jun 11 '24 12:06 Diondk

Any updates on this nasty "feature"?

ursuciprian avatar Jun 27 '24 13:06 ursuciprian

We are also facing the same issue and would really appreciate knowing how and when this will be fixed by AWS. Do you have an ETA for the fix, @VermaPriyanka? When is it supposed to be released for Managed Grafana? I am currently on Grafana v10.4.1 and still see this issue on AWS Managed Grafana.

bguruprasad avatar Jul 24 '24 09:07 bguruprasad

Hi @VermaPriyanka, Any update on this issue?

flashguerdon avatar Aug 05 '24 05:08 flashguerdon

This is shipping soon on Managed Grafana v10.4 workspaces. Folks who have implemented workarounds to avoid the multiple notifications, do you see any concern as this fix is shipped - any breaking experiences or impact to your alerting flow?

VermaPriyanka avatar Aug 05 '24 14:08 VermaPriyanka

Will you be patching 10.4 in place? Are you releasing a new minor patch to 10.4? The above statement is slightly confusing because 10.4 has already shipped.

kevdonde avatar Aug 06 '24 15:08 kevdonde

Hi @VermaPriyanka, we are about to implement such a workaround (detriplication based on FIFO SQS/SNS or with Prometheus). If this feature is shipping soon, it might not be worth it. Can you specify the "soon" part of your post (and also answer @kevdonde's question regarding the versioning)? Thanks in advance.

ingMor avatar Aug 06 '24 15:08 ingMor

@kevdonde @ingMor It will be in place for all 10.4 workspaces - new, existing or upgraded. If you have additional/more specific questions, you can send them via mail to [email protected].

VermaPriyanka avatar Aug 06 '24 20:08 VermaPriyanka

That's great to hear. We have been avoiding creating alerts on Managed Grafana and creating them in Prometheus or CloudWatch instead, but our goal is to centralize everything in Grafana.

Looking forward to the release!

william-kurosawa avatar Aug 08 '24 00:08 william-kurosawa

@VermaPriyanka when is this fix coming? Currently, alerting is unusable because every alert is sent multiple times.

Dragotic avatar Aug 22 '24 14:08 Dragotic

Hi @VermaPriyanka ,

Could you provide an update on when the fix will be released? Any ETA or additional details would be appreciated.

Thanks!

MehediAxe avatar Aug 22 '24 15:08 MehediAxe

I'm also standing by for the workspace fix.

webertrlz avatar Aug 29 '24 14:08 webertrlz