AWS Managed Grafana fires a notification 3 times

Open trinh-tien-dat opened this issue 1 year ago • 22 comments

Hi Team, I'm using AWS managed Grafana service.

When I use Notification policies to route my alerts to different contact points and an alert rule fires, I get 3 identical notifications at the same time.

I did some searching, and it seems to be a bug in the AWS Managed Grafana service.

Has anyone managed to fix this on their own? Or can anyone from AWS in this thread tell me why it happens and when it will be fixed?

Thank you!

trinh-tien-dat avatar Jul 09 '23 03:07 trinh-tien-dat

Yes, same for us. It seems like a bug in AMG.

rpractice avatar Jul 18 '23 13:07 rpractice

@rpractice Hi, I'm trying Prometheus Alertmanager as an external Alertmanager, instead of Grafana's default Alertmanager, to take care of the duplicates. I will let you know the result in a couple of days.

trinh-tien-dat avatar Jul 19 '23 10:07 trinh-tien-dat

@trinh-tien-dat, did you manage to fix this issue using Prometheus?

rpractice avatar Aug 01 '23 10:08 rpractice

Yes, it worked. I installed prometheus-alertmanager on an EC2 instance and configured it to route incoming alerts to AWS SNS; prometheus-alertmanager groups the alerts and de-duplicates them. Then, in the AWS Grafana service, I went to contact point --> alertmanager and set the prometheus-alertmanager URL and an account for authentication.

It works well, and we can use this while we wait for them to fix it.
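
For anyone wondering what "groups the alerts and de-duplicates them" means in practice, here is a minimal Python sketch of the idea. It is not Alertmanager's actual implementation; the function names, the alert dictionary shape, and the `group_by` default are all assumptions made for illustration.

```python
from collections import defaultdict


def fingerprint(labels):
    """Stable identity for an alert: its sorted label pairs (illustrative only)."""
    return tuple(sorted(labels.items()))


def group_and_dedupe(alerts, group_by=("alertname",)):
    """Drop alerts with an identical label set, then bucket the rest by group_by labels."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        fp = fingerprint(alert["labels"])
        if fp in seen:
            # The same label set was already received: drop the extra copy.
            continue
        seen.add(fp)
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical input: the same alert delivered three times, plus one distinct alert.
incoming = [
    {"labels": {"alertname": "HighCPU", "instance": "web-1"}},
    {"labels": {"alertname": "HighCPU", "instance": "web-1"}},
    {"labels": {"alertname": "HighCPU", "instance": "web-1"}},
    {"labels": {"alertname": "HighCPU", "instance": "web-2"}},
]
grouped = group_and_dedupe(incoming)
```

With this input, the three copies for `web-1` collapse into one entry, so a single group holding two alerts would go out as one notification.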

trinh-tien-dat avatar Aug 01 '23 10:08 trinh-tien-dat

Is this the behavior that's documented in the second bullet point here? https://docs.aws.amazon.com/grafana/latest/userguide/v9-alerts.html#v9-alert-limitations

brc avatar Aug 03 '23 01:08 brc

@invsblduck Hi, I'm not sure if it relates to my issue.

trinh-tien-dat avatar Aug 03 '23 02:08 trinh-tien-dat

Is there an ETA for a fix?

chr2che avatar Aug 08 '23 10:08 chr2che

@trinh-tien-dat Apologies that I phrased it as a question for you. This behavior is documented as a limitation at the link I shared:

Alert rules defined in Grafana, rather than in Prometheus, send multiple notifications to your contact point.

This is also a duplicate of #47

brc avatar Aug 09 '23 06:08 brc

There is one interesting thing. On the AWS Grafana service, when you define some rules and configure only one contact point for them (which is also the default contact point), there are no duplicate notifications and it works perfectly (I have one contact point, AWS SNS).

Once you have two or more contact points, you have to define Notification policies to route the alerts to the right contact point, and that is where the duplication issue occurs. I tried this many times on my AWS Grafana service, and I can say:

1 contact point for all rules: works.
Multiple contact points, each handling some of the rules: you will have the issue.

Thanks! Dat.
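
To make the expected behavior concrete, here is a simplified Python sketch of label-based routing: each alert should end up at exactly one contact point, either the first policy that matches or the default. This is only an illustration. Real Grafana notification policies form a tree and support more matcher operators, and the `route` function, the policy tuples, and the contact point names below are hypothetical.

```python
def route(alert_labels, policies, default_contact_point):
    """Return the contact point of the first policy whose matchers all match.

    policies is a list of (matchers, contact_point) pairs, where matchers is a
    dict of label name to required value. Falls back to the default contact point.
    """
    for matchers, contact_point in policies:
        if all(alert_labels.get(name) == value for name, value in matchers.items()):
            return contact_point
    return default_contact_point


# Hypothetical setup: two contact points selected by a "team" label.
policies = [
    ({"team": "db"}, "sns-db"),
    ({"team": "web"}, "sns-web"),
]

db_target = route({"alertname": "SlowQuery", "team": "db"}, policies, "sns-default")
other_target = route({"alertname": "DiskFull", "team": "ops"}, policies, "sns-default")
```

In this model one firing alert maps to one contact point and therefore one notification, which is what the single-contact-point setup delivers and what the multi-contact-point setup fails to deliver.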

trinh-tien-dat avatar Aug 09 '23 08:08 trinh-tien-dat

This seems like an unfortunate issue in AMG, and AMG should solve it. While they do call it out in the documentation, to me it is just a known bug. AMG shouldn't keep you from using the new alerting feature if you want to have your alerts in Grafana rather than Prometheus, especially if you are using data sources other than Prometheus and want Grafana alerts. This really limits you from using the new alerting features in Grafana, which are far more powerful than the legacy ones.

lorelei-rupp-imprivata avatar Oct 03 '23 15:10 lorelei-rupp-imprivata

FWIW, we ended up writing our own Lambda (for Slack alerts) that does deduplication. Alerts that arrive in triplets are not just ridiculous, they also encourage people to stop paying attention to them.

https://github.com/flashbots/prometheus-sns-lambda-slack
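
The general idea of deduplicating in front of Slack can be sketched in a few lines of Python. This is not taken from the linked project; it is a minimal time-window filter with hypothetical names (`Deduplicator`, `should_forward`) and an assumed 60-second window. It keeps state in memory, which is an assumption that only holds inside a single process; a real Lambda would likely need an external store because invocations are not guaranteed to share memory.

```python
import hashlib
import json


class Deduplicator:
    """Forward a message only if an identical one was not forwarded recently."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self._last_seen = {}  # digest -> timestamp of the last forwarded copy

    @staticmethod
    def _digest(message):
        # Canonical JSON so that key order does not change the digest.
        canonical = json.dumps(message, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def should_forward(self, message, now):
        key = self._digest(message)
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # identical message inside the window: suppress it
        self._last_seen[key] = now
        return True


# Hypothetical triplet arriving within one second, then a repeat two minutes later.
dedupe = Deduplicator(window_seconds=60)
msg = {"alertname": "HighCPU", "status": "firing"}
results = [dedupe.should_forward(msg, now=t) for t in (0.0, 0.5, 1.0, 120.0)]
```

Here only the first copy of the triplet is forwarded, and the later repeat passes because it falls outside the window.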

0x416e746f6e avatar Feb 03 '24 09:02 0x416e746f6e

+1 this is PITA.

webertrlz avatar Feb 13 '24 10:02 webertrlz

The bug we are talking about is still there, even with the new Managed Grafana 10.4 version. @mhausenblas, are you able to tell us when it will be solved?

MaGaudin avatar Jun 10 '24 12:06 MaGaudin

@MaGaudin The bug fix is in progress and will be available in Managed Grafana version 10.4. See my response here

VermaPriyanka avatar Jun 10 '24 15:06 VermaPriyanka

Hi @VermaPriyanka. We updated to Grafana 10.4 but the bug is still there.

MaGaudin avatar Jun 10 '24 15:06 MaGaudin

As I mentioned, the work is still in progress. I will post an update here once it's rolled out.

VermaPriyanka avatar Jun 10 '24 16:06 VermaPriyanka

OK, thanks for clarifying. I hope this comes ASAP, since it is a very nasty bug that could push users to switch products. In the meantime, do you have any advice on workarounds? What can we do to mitigate this problem?

MaGaudin avatar Jun 10 '24 16:06 MaGaudin

@trinh-tien-dat Unfortunately, with Grafana 10.4, the workaround you described earlier (using a single contact point for all rules) no longer seems to work. Has anyone else noticed the same?

MaGaudin avatar Jun 10 '24 16:06 MaGaudin

@MaGaudin We have been waiting about a year for a fix, and there is none yet.

lorelei-rupp-imprivata avatar Jun 10 '24 16:06 lorelei-rupp-imprivata

There should be an update to Grafana 10.4 after September 14, 2024, that will resolve the duplicate alert issue. This update is only available for Amazon Managed Grafana version 10.4 workspaces. If you are running Grafana version 8.4 or 9.4, you must upgrade your workspace to Grafana version 10.4 to receive this update.

tanjilbhuiyan avatar Sep 06 '24 07:09 tanjilbhuiyan

@tanjilbhuiyan That's great news. I'll try this as soon as possible this month.

trinh-tien-dat avatar Sep 09 '24 13:09 trinh-tien-dat

Has anyone confirmed that it works properly in the new version?

rpractice avatar Sep 17 '24 09:09 rpractice

Yes, it is working now. Since September 21, our 10.4 installation has been firing only one notification per alert. At last!

enriquegaldu avatar Sep 24 '24 07:09 enriquegaldu

@VermaPriyanka, can you please confirm that the new version has a fix for the duplicate notifications, so we can update to the latest version?

rpractice avatar Sep 24 '24 14:09 rpractice

@rpractice Please be careful. I upgraded from 9.4 to 10.4 and ran into a lot of issues. I am now creating a new 10.4 workspace and migrating things from the old one to the new one manually.

My test was: I created a new 9.4 workspace, used this tool to clone the current workspace into it: https://github.com/aws-observability/amazon-managed-grafana-migrator, and then upgraded the clone from 9.4 to 10.4.

trinh-tien-dat avatar Sep 24 '24 15:09 trinh-tien-dat

Thank you all for your patience. Happy to share that the update to prevent multiple alert notifications sent to your contact points from Grafana managed alert rules, is now available on all Amazon Managed Grafana workspaces running Grafana version 10.4, across all AWS regions where Amazon Managed Grafana is generally available.

If you are running Grafana version 8.4 or 9.4, you must upgrade your workspace to Grafana version 10.4 to receive this update. For instructions on how to update your workspace(s), see Amazon Managed Grafana service documentation. We recommend testing the newer version in a non-production environment before updating a production workspace.

We know this has been a frustration for many of you, and we deeply appreciate your understanding and patience. Your feedback and continued support are invaluable in helping us improve, and we're committed to ensuring a smoother experience going forward.

If you have any questions or continue to experience issues, please reach out to us. Alternatively, if you prefer a more private channel, feel free to reach us via email at [email protected].

Public roadmap announcement

VermaPriyanka avatar Sep 25 '24 13:09 VermaPriyanka

Hi Everyone,

10.4 actually fixed the problem. I upgraded from 9.4 to 10.4, and I now receive 1 notification per alert. That's great!

Thank you to AWS Grafana Team.

FYI, the steps I took to upgrade from 9.4 to 10.4:

1- Create a new 10.4 workspace.
2- Use the amazon-managed-grafana-migrator tool to migrate between the workspaces.
3- The tool does not sync alerts (rules, contact points, and notification policies). I think that makes sense; you just need to re-create the alerts on the new workspace.

All good then!

NOTE: Think carefully before doing a direct in-place upgrade from 9.4 to 10.4. I tried that and it failed.

Thanks! Dat.

trinh-tien-dat avatar Sep 30 '24 13:09 trinh-tien-dat