Rate limit feature for on_kubernetes_warning_event
GCP sometimes acts up and causes false positives in our alerts. This is our current configuration, but we still get false positives from time to time.
- triggers:
  - on_kubernetes_warning_event_create:
      include: [ "FailedGetPodsMetric", "FailedGetExternalMetric" ]
      exclude: [ "googleapi: Error 503", "googleapi: Error 429", "No recommendation" ]
Some time ago, @Avi-Robusta built a custom image to test a few additional options to the trigger:
- rate_limit
- min_count
- delay_s
But I don't see those in the documentation, so they probably never made it into a release. It would be great if they could be included, so that, for example, the trigger only fires once the Kubernetes Warning event has been firing for a certain length of time or has fired a certain number of times.
Hi @otherguy,
We have the rate_limit param included today.
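For reference, here is a minimal sketch of the trigger from the original post with rate_limit added. The value 3600 is only illustrative, and the sketch assumes rate_limit is given in seconds and caps how often the trigger fires for the same event, so please verify the unit and default against the trigger documentation once it is updated.

```yaml
- triggers:
  - on_kubernetes_warning_event_create:
      include: [ "FailedGetPodsMetric", "FailedGetExternalMetric" ]
      exclude: [ "googleapi: Error 503", "googleapi: Error 429", "No recommendation" ]
      # Assumption: value is in seconds; 3600 is an illustrative choice, not a recommendation.
      rate_limit: 3600
```

Note that this only reduces repeat notifications. It does not wait for the event to persist before firing, which is what min_count and delay_s in the unmerged branch were meant to address.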
The branch adding the other changes is found here but it's been delayed in merging for now. We plan to get back to it, but it will take a little time due to backlog on our end.
@pavangudiwada can you update the docs for on_kubernetes_warning_event_create (and any related triggers) to add rate_limit?
@aantn @pavangudiwada any update on this? 😃 We're getting a lot of false positives with GKE Prometheus. They go away after a few seconds or minutes, and we get alerted for no reason. I've excluded this alert from our pages, but it would be great to have OpsGenie pages for real alerts.
@otherguy reaching out to you about this on Slack.
I was wondering if there is any news about that :)