
Rate limit feature for on_kubernetes_warning_event

Open otherguy opened this issue 2 years ago • 5 comments

GCP sometimes acts up and causes false positives in our alerts. This is our current configuration, but we still get false positives from time to time:

- triggers:
  - on_kubernetes_warning_event_create:
      include: [ "FailedGetPodsMetric", "FailedGetExternalMetric" ]
      exclude: [ "googleapi: Error 503", "googleapi: Error 429", "No recommendation" ]

Some time ago, @Avi-Robusta built a custom image to test a few additional options to the trigger:

  • rate_limit
  • min_count
  • delay_s

But I don't see those in the documentation, so they probably never made it into a release. It would be great if they could be included, e.g. so that the trigger only fires if the Kubernetes Warning event has been firing for a certain length of time or has fired a certain number of times.

otherguy, Nov 17 '23 10:11

Hi @otherguy, we have the rate_limit param included today.
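
As a rough sketch, this is how it might look when added to the configuration from the original post. The value 3600 is purely illustrative, and the unit (assumed here to be seconds between firings) is an assumption not confirmed in this thread, so check it against the trigger documentation:

```yaml
- triggers:
  - on_kubernetes_warning_event_create:
      include: [ "FailedGetPodsMetric", "FailedGetExternalMetric" ]
      exclude: [ "googleapi: Error 503", "googleapi: Error 429", "No recommendation" ]
      # Assumption: minimum interval, in seconds, between firings of this trigger.
      # 3600 is an illustrative value only.
      rate_limit: 3600
```

This only limits how often the trigger fires. It does not wait for the event to persist or to repeat, which is what min_count and delay_s were meant to cover.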

The branch adding the other changes is found here, but merging it has been delayed for now. We plan to get back to it, but it will take a little time due to a backlog on our end.

aantn, Nov 25 '23 15:11

@pavangudiwada can you update the docs for on_kubernetes_warning_event_create (and any related triggers) to add rate_limit?

aantn, Nov 25 '23 15:11

@aantn @pavangudiwada any update on this? 😃 We're getting a lot of false positives with GKE Prometheus. They go away after a few seconds or minutes, and we get alerted for nothing. I've excluded this alert from our pages, but it would be great to have OpsGenie pages for real alerts.

[Screenshot: CleanShot 2023-12-12 at 12 39 06@2x]

otherguy, Dec 12 '23 11:12

@otherguy reaching out to you about this on Slack.

aantn, Dec 17 '23 08:12

I was wondering if there is any news on this :)

otherguy, Feb 09 '24 10:02