otomi-core

AlertManager receivers for Watchdog

Open staticvoid255 opened this issue 2 years ago • 8 comments

**Is your feature request related to a problem?**

We need the ability to configure the AlertManager Watchdog alert to send a heartbeat to OpsGenie, so that we are notified if AlertManager stops sending alerts.

**Describe the solution you'd like**

We are trying to set up a watchdog (Prometheus) / heartbeat (OpsGenie) alert with the infra team, who use OpsGenie. As discussed here, this is already supported by AlertManager. I'd appreciate it if these values could be added to the schema, and then to the API and Console.

staticvoid255 avatar Aug 16 '22 12:08 staticvoid255

One way to tackle this would be a "custom receivers" section in the console, holding AlertManager snippets that are then saved as encrypted secrets in the values repo (to protect sensitive values). This would be a very flexible, highly generic solution with minimal impact on the schema.

staticvoid255 avatar Aug 17 '22 09:08 staticvoid255

UI proposal: image.png

j-zimnowoda avatar Aug 17 '22 12:08 j-zimnowoda

Yep that's exactly it

staticvoid255 avatar Aug 17 '22 13:08 staticvoid255

"Watchdog" is always turned on as it acts like a dead man's switch. You can see it firing in the Alertmanager UI. So adding a checkbox that only says "Enabled" is confusing. So what does that checkbox do exactly?

Morriz avatar Aug 17 '22 16:08 Morriz

> "Watchdog" is always turned on as it acts like a dead man's switch. You can see it firing in the Alertmanager UI. So adding a checkbox that only says "Enabled" is confusing. So what does that checkbox do exactly?

In alertmanager.gotmpl, under routes, you'll see that Watchdog alerts are always routed to the null receiver. The alert is still firing, of course, but it is of no use to us unless we can actually point it at a receiver. In this case that is an OpsGenie Heartbeat endpoint, which is different from the normal OpsGenie alert receiver endpoint. OpsGenie cannot pick Watchdog/Heartbeat alerts out of the normal alert receiver and process them appropriately; they must go to the OpsGenie Heartbeat endpoint specifically.

When Watchdog is disabled, we point it to the existing "null" receiver; when it is enabled, we point it to a new receiver specifically for Watchdog alerts.
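To make the toggle concrete, here is a rough sketch of the route and receiver fragments it would produce, written as plain Python dicts that mirror AlertManager's YAML layout. The receiver name `watchdog-heartbeat`, the function names, and the `repeat_interval` value are made up for illustration and are not taken from otomi-core. The heartbeat ping URL and the API key passed as a basic-auth password follow the commonly documented OpsGenie setup, but treat them as assumptions to verify against your own account.

```python
from typing import Any

NULL_RECEIVER = "null"                    # existing receiver mentioned in this thread
WATCHDOG_RECEIVER = "watchdog-heartbeat"  # hypothetical name for the new receiver


def watchdog_route(enabled: bool) -> dict[str, Any]:
    """Route for the always-firing Watchdog alert."""
    return {
        "match": {"alertname": "Watchdog"},
        # Disabled: keep today's behaviour (null). Enabled: dedicated receiver.
        "receiver": WATCHDOG_RECEIVER if enabled else NULL_RECEIVER,
        # Assumption: re-send often enough that the heartbeat never expires.
        "repeat_interval": "1m",
    }


def watchdog_receiver(heartbeat_name: str, api_key: str) -> dict[str, Any]:
    """Webhook receiver that pings an OpsGenie heartbeat (assumed URL shape)."""
    return {
        "name": WATCHDOG_RECEIVER,
        "webhook_configs": [
            {
                "url": f"https://api.opsgenie.com/v2/heartbeats/{heartbeat_name}/ping",
                "send_resolved": False,
                # Assumption: the API key is accepted as a basic-auth password.
                "http_config": {"basic_auth": {"password": api_key}},
            }
        ],
    }


def build_alertmanager_fragment(
    enabled: bool, heartbeat_name: str = "", api_key: str = ""
) -> dict[str, Any]:
    """Assemble only the Watchdog-related parts of the AlertManager config."""
    receivers: list[dict[str, Any]] = [{"name": NULL_RECEIVER}]
    if enabled:
        receivers.append(watchdog_receiver(heartbeat_name, api_key))
    return {"route": {"routes": [watchdog_route(enabled)]}, "receivers": receivers}
```

Calling `build_alertmanager_fragment(False)` leaves Watchdog on the null receiver, while `build_alertmanager_fragment(True, "my-heartbeat", "<key>")` adds the heartbeat receiver and routes Watchdog to it. The API key is the sensitive value that would need to live in an encrypted secret.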

staticvoid255 avatar Aug 17 '22 16:08 staticvoid255

Got it. And because it is a special alert that will keep hammering, it makes sense to send it to a dead man's switch endpoint instead of the regular alerting endpoints. Otherwise people go crazy.

Morriz avatar Aug 17 '22 20:08 Morriz

exactly

j-zimnowoda avatar Aug 18 '22 08:08 j-zimnowoda

Since remote write is now enabled, this is no longer needed by GU. Still good for the future though.

The custom config section for remote write exemplifies perfectly what I'm going for here: Screenshot 2022-12-12 at 18.05.42.png

staticvoid255 avatar Dec 12 '22 17:12 staticvoid255