otomi-core
AlertManager receivers for Watchdog
**Is your feature request related to a problem?**
We need the ability to route the Alertmanager Watchdog alert to OpsGenie as a heartbeat, so that we are notified if Alertmanager stops sending alerts.
**Describe the solution you'd like**
We are trying to set up a Watchdog (Prometheus) / Heartbeat (OpsGenie) alert with the infra team, who use OpsGenie. As discussed here, this is already supported by Alertmanager. I'd appreciate it if these values could be added to the schema, and then to the API and Console.
One way to tackle this would be a "custom receivers" section in the Console, holding Alertmanager config snippets that are saved as encrypted secrets in the values repo (to protect sensitive values). This would be a flexible, generic solution with minimal impact on the schema.
UI proposal
Yep that's exactly it
"Watchdog" is always turned on as it acts like a dead man's switch. You can see it firing in the Alertmanager UI. So adding a checkbox that only says "Enabled" is confusing. So what does that checkbox do exactly?
In alertmanager.gotmpl, under routes, you'll see that Watchdog alerts are always routed to the "null" receiver. The alert is still firing, of course, but unless we can point it at a real receiver it is of no use to us. In this case that receiver is an OpsGenie Heartbeat endpoint, which is different from the normal OpsGenie alert receiver endpoint. OpsGenie cannot pick Watchdog/Heartbeat alerts out of the normal alert receiver and process them as heartbeats; they must be sent to OpsGenie Heartbeat specifically.
When Watchdog is disabled, we keep pointing it at the existing "null" receiver; when it is enabled, we point it at a new receiver dedicated to Watchdog alerts.
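For illustration, the enabled case could look roughly like the Alertmanager fragment below. This is a sketch under assumptions, not the contents of alertmanager.gotmpl: the receiver names `default` and `watchdog-heartbeat`, the `<heartbeat-name>` and `<opsgenie-api-key>` placeholders, and the timing values are all hypothetical, and it assumes the OpsGenie Heartbeat ping endpoint accepts the API key as a basic-auth password.

```yaml
# Sketch only: names, placeholders and intervals are assumptions, not otomi-core values.
route:
  receiver: default                    # hypothetical fallback receiver
  routes:
    - matchers:
        - alertname = "Watchdog"
      # Disabled: set this to "null" so the alert is dropped.
      # Enabled: point it at the dedicated heartbeat receiver.
      receiver: watchdog-heartbeat
      group_wait: 0s
      group_interval: 1m
      repeat_interval: 50s             # ping often enough that the heartbeat never expires
receivers:
  - name: "null"
  - name: default
  - name: watchdog-heartbeat
    webhook_configs:
      # OpsGenie Heartbeat ping endpoint, separate from the normal alert API.
      - url: https://api.opsgenie.com/v2/heartbeats/<heartbeat-name>/ping
        send_resolved: false
        http_config:
          basic_auth:
            password: <opsgenie-api-key>
```

Because the API key sits inside the receiver definition, this is the part that would need to be stored encrypted in the values repo.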
Got it. And because it is a special alert that will keep hammering, it makes sense to send it to a dead man's switch endpoint instead of the regular alerting endpoints. Otherwise people go crazy.
exactly
Since remote write is now enabled this is no longer needed by GU. Still good for the future though.
The custom config section for remote write exemplifies perfectly what I'm going for here: