[Feat]: add `1min_anomaly_rate` and `1min_node_anomaly_rate` to alarm events.
Problem
We need to expose anomaly rates as part of alerts.
This feature request aims to build the first piece of this by adding two new fields to alarms in the agent.
Description
- `1min_anomaly_rate`: the average anomaly rate over all dimensions involved in the alarm in the preceding 60 seconds.
- `1min_node_anomaly_rate`: the average overall node anomaly rate in the preceding 60 seconds; this can be taken from the `anomaly_detection.anomaly_rate` chart.
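To make the definition of `1min_anomaly_rate` concrete, here is a minimal Python sketch, not the agent's actual implementation. It assumes each dimension exposes one anomaly bit (0 or 1) per second, newest last; the function name `one_min_anomaly_rate` and the input layout are hypothetical and chosen only for illustration.

```python
from statistics import mean


def one_min_anomaly_rate(anomaly_bits, window=60):
    """Average anomaly rate (in percent) across dimensions over the last `window` samples.

    anomaly_bits: dict mapping a dimension name to a list of 0/1 anomaly
    bits, one per second, newest last (assumed layout, for illustration).
    """
    per_dim_rates = []
    for bits in anomaly_bits.values():
        recent = bits[-window:]  # the preceding `window` seconds
        if recent:
            per_dim_rates.append(100.0 * sum(recent) / len(recent))
    # With no data at all, report 0 rather than failing.
    return mean(per_dim_rates) if per_dim_rates else 0.0


# Two dimensions involved in an alarm: one anomalous for 6 of the last 60 seconds.
dims = {"user": [0] * 54 + [1] * 6, "system": [0] * 60}
rate = one_min_anomaly_rate(dims)
```

The same averaging over every dimension on the node would give a node-level figure, although the description suggests reading `1min_node_anomaly_rate` from the `anomaly_detection.anomaly_rate` chart instead.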
Importance
must have
Value proposition
- First step in adding anomaly rate percentages (AR%) into alert templates etc. to provide more context.
...
Proposed implementation
TBD with agent team input. Main idea is to either calculate these values as part of the health engine itself or to calculate them on state transition of alerts.
Doing this will enable "[Feat]: Add anomaly rate into alerts templates" (#757).
@MrZammler can you have a think about this one? I might follow up with you next week, but I just wanted to give you a heads-up so we can start discussing whether this is feasible and how easy it would be.
Hopefully you will tell me it is super easy and simple :)
@MrZammler what would you think about getting together a minimal POC PR at some stage in the next week or two to do this:
- make `1min_node_anomaly_rate` available as a variable.
- ability to reference and add it into the `info` of an alert.
Idea being a sort of minimal POC to get going.
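As a rough illustration of what referencing the value from an alert's `info` could look like, here is a small Python sketch. The `${name}` placeholder syntax, the `render_info` helper, and the example text are all assumptions made for illustration; they are not taken from the agent's health engine.

```python
import re

# Hypothetical placeholder syntax: ${variable_name}. Names may start with a
# digit (e.g. 1min_node_anomaly_rate), so a custom pattern is used here.
_VAR = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def render_info(template, variables):
    """Replace ${name} placeholders with values; leave unknown names untouched."""
    def replace(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        return match.group(0)  # unknown variable: keep the placeholder as-is
    return _VAR.sub(replace, template)


info = render_info(
    "CPU utilization is high; node anomaly rate ${1min_node_anomaly_rate}%",
    {"1min_node_anomaly_rate": 12.5},
)
```

Leaving unknown placeholders untouched is one possible choice for a POC, since it keeps existing `info` strings working if the variable is not available.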
@ilyam8 @Ferroin @shyamvalsan as an FYI: the idea here is to start with the simplest POC we can think of.
While we're at it, can we capture this along with both
- triggeredValue
- latestValue
@MrZammler (Off topic) It might also be useful to get the node OS and node type (k8s vs container vs bare metal) in the alert info since troubleshooting steps can be customized accordingly.
(Feature request submitted --> https://github.com/netdata/netdata/issues/14923)
@shyamvalsan you can make new feature requests for that stuff :)
Draft POC PR here: https://github.com/netdata/netdata/pull/15012