pyrra
Enable alert severity overrides
This PR extends the Alerting configuration to enable alert severity label value overrides.
As Pyrra generates alerts of two severities, critical and warning, there are cases where users might want to change these values due to alert routing or pager configuration in different environments (e.g. high instead of critical for stage).
This allows setting highSeverity and lowSeverity for alerts, which default to critical and warning, respectively.
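A minimal sketch of how the two new fields could be set in an SLO spec. It assumes they sit directly under spec.alerting; that placement is an assumption, and only the field names highSeverity and lowSeverity come from this PR:

```yaml
# Hypothetical: only the alerting section of a ServiceLevelObjective is shown.
spec:
  alerting:
    highSeverity: high   # would replace the default "critical"
    lowSeverity: low     # would replace the default "warning"
```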
Wdyt? 🙂
Very interesting, and I'm happy to add something along those lines. For now, I'm not sure about the wording and will give this another look.
Looking at this again, I wonder if we should at least make all four alerts configurable with custom severity label values. I can see that making sense overall.
warning and critical were chosen by the Prometheus team as the default severity label values, and Pyrra follows that convention.
Does something like the config below seem better?
```yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99"
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    absentSeverity: high
    windowBurnRateSeverity: [high, high, low, low]
```
This way, users can override the severity of the SLOMetricAbsent alerts as well as of the four window-based alerts. Leaving a field unset falls back to the default, and windowBurnRateSeverity, if specified, must always contain exactly four entries to avoid confusion.
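To illustrate the fallback behaviour, here is a sketch of a spec that overrides only the absent alert. The comment on the defaults assumes the window-based alerts keep Pyrra's usual two critical and two warning severities, in the same order as the proposed array:

```yaml
spec:
  alerting:
    absentSeverity: high
    # windowBurnRateSeverity is omitted, so the four window-based alerts keep
    # their defaults (assumed equivalent to [critical, critical, warning, warning]).
    # windowBurnRateSeverity: [high, low]  # would be rejected: needs exactly 4 entries
```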
> warning and critical were chosen by the Prometheus team as the default severity label values, and Pyrra follows that convention.
Yup, these are great defaults and work well. This option just addresses the cases that deviate from them. 🙂
@metalmatze should I update this PR with the config above then?
Sorry for dropping the ball on this one...
How about something similar to this?
```yaml
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99"
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    disabled: false
    name: ErrorBudgetBurn
    severities:
      absent: high
      level1: critical
      level2: error
      level3: warning
      level4: info
```
I think putting them into a map of some sort makes them more ergonomic to configure? Then again, I'm not sure about the level* keys... For those, the array might be better.
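For comparison, a sketch of the array variant. It keeps absent as a named key and replaces the level* keys with an ordered list. The windows key name is hypothetical, and the list is assumed to follow the same order as level1 through level4:

```yaml
spec:
  alerting:
    disabled: false
    name: ErrorBudgetBurn
    severities:
      absent: high
      # Hypothetical key: one entry per window-based alert,
      # in the same order as level1..level4 in the map variant.
      windows: [critical, error, warning, info]
```

The map makes each entry self-describing. The list makes the "exactly four entries" rule easy to check but relies on each entry's position for its meaning.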