pyrra Enable alert severity overrides

This PR extends the Alerting configuration to enable alert severity label value overrides.

Pyrra generates alerts with two severities, critical and warning. In some cases users need to change these values because alert routing or pager configuration differs between environments (e.g., high instead of critical for staging).

This allows setting highSeverity and lowSeverity for alerts, which default to critical and warning.
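To make the fallback behaviour concrete, here is a minimal sketch in Python (not Pyrra's actual Go code). The function name `resolve_severities` and its parameters are hypothetical and only mirror the `highSeverity` and `lowSeverity` fields described above.

```python
from typing import Optional, Tuple

# Defaults stated in the PR description.
DEFAULT_HIGH = "critical"
DEFAULT_LOW = "warning"


def resolve_severities(
    high_severity: Optional[str] = None,
    low_severity: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (high, low) severity label values.

    Hypothetical helper: an unset or empty override falls back to the default.
    """
    return (high_severity or DEFAULT_HIGH, low_severity or DEFAULT_LOW)


if __name__ == "__main__":
    # No overrides: the defaults are used.
    print(resolve_severities())
    # Staging-style override of only the high severity.
    print(resolve_severities(high_severity="high"))
```

Overriding only one of the two values leaves the other at its default, which keeps existing SLO definitions unchanged.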

Wdyt? 🙂

Feb 19 '23 08:02 saswatamcode

Very interesting and happy to add something along those lines. Currently, I'm not sure about the wording and will give this another look.

Feb 20 '23 12:02 metalmatze

Looking at this again, I wonder whether we should at least make all four alerts configurable with custom severity label values. I can see that making sense overall.

warning and critical were chosen by the Prometheus team as the default severity label values, and hence Pyrra uses the same by default.

Feb 25 '23 13:02 metalmatze

Does something like the config below seem better?

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99"
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    absentSeverity: high
    windowBurnRateSeverity: [high, high, low, low]

This way users can override the severity of the SLOMetricAbsent alert as well as the four window-based alerts. Leaving a field unset falls back to the default. If windowBurnRateSeverity is specified, it must always have exactly four entries, to avoid confusion.
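The rule above (fall back when unset, require exactly four entries otherwise) could be validated roughly as follows. This is a Python sketch, not Pyrra's Go implementation. The name `resolve_alerting` is hypothetical, and the default values are assumptions: the thread only says that Pyrra uses critical and warning by default.

```python
from typing import Dict, List, Optional, Union

# Assumed defaults for illustration only; not taken from Pyrra's source.
DEFAULT_ABSENT = "critical"
DEFAULT_WINDOWS = ["critical", "critical", "warning", "warning"]


def resolve_alerting(
    absent_severity: Optional[str] = None,
    window_burn_rate_severity: Optional[List[str]] = None,
) -> Dict[str, Union[str, List[str]]]:
    """Apply defaults and enforce the length-4 rule (hypothetical helper)."""
    if window_burn_rate_severity is None:
        windows = list(DEFAULT_WINDOWS)
    elif len(window_burn_rate_severity) != 4:
        raise ValueError(
            "windowBurnRateSeverity must have exactly 4 entries, got %d"
            % len(window_burn_rate_severity)
        )
    else:
        windows = list(window_burn_rate_severity)
    return {"absent": absent_severity or DEFAULT_ABSENT, "windows": windows}


if __name__ == "__main__":
    # Mirrors the alerting section of the YAML above.
    print(resolve_alerting("high", ["high", "high", "low", "low"]))
```

Rejecting a partial list outright, rather than padding it with defaults, avoids ambiguity about which burn-rate window each entry refers to.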

> warning and critical were chosen by the Prometheus team as the default severity label values, and hence Pyrra uses the same by default.

Yup, these are great defaults and work well. This option just addresses the cases that deviate from them. 🙂

Feb 25 '23 14:02 saswatamcode

@metalmatze should I update this PR with the config above then?

Mar 02 '23 11:03 saswatamcode

Sorry for dropping the ball on this one...

How about something similar to this?

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99"
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    disabled: false
    name: ErrorBudgetBurn
    severities:
      absent: high
      level1: critical
      level2: error
      level3: warning
      level4: info

I think putting them into a map of some sort makes it more ergonomic to configure. Then again, I'm not sure about the level* keys... An array might be better there.
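For comparison with the array form, here is a rough Python sketch of how a map of severities might be resolved (again not Pyrra's Go code). The helper `resolve_severity_map` is hypothetical and the default values are assumptions for illustration. The keys follow the `severities` block in the YAML above.

```python
from typing import Dict, Optional

# Assumed defaults for illustration only; keys follow the proposed YAML.
DEFAULT_SEVERITIES: Dict[str, str] = {
    "absent": "critical",
    "level1": "critical",
    "level2": "critical",
    "level3": "warning",
    "level4": "warning",
}


def resolve_severity_map(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge per-key overrides onto the defaults (hypothetical helper)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_SEVERITIES)
    if unknown:
        # A typo such as "level5" is reported instead of silently ignored.
        raise ValueError("unknown severity keys: %s" % sorted(unknown))
    merged = dict(DEFAULT_SEVERITIES)
    merged.update(overrides)
    return merged


if __name__ == "__main__":
    # Override only two keys; the remaining keys keep their defaults.
    print(resolve_severity_map({"absent": "high", "level2": "error"}))
```

One practical difference from the array: a map lets a user override a single alert without restating the other three, at the cost of having to agree on key names.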

Apr 21 '23 10:04 metalmatze
