
Enable alert severity overrides

Open saswatamcode opened this issue 2 years ago • 7 comments

This PR extends the Alerting configuration to enable alert severity label value overrides.

Pyrra generates alerts with two severities, critical and warning. In some cases users may want different values because alert routing or pager configuration differs between environments (for example, high instead of critical for staging).

This allows setting highSeverity and lowSeverity for alerts, which default to critical and warning.
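To make the fallback behaviour concrete, here is a minimal sketch in Python (not Pyrra's actual implementation). The function name `resolve_high_low` and the choice to treat an empty string as "not set" are assumptions made for illustration; only the field names and the two default values come from the description above.

```python
from typing import Optional, Tuple

# Defaults named in the proposal above.
DEFAULT_HIGH_SEVERITY = "critical"
DEFAULT_LOW_SEVERITY = "warning"


def resolve_high_low(
    high_severity: Optional[str] = None,
    low_severity: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (high, low) severity label values, using defaults when unset.

    Hypothetical helper: an empty string is treated like "not set",
    which is an assumption of this sketch.
    """
    return (
        high_severity or DEFAULT_HIGH_SEVERITY,
        low_severity or DEFAULT_LOW_SEVERITY,
    )


# Staging overrides only the high severity; the low severity keeps its default.
staging_high, staging_low = resolve_high_low(high_severity="high")
print(staging_high, staging_low)
```

With this shape, an SLO that sets neither field behaves exactly as it does today.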

Wdyt? 🙂

saswatamcode avatar Feb 19 '23 08:02 saswatamcode

Very interesting and happy to add something along those lines. Currently, I'm not sure about the wording and will give this another look.

metalmatze avatar Feb 20 '23 12:02 metalmatze

Looking at this again, I wonder whether we should at least make all four alerts configurable in terms of setting custom severity label values? I can see that making sense overall.

warning and critical have been decided on by the Prometheus team as the default severity label values, and hence Pyrra uses the same by default.

metalmatze avatar Feb 25 '23 13:02 metalmatze

Does something like the config below seem better?

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99"
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    absentSeverity: high
    windowBurnRateSeverity: [high, high, low, low]

This way, users can override the severity of the SLOMetricAbsent alert as well as the four window-based alerts. Leaving a field unset falls back to the default, and windowBurnRateSeverity, if specified, must always have exactly four entries to avoid confusion.
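The fallback and length rule described above could be checked roughly like this. It is a sketch in Python, not code from this PR: `resolve_alerting` is a hypothetical name, and the default values (critical for the absent alert, and critical, critical, warning, warning for the four windows) are assumptions that should be checked against what Pyrra actually generates.

```python
from typing import Dict, List, Optional, Union

# Assumed defaults (not confirmed in this thread): the first two
# burn-rate windows are critical, the last two are warning.
DEFAULT_WINDOW_SEVERITIES = ["critical", "critical", "warning", "warning"]
DEFAULT_ABSENT_SEVERITY = "critical"  # assumption


def resolve_alerting(
    absent_severity: Optional[str] = None,
    window_burn_rate_severity: Optional[List[str]] = None,
) -> Dict[str, Union[str, List[str]]]:
    """Apply defaults and enforce the 'exactly four entries' rule."""
    if window_burn_rate_severity is None:
        # Field not specified: fall back to the defaults.
        windows = list(DEFAULT_WINDOW_SEVERITIES)
    elif len(window_burn_rate_severity) != 4:
        raise ValueError(
            "windowBurnRateSeverity must have exactly 4 entries, got "
            f"{len(window_burn_rate_severity)}"
        )
    else:
        windows = list(window_burn_rate_severity)

    return {
        "absent": absent_severity or DEFAULT_ABSENT_SEVERITY,
        "windows": windows,
    }


# Mirrors the alerting section of the example config above.
print(resolve_alerting("high", ["high", "high", "low", "low"]))
```

Rejecting lists of any other length keeps the position-to-window mapping unambiguous, at the cost of requiring all four values even when only one changes.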

> warning and critical have been decided on by the Prometheus team as the default severity label values, and hence Pyrra uses the same by default.

Yup these are great defaults and work well. This option just addresses the cases which deviate from this. 🙂

saswatamcode avatar Feb 25 '23 14:02 saswatamcode

@metalmatze should I update this PR with the config above then?

saswatamcode avatar Mar 02 '23 11:03 saswatamcode

Sorry for dropping the ball on this one...

How about something similar to this?

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99"
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    disabled: false
    name: ErrorBudgetBurn
    severities:
      absent: high
      level1: critical
      level2: error
      level3: warning
      level4: info

I think putting them into a map of some sort makes it more ergonomic to configure? Then again, I'm not sure about the level* keys... An array might be better there.
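To compare the two shapes, here is a rough Python sketch of how the map form could be resolved. The key names (`absent`, `level1` to `level4`) come from the YAML above; the helper names `severities_from_map` and `window_severities`, the default values, and the decision to reject unknown keys are all assumptions for illustration.

```python
from typing import Dict, List

# Order of the four burn-rate alerts, using the key names from the YAML above.
LEVEL_KEYS = ["level1", "level2", "level3", "level4"]

# Assumed defaults, not confirmed in this thread.
DEFAULT_SEVERITIES = {
    "absent": "critical",
    "level1": "critical",
    "level2": "critical",
    "level3": "warning",
    "level4": "warning",
}


def severities_from_map(overrides: Dict[str, str]) -> Dict[str, str]:
    """Merge user overrides over the defaults, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(DEFAULT_SEVERITIES))
    if unknown:
        raise ValueError(f"unknown severity keys: {unknown}")
    return {**DEFAULT_SEVERITIES, **overrides}


def window_severities(resolved: Dict[str, str]) -> List[str]:
    """Flatten the resolved map into the ordered, array-style form."""
    return [resolved[key] for key in LEVEL_KEYS]


# A partial override: only one burn-rate level changes.
resolved = severities_from_map({"level2": "error"})
print(window_severities(resolved))
```

The map allows overriding a single level without restating the others, which the fixed-length array cannot do. The array avoids having to pick names for the levels in the first place.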

metalmatze avatar Apr 21 '23 10:04 metalmatze