Add support for defining series and alert limits individually in rule groups

Open vjsamuel opened this issue 1 year ago • 2 comments

This PR attempts to define rule group parameters for series and alert limits so that individual limits can be applied to recording and alerting rules respectively.

Alertmanager instances typically struggle when too many alerts are emitted. A rule group can contain a mix of recording and alerting rules, so it helps end users to be able to set each limit separately.

This PR takes the global limit as-is if neither is defined. If either the alert limit or the series limit is defined, it takes precedence for the corresponding rule type.

Ex:

groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: count(prometheus_http_requests_total) by (handler) > 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
  alert_limit: 1
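
The fallback described above (use the specific limit when it is set, otherwise fall back to the general one) might be sketched roughly as follows. This is only an illustration, not the code from the PR: the `GroupLimits` type, its field names, and the helper functions are hypothetical, and it assumes that a value of 0 means "not set / no limit".

```go
package main

// GroupLimits is a hypothetical stand-in for the limit settings of one rule
// group. It is NOT the actual Prometheus type. Assumption: 0 means "not set".
type GroupLimits struct {
	Limit       int // the existing general limit
	AlertLimit  int // proposed limit for alerting rules (alert_limit in the example)
	SeriesLimit int // proposed limit for recording rules (field name assumed)
}

// pick returns the specific limit when it is set, otherwise the general one.
func pick(specific, general int) int {
	if specific > 0 {
		return specific
	}
	return general
}

// EffectiveAlertLimit is the limit that would apply to alerting rules.
func (g GroupLimits) EffectiveAlertLimit() int {
	return pick(g.AlertLimit, g.Limit)
}

// EffectiveSeriesLimit is the limit that would apply to recording rules.
func (g GroupLimits) EffectiveSeriesLimit() int {
	return pick(g.SeriesLimit, g.Limit)
}

// Exceeds reports whether count breaks the given limit.
// A limit of 0 is treated as unlimited.
func Exceeds(count, limit int) bool {
	return limit > 0 && count > limit
}
```

Under these assumptions, a group with a general limit of 10 and `alert_limit: 1` would cap alerting rules at 1 alert while recording rules would still be capped at 10 series.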

vjsamuel, Mar 12 '24 07:03

Some previous discussion on CNCF Slack: https://cloud-native.slack.com/archives/C167KFM6C/p1710179010255979

bboreham, Mar 26 '24 11:03

(Hello from the bug scrub meeting)

Thanks. We looked at this again, and there may be ways to improve it a bit (e.g. add only an alert limit and use the existing limit as the series limit?). It would also be good to double-check whether this can't be fixed with a global (per-group) alert limit setting in Prometheus. Let's continue to chat on the mentioned Slack thread!

Another idea would be

limits:
  rulesSeries: 1
  alerts: 1

... and remove the plain limit field in Prometheus 3.0 or so.

bwplotka, Mar 26 '24 12:03

Hello from the bug scrub!

Looks like there's been little progress here, although the idea is still valid. A related discussion was closed as well: https://github.com/prometheus/prometheus/issues/12646

Thanks!

bwplotka, Feb 11 '25 11:02