prometheus
Add support for defining series and alert limits individually in rule groups
This PR attempts to define rule group parameters for series and alert limits so that individual limits can be applied to recording and alerting rules respectively.
Alertmanager typically suffers when too many alerts are emitted. A rule group can contain a mix of recording and alerting rules, so it is beneficial for end users to be able to define each limit separately.
If neither limit is defined, this PR takes the global limit as-is. If either the alert limit or the series limit is defined, it is honored first.
Ex:
groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: count(prometheus_http_requests_total) by (handler) > 0
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
  alert_limit: 1
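The fallback behaviour described above could be sketched roughly as follows. This is an illustration, not the code in this PR: the type and function names (GroupLimits, resolveLimits, intPtr) are hypothetical, and so is the SeriesLimit field name, since only alert_limit appears in the example. It also assumes that a specific limit overrides the group limit only for its own rule type, while the other rule type keeps falling back to the group limit.

```go
package main

import "fmt"

// GroupLimits holds the limit-related settings of one rule group.
// Limit stands for the existing group-wide limit (0 means no limit).
// SeriesLimit and AlertLimit are hypothetical names for the proposed
// per-type limits; nil means "not set in the rule file".
type GroupLimits struct {
	Limit       int
	SeriesLimit *int
	AlertLimit  *int
}

// EffectiveLimits is what rule evaluation would actually enforce.
type EffectiveLimits struct {
	Series int // applied to recording rules
	Alerts int // applied to alerting rules
}

// resolveLimits starts from the group-wide limit and lets each
// specific limit take precedence when it is set (assumed semantics).
func resolveLimits(g GroupLimits) EffectiveLimits {
	e := EffectiveLimits{Series: g.Limit, Alerts: g.Limit}
	if g.SeriesLimit != nil {
		e.Series = *g.SeriesLimit
	}
	if g.AlertLimit != nil {
		e.Alerts = *g.AlertLimit
	}
	return e
}

// intPtr is a small helper for building optional values.
func intPtr(v int) *int { return &v }

func main() {
	// Group-wide limit of 10, with alerts capped separately at 1.
	g := GroupLimits{Limit: 10, AlertLimit: intPtr(1)}
	fmt.Printf("%+v\n", resolveLimits(g)) // prints {Series:10 Alerts:1}
}
```

Using pointers for the optional fields keeps "not set" distinct from an explicit 0, which matters if 0 continues to mean "no limit" for the new fields as well.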
Some previous discussion on CNCF Slack: https://cloud-native.slack.com/archives/C167KFM6C/p1710179010255979
(Hello from the bug scrub meeting)
Thanks. We checked it again, and there seem to be some ideas for improving it a bit (e.g. have only an alert limit and use limit as the series limit?). It would also be good to double-check whether this can be fixed with a global alert limit (per group) setting in Prometheus. Let's continue to chat on the mentioned Slack thread!
Another idea would be
limits:
  rulesSeries: 1
  alerts: 1
... and then remove the plain limit in Prometheus 3.0 or so
Hello from the bug scrub!
Looks like there has been little progress here, although the idea is still valid. The related discussion was closed as well: https://github.com/prometheus/prometheus/issues/12646
Thanks!