Filter alerts based on query result labels
For context: we are working with hundreds of hosts, and the alert.rules config is already over 1000 lines long.
A typical rule looks like this:
- alert: CacheResponseTime
  annotations:
    description: 'Response time for cache is over 4s ({{ $value }}) at {{ $labels.nodename }} {{ $labels.customer }}'
    summary: 'Response time for cache is over 4s ({{ $value }}) at {{ $labels.nodename }} {{ $labels.customer }}'
  expr: |
    haproxy_backend_total_time_average_seconds{proxy="cache"}
      * on(instance) group_left(nodename) (node_uname_info)
      * on(instance) group_left(customer, environment) (pool_info)
    > 4
  for: 2m
  labels:
    severity: critical
Given your examples, I could use the "static" label severity to change the alert routing, but what I really want is to check the labels returned by the query and filter on those (e.g. under certain conditions, downgrade the alert or route it to the warnings channel, so that it does not reach PagerDuty or similar).
Some examples of routes the way I'd like to write them:
routes:
  - match:
      customer: developer1
    receiver: warnings-channel
  - match:
      environment: dev
    receiver: warnings-channel
  - match:
      nodename: dev-host
    receiver: warnings-channel
(In our case, there are more variables than just looking for "dev".)
So either the route config above would accept optional keys (which may originate from the query results), or Alertmanager could gain a new step in the pipeline that filters alerts.
Right now the workarounds are neither ideal nor elegant:
- Send all critical alerts to a webhook and have some logic there decide what to do with them
- Duplicate every alerting rule that needs extra filters (the alert.rules file would multiply in size very quickly)
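
To make the first workaround concrete, here is a minimal sketch of the decision logic such a webhook would need. It assumes the webhook receives the usual Alertmanager JSON payload, i.e. an `alerts` list whose entries each carry a `labels` map. The rule table, the function names and the `pagerduty` default receiver are hypothetical and only mirror the routes I'd like to write above.

```python
# Hypothetical downgrade rules, mirroring the desired routes above:
# any alert carrying one of these label/value pairs goes to the warnings channel.
DOWNGRADE_RULES = [
    ("customer", "developer1"),
    ("environment", "dev"),
    ("nodename", "dev-host"),
]


def pick_receiver(labels, default="pagerduty"):
    """Return the receiver name for a single alert, based on its labels."""
    for name, value in DOWNGRADE_RULES:
        if labels.get(name) == value:
            return "warnings-channel"
    return default  # assumed name for the paging receiver


def split_alerts(payload):
    """Group the alerts of one webhook payload by the receiver they should go to."""
    grouped = {}
    for alert in payload.get("alerts", []):
        receiver = pick_receiver(alert.get("labels", {}))
        grouped.setdefault(receiver, []).append(alert)
    return grouped
```

Even in this small form it shows the problem: the routing decision now lives outside Alertmanager, in a separate service that has to be deployed and kept in sync with the rules.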