Filter alerts based on query result labels

Open Lusitaniae opened this issue 3 years ago • 0 comments

For context: we are working with hundreds of hosts, and the alert.rules config is already over 1000 lines.

A typical rule looks like this:

    - alert: CacheResponseTime
      annotations:
        description: 'Response time for cache is over 4s ({{ $value }}) at {{ $labels.nodename }} {{ $labels.customer }}'
        summary: 'Response time for cache is over 4s ({{ $value }}) at {{ $labels.nodename }} {{ $labels.customer }}'
      expr: |
        haproxy_backend_total_time_average_seconds{proxy="cache"}
         * on(instance) group_left(nodename) (node_uname_info)
         * on(instance) group_left(customer, environment) (pool_info)
         > 4
      for: 2m
      labels:
        severity: critical

Given your examples, I could use the "static" severity label to change the alert routing, but what I really want is to inspect the query results and filter based on them. For example, under certain conditions the alert should be downgraded or routed to the warnings channel, so that it avoids PagerDuty or similar.
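A possible way to get the downgrade inside the rule itself, sketched under the assumption that Prometheus expands templates in rule label values as it does in annotations, and that the `environment` label is present on the result through the `pool_info` join. The value "dev" and the "warning" level are illustrative only; annotations are omitted for brevity.

    - alert: CacheResponseTime
      expr: |
        haproxy_backend_total_time_average_seconds{proxy="cache"}
         * on(instance) group_left(nodename) (node_uname_info)
         * on(instance) group_left(customer, environment) (pool_info)
         > 4
      for: 2m
      labels:
        # Assumption: "dev" is the only environment that should be downgraded.
        severity: '{{ if eq $labels.environment "dev" }}warning{{ else }}critical{{ end }}'

With something like this, the existing severity-based routing could stay unchanged, at the cost of encoding the conditions in every rule that needs them.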

Some examples of routes the way I'd like to write them:

    routes:
    - match:
        customer: developer1
      receiver: warnings-channel

    - match:
        environment: dev
      receiver: warnings-channel

    - match:
        nodename: dev-host
      receiver: warnings-channel

(In our case, there are more variables than just checking for "dev".)
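For reference, a fuller sketch of how those routes might sit inside an Alertmanager route tree. It assumes that the labels produced by the query (`customer`, `environment`, `nodename`) are still attached to the alert when it reaches Alertmanager; the `pagerduty` receiver name is a placeholder, and the integration settings of both receivers are left out.

    route:
      receiver: pagerduty            # hypothetical default receiver
      routes:
      # Child routes are checked in order; the first match handles the alert.
      - match:
          customer: developer1
        receiver: warnings-channel
      - match:
          environment: dev
        receiver: warnings-channel
      - match:
          nodename: dev-host
        receiver: warnings-channel

    receivers:
    - name: pagerduty                # integration settings omitted
    - name: warnings-channel         # integration settings omitted

Anything that matches none of the child routes would fall through to the default receiver.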

So either the route config above would accept optional keys (which may originate from the query results), or Alertmanager could gain a new step in the pipeline that filters alerts.

Right now the workarounds are neither ideal nor elegant:

- Send all critical alerts to a webhook and have some logic there decide what to do with them.
- Duplicate every alerting rule that needs extra filters (alert.rules would multiply in size very quickly).
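To illustrate why duplicating rules scales badly, here is a sketch of the typical rule split in two, assuming "dev" is the only condition to downgrade. The `CacheResponseTimeDev` name and the "warning" severity are hypothetical, and annotations are omitted. Each additional condition (customer, host name) would need yet another copy.

    - alert: CacheResponseTime
      expr: |
        haproxy_backend_total_time_average_seconds{proxy="cache"}
         * on(instance) group_left(nodename) (node_uname_info)
         * on(instance) group_left(customer, environment) (pool_info{environment!="dev"})
         > 4
      for: 2m
      labels:
        severity: critical

    # Same expression, restricted to dev and downgraded.
    - alert: CacheResponseTimeDev
      expr: |
        haproxy_backend_total_time_average_seconds{proxy="cache"}
         * on(instance) group_left(nodename) (node_uname_info)
         * on(instance) group_left(customer, environment) (pool_info{environment="dev"})
         > 4
      for: 2m
      labels:
        severity: warning

The selector on `pool_info` works as a filter here because instances without a matching series on the right-hand side drop out of the join.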

May 08 '22 16:05 Lusitaniae