kube-prometheus

Duplicate entries of AlertmanagerClusterFailedToSendAlerts

Open hsolberg opened this issue 1 year ago • 1 comments

What happened? Hi! I noticed that there are duplicate entries of the alert AlertmanagerClusterFailedToSendAlerts. The only difference is the severity level. It's defined here.

    - alert: AlertmanagerClusterFailedToSendAlerts
      annotations:
        description: The minimum notification failure rate to {{ $labels.integration
          }} sent from any instance in the {{$labels.job}} cluster is {{ $value |
          humanizePercentage }}.
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts
        summary: All Alertmanager instances in a cluster failed to send notifications
          to a critical integration.
      expr: |
        min by (namespace,service, integration) (
          rate(alertmanager_notifications_failed_total{job="alertmanager-main",namespace="monitoring", integration=~`.*`}[5m])
        /
          rate(alertmanager_notifications_total{job="alertmanager-main",namespace="monitoring", integration=~`.*`}[5m])
        )
        > 0.01
      for: 5m
      labels:
        severity: critical
    - alert: AlertmanagerClusterFailedToSendAlerts
      annotations:
        description: The minimum notification failure rate to {{ $labels.integration
          }} sent from any instance in the {{$labels.job}} cluster is {{ $value |
          humanizePercentage }}.
        runbook_url: https://runbooks.prometheus-operator.dev/runbooks/alertmanager/alertmanagerclusterfailedtosendalerts
        summary: All Alertmanager instances in a cluster failed to send notifications
          to a non-critical integration.
      expr: |
        min by (namespace,service, integration) (
          rate(alertmanager_notifications_failed_total{job="alertmanager-main",namespace="monitoring", integration!~`.*`}[5m])
        /
          rate(alertmanager_notifications_total{job="alertmanager-main",namespace="monitoring", integration!~`.*`}[5m])
        )
        > 0.01
      for: 5m
      labels:
        severity: warning

Did you expect to see something different? I only expected to see one alert entry.

How to reproduce it (as minimally and precisely as possible): It's defined here with the only difference being the severity set to critical in one entry and warning in the other.

Environment

  • Prometheus Operator version:

quay.io/prometheus-operator/prometheus-operator:v0.58.0

  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.7", GitCommit:"42c05a547468804b2053ecf60a3bd15560362fc2", GitTreeState:"clean", BuildDate:"2022-05-24T12:30:55Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.7-eks-4721010", GitCommit:"b77d9473a02fbfa834afa67d677fd12d690b195f", GitTreeState:"clean", BuildDate:"2022-06-27T22:19:07Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

N/A

  • Manifests:

https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/alertmanager-prometheusRule.yaml#L64-L99

  • Prometheus Operator Logs:

None

  • Prometheus Logs:

None

Anything else we need to know?:

hsolberg, Sep 16 '22 09:09

This isn't really a bug IMHO since nothing breaks. The Alertmanager mixin lets you define which integrations are used for critical alerts (PagerDuty, for instance) and which aren't (e.g. chat) (see here for details). By default, it considers all integrations to be critical, hence the alerting rule with severity=warning never returns any data (because of the integration!~".*" label selector).

simonpasquier, Sep 16 '22 12:09
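
To illustrate the point about the label selectors, here is a minimal Python sketch of how the two rules split integrations between them. It assumes that Prometheus regex matchers are fully anchored (emulated here with `re.fullmatch`) and that Python's `re` behaves like Prometheus's RE2 for these simple patterns. The integration names and the `pagerduty|opsgenie` override are hypothetical, chosen only for illustration; they are not taken from the mixin.

```python
import re


def matches(integration: str, pattern: str, negate: bool = False) -> bool:
    """Emulate a Prometheus =~ (or !~ when negate=True) label matcher.

    Assumption: the matcher is fully anchored, so re.fullmatch is used.
    """
    hit = re.fullmatch(pattern, integration) is not None
    return not hit if negate else hit


# Hypothetical integration label values.
integrations = ["pagerduty", "slack", "email", "webhook"]

# Default from the issue: every integration is treated as critical.
default_re = r".*"
critical_default = [i for i in integrations if matches(i, default_re)]
warning_default = [i for i in integrations if matches(i, default_re, negate=True)]

# Hypothetical override: only paging integrations are critical.
custom_re = r"pagerduty|opsgenie"
critical_custom = [i for i in integrations if matches(i, custom_re)]
warning_custom = [i for i in integrations if matches(i, custom_re, negate=True)]

print("default  critical:", critical_default, "warning:", warning_default)
print("override critical:", critical_custom, "warning:", warning_custom)
```

With the default pattern, the `!~` selector matches no integration, so the severity=warning rule has nothing to evaluate. Once the pattern is narrowed, the two rules cover disjoint sets of integrations, which is why both entries exist.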