
Alertmanager sent repeated firing and resolved messages after a resolved message; the interval between the two resolved notifications is group_interval

Open woshizhicainiaoluguo opened this issue 3 years ago • 12 comments

What did you do? Started Prometheus followed by Alertmanager and configured an alert route to my service.

What did you expect to see? I expected no further alerts after the resolved message, unless the alert stayed resolved for some time and then started firing again.

Environment: Prometheus is prometheus:v2.26.0 and Alertmanager is alertmanager:v0.21.0. The parameters are resolve_timeout: 5m, group_wait: 30s, group_interval: 2m, repeat_interval: 10m. An alert route is configured to my service, and I log every alert message it receives.
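For reference, a minimal Alertmanager configuration matching the parameters above might look like the sketch below; the grouping key and the webhook URL are assumptions for illustration, not values taken from the actual setup.

global:
  resolve_timeout: 5m
route:
  receiver: common-alert-webhook
  group_by: ['alertname']          # assumed, based on the groupKey seen in the webhook payloads
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 10m
receivers:
- name: common-alert-webhook
  webhook_configs:
  - url: http://my-webhook-service:8080/alerts   # hypothetical endpoint; the real URL is not shown in the issue
    send_resolved: true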

What did you see instead? Pay attention to the log times and the startsAt and endsAt fields. I received firing messages from the very beginning, with firing at 00:13:07 and 00:23:22, which is within expectations. I received a resolved message at 00:30:27, but then a new firing message at 00:31:22 and a new resolved message at 00:32:27.

The interval between the two resolved messages is group_interval, and they have the same startsAt and endsAt values. The 00:31:22 firing message also carries the same information as the earlier firing messages, so I think both the new firing and the new resolved messages are repeats. The identical timestamps mean no new alert event occurred; these are just the old messages being resent.

What caused this and how can I fix it? I need help, thank you all!

here is my service log 2021-11-15 00:13:07.620 INFO 1 --- [nio-8080-exec-3] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-0:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-0:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}

2021-11-15 00:23:22.547 INFO 1 --- [nio-8080-exec-3] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-1:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-2:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}

2021-11-15 00:30:27.647 INFO 1 --- [nio-8080-exec-4] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"2021-11-15T00:08:07.123Z","generatorURL":"http://prometheus-k8s-0:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-1:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}

2021-11-15 00:31:22.609 INFO 1 --- [nio-8080-exec-4] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-1:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-0:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}

2021-11-15 00:32:27.478 INFO 1 --- [nio-8080-exec-1] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"2021-11-15T00:08:07.123Z","generatorURL":"http://prometheus-k8s-1:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-1:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}

woshizhicainiaoluguo avatar Nov 18 '21 07:11 woshizhicainiaoluguo

@roidelapluie I need your help, thanks.

woshizhicainiaoluguo avatar Nov 18 '21 08:11 woshizhicainiaoluguo

I have the same problem. I set send_resolved = true and Alertmanager repeatedly sends me firing and resolved notifications.

zhouyu123666 avatar Dec 16 '21 08:12 zhouyu123666

I have the same problem. With send_resolved = true, I received a resolved notification while the alert was not actually resolved. Did you solve it? Thank you.

jialanli avatar Jan 11 '22 02:01 jialanli

I have the same problem. I set send_resolved = true and Alertmanager repeatedly sends me firing and resolved notifications.

Sorry, but I still have this problem and no one has helped me...

woshizhicainiaoluguo avatar Jan 11 '22 02:01 woshizhicainiaoluguo

I have the same problem. With send_resolved = true, I received a resolved notification while the alert was not actually resolved. Did you solve it? Thank you.

Sorry, but I still have this problem and no one has helped me...

woshizhicainiaoluguo avatar Jan 11 '22 02:01 woshizhicainiaoluguo

Hi @woshizhicainiaoluguo, any progress resolving this? I may have faced the same issue!

AbdelhayBenhatomfirst avatar Jan 27 '22 11:01 AbdelhayBenhatomfirst

Can we see your Prometheus configuration? Are you sure that the Prometheus and Alertmanager clocks are in sync? Thanks!

roidelapluie avatar Jan 29 '22 22:01 roidelapluie

apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"monitoring.coreos.com/v1","kind":"Prometheus","metadata":{"annotations":{},"labels":{"prometheus":"k8s"},"name":"k8s","namespace":"monitoring"},"spec":{"additionalScrapeConfigs":{"key":"prometheus-additional.yaml","name":"additional-configs"},"affinity":{"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["prometheus"]}]},"topologyKey":"kubernetes.io/hostname"}]}},"alerting":{"alertmanagers":[{"name":"alertmanager-main","namespace":"monitoring","port":"web"}]},"containers":[{"args":["--web.console.templates=/etc/prometheus/consoles","--web.console.libraries=/etc/prometheus/console_libraries","--config.file=/etc/prometheus/config_out/prometheus.env.yaml","--storage.tsdb.path=/prometheus","--storage.tsdb.retention.time=35d","--web.enable-lifecycle","--storage.tsdb.no-lockfile","--web.route-prefix=/","--storage.tsdb.min-block-duration=1h","--storage.tsdb.max-block-duration=1h"],"name":"prometheus","volumeMounts":[{"mountPath":"/etc/prometheus/targets","name":"pvc-prometheus-discovery"}]}],"image":"harbor-ppe1.eniot.io/kubernetes/prometheus:v2.26.0","nodeSelector":{"kubernetes.io/os":"linux"},"podMonitorNamespaceSelector":{},"podMonitorSelector":{},"probeNamespaceSelector":{},"probeSelector":{"matchLabels":{"prometheus":"k8s"}},"replicas":2,"resources":{"limits":{"cpu":"4","memory":"32Gi"},"requests":{"cpu":"100m","memory":"4Gi"}},"retention":"35d","ruleSelector":{"matchLabels":{"prometheus":"k8s","role":"alert-rules"}},"scrapeInterval":"1m","securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"prometheus-k8s","serviceMonitorNamespaceSelector":{},"serviceMonitorSelector":{},"storage":{"volumeClaimTemplate":{"metadata":{},"spec":{"resources":{"requests":{"storage":"512Gi"}},"storageClassName":"ceph-rbd-hdd2"}}},"volumes":[{"name":"pvc-prometheus-discovery","persistentVolumeClaim":{"claimName":"prometheus-discovery"}}]}}
    creationTimestamp: "2020-02-18T07:29:07Z"
    generation: 25
    labels:
      prometheus: k8s
    name: k8s
    namespace: monitoring
    resourceVersion: "911224029"
    selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
    uid: 32ae3900-c6dc-451e-86f8-e57ef96df1b1
  spec:
    additionalScrapeConfigs:
      key: prometheus-additional.yaml
      name: additional-configs
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - prometheus
          topologyKey: kubernetes.io/hostname
    alerting:
      alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
    containers:
    - args:
      - --web.console.templates=/etc/prometheus/consoles
      - --web.console.libraries=/etc/prometheus/console_libraries
      - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=10d
      - --web.enable-lifecycle
      - --storage.tsdb.no-lockfile
      - --web.route-prefix=/
      - --storage.tsdb.min-block-duration=1h
      - --storage.tsdb.max-block-duration=1h
      name: prometheus
      volumeMounts:
      - mountPath: /etc/prometheus/targets
        name: pvc-prometheus-discovery
    image: harbor-ppe1.eniot.io/kubernetes/prometheus:v2.26.0
    nodeSelector:
      kubernetes.io/os: linux
    podMonitorNamespaceSelector: {}
    podMonitorSelector: {}
    probeNamespaceSelector: {}
    probeSelector:
      matchLabels:
        prometheus: k8s
    replicas: 2
    resources:
      limits:
        cpu: "4"
        memory: 64Gi
      requests:
        cpu: 100m
        memory: 4Gi
    retention: 10d
    ruleSelector:
      matchLabels:
        prometheus: k8s
        role: alert-rules
    scrapeInterval: 1m
    securityContext:
      fsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-k8s
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector: {}
    storage:
      volumeClaimTemplate:
        metadata: {}
        spec:
          resources:
            requests:
              storage: 512Gi
          storageClassName: ceph-rbd-hdd2
    volumes:
    - name: pvc-prometheus-discovery
      persistentVolumeClaim:
        claimName: prometheus-discovery
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@roidelapluie is this what you need, or something else?

woshizhicainiaoluguo avatar Feb 09 '22 12:02 woshizhicainiaoluguo

Checking the webhook logs, I see the following timeline (the identity of the Prometheus pod is visible in the generatorURL field):

  • at 00:30:27, prometheus-k8s-0 reports the alert as resolved.
  • at 00:31:22, prometheus-k8s-1 reports the alert as firing.
  • at 00:32:27, prometheus-k8s-1 reports the alert as resolved.

So prometheus-k8s-0 considered the alert to be resolved before prometheus-k8s-1. The reason might be that prometheus-k8s-1 evaluated the rule later than prometheus-k8s-0, based on older data. One way to alleviate the issue would be to increase the group_interval value.
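As a concrete sketch of that suggestion (the 5m value is only an example, not a recommendation from this thread), the route could be changed along these lines so that both Prometheus replicas have evaluated the rule and agree on the alert state before the group is flushed again:

route:
  receiver: common-alert-webhook
  group_wait: 30s
  group_interval: 5m    # raised from 2m in this example
  repeat_interval: 10m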

Also, do you run several Alertmanager pods? If yes, it might be an issue with the replication of the notification log.

simonpasquier avatar Mar 11 '22 15:03 simonpasquier

Heya,

I have the same type of problem. Alerts are generated by Grafana unified alerting running on the local system, and I have Alertmanager (version 0.23) running in a Docker container.

When I check the times, I see that Grafana is using the local system time, but Alertmanager inside the container is using UTC. Could this be the reason I am receiving alert and resolved messages? If so, how can I change the timezone in the Alertmanager container?

Thanks for the help.

Jessimon avatar May 21 '22 09:05 Jessimon

Hi, if an alert is active continuously for 30 minutes and I set repeat_interval to 2m and group_wait to 1m, the repeated alerts all have the same startsAt. Is there any way we can modify it? I cannot set send_resolved to true, since we resolve alerts based on our own resolve conditions. Any help would be greatly appreciated.

ksathishanuta avatar Jul 11 '22 07:07 ksathishanuta

@Jessimon

If so, how can I change the timezone in the Alertmanager container?

You can't, but it shouldn't matter as long as Grafana serializes the timestamps with timezone information. Having said that, you need to make sure that the clocks are synchronized between Alertmanager and Grafana.
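For example (illustrative values, not taken from this thread), these two RFC 3339 timestamps describe the same instant, so a component running in UTC interprets an offset-qualified timestamp correctly:

2022-05-21T11:00:00+02:00   (local time with a +02:00 offset)
2022-05-21T09:00:00Z        (the same instant expressed in UTC)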

simonpasquier avatar Sep 23 '22 13:09 simonpasquier

One question about the service log in the original report: how did you get it? Is it the Alertmanager pod log?

shrikantdhomane avatar Nov 10 '22 12:11 shrikantdhomane

From the description, it looks like the issue I encountered and fixed in https://github.com/prometheus/alertmanager/pull/3283 (there is a test bench there showing how to reproduce it). The gist of the problem is that when group_interval is less than peer_timeout multiplied by the number of nodes, the last nodes in the ring compare incorrect states in the DedupStage.
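A rough sketch of that condition, using hypothetical numbers that are not taken from this thread or from the PR:

# Hypothetical example: 3 Alertmanager replicas, each started with --cluster.peer-timeout=50s
# peer_timeout * number of nodes = 3 * 50s = 150s
route:
  group_interval: 2m   # 120s, which is less than 150s, so the faulty comparison described in the PR can occur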

yuri-tceretian avatar Apr 04 '23 14:04 yuri-tceretian