Alertmanager sent repeated firing and resolved messages after a resolved message; the interval between the two resolved messages is group_interval
What did you do? I started Prometheus followed by Alertmanager and configured an alert route to my webhook service.
What did you expect to see? I expected no further alerts after the resolved message, unless the alert stayed resolved for some time and then started firing again.
Environment: Prometheus is prometheus:v2.26.0 and Alertmanager is alertmanager:v0.21.0. The parameters are resolve_timeout: 5m, group_wait: 30s, group_interval: 2m, repeat_interval: 10m, with an alert route pointing to my service, which prints every alert message it receives.
What did you see instead? Pay attention to the log times and the startsAt/endsAt fields. I received firing messages at the very beginning, at 00:13:07 and 00:23:22, which is within expectations, and then a resolved message at 00:30:27. But after that I received a new firing message at 00:31:22 and a new resolved message at 00:32:27.
The interval between the two resolved messages is group_interval, and they have the same startsAt and endsAt. The 00:31:22 firing also has the same info as the earlier firing messages, so I think the new firing and resolved messages are all repeats. The identical timestamps mean no new alert event occurred; these are just the old messages repeated.
What caused this and how can I fix it? I need help, thank you all!
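A minimal sketch of a route and receiver matching the parameters above (the receiver name is taken from the logs below; the webhook URL and group_by are assumptions/placeholders rather than the exact configuration):

```yaml
global:
  resolve_timeout: 5m              # default resolution time for alerts that carry no endsAt

route:
  receiver: common-alert-webhook
  group_by: ['alertname']          # assumption, inferred from the groupKey in the logs below
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 10m

receivers:
  - name: common-alert-webhook
    webhook_configs:
      - url: http://my-alert-service:8080/alert   # placeholder for the actual webhook service
        send_resolved: true
```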
here is my service log 2021-11-15 00:13:07.620 INFO 1 --- [nio-8080-exec-3] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-0:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-0:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}
2021-11-15 00:23:22.547 INFO 1 --- [nio-8080-exec-3] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-1:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-2:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}
2021-11-15 00:30:27.647 INFO 1 --- [nio-8080-exec-4] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"2021-11-15T00:08:07.123Z","generatorURL":"http://prometheus-k8s-0:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-1:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}
2021-11-15 00:31:22.609 INFO 1 --- [nio-8080-exec-4] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-1:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-0:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}
2021-11-15 00:32:27.478 INFO 1 --- [nio-8080-exec-1] e.e.a.c.PrometheusAlertWebhookController : {"receiver":"common-alert-webhook","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"annotations":{},"startsAt":"2021-11-13T20:50:37.123Z","endsAt":"2021-11-15T00:08:07.123Z","generatorURL":"http://prometheus-k8s-1:9090/graph?g0.expr\u003dsum+by%28pod%29+%28container_memory_working_set_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%2F+sum+by%28pod%29+%28kube_pod_container_resource_limits_memory_bytes%7Bcontainer%3D%22ecp-rms-adaptor%22%2Cnamespace%3D%22op%22%7D%29+%3E+0.85\u0026g0.tab\u003d1","fingerprint":"79353055a4351098"}],"groupLabels":{"alertname":"1-1-273"},"commonLabels":{"alertname":"1-1-273","hook":"common-alert-webhook","pod":"ecp-rms-adaptor-56747877db-2p4hq","prometheus":"monitoring/k8s"},"commonAnnotations":{},"externalURL":"http://alertmanager-main-1:9093","version":"4","groupKey":"{}/{hook\u003d"common-alert-webhook"}:{alertname\u003d"1-1-273"}","truncatedAlerts":0}
@roidelapluie need your help, thanks
I have the same problem. I set send_resolved = true and it sends me repeated firing and resolved messages.
I have the same problem. With send_resolved = true, I received a resolved notification while the alert was not resolved. Did you solve it? Thank you.
Sorry, but I still have this problem and no one has helped me...
Hi @woshizhicainiaoluguo, any progress resolving this? I may have faced the same issue!
Can we see your Prometheus configuration? Are you sure that the Prometheus and Alertmanager clocks are in sync? Thanks!
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"monitoring.coreos.com/v1","kind":"Prometheus","metadata":{"annotations":{},"labels":{"prometheus":"k8s"},"name":"k8s","namespace":"monitoring"},"spec":{"additionalScrapeConfigs":{"key":"prometheus-additional.yaml","name":"additional-configs"},"affinity":{"podAntiAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":[{"labelSelector":{"matchExpressions":[{"key":"app","operator":"In","values":["prometheus"]}]},"topologyKey":"kubernetes.io/hostname"}]}},"alerting":{"alertmanagers":[{"name":"alertmanager-main","namespace":"monitoring","port":"web"}]},"containers":[{"args":["--web.console.templates=/etc/prometheus/consoles","--web.console.libraries=/etc/prometheus/console_libraries","--config.file=/etc/prometheus/config_out/prometheus.env.yaml","--storage.tsdb.path=/prometheus","--storage.tsdb.retention.time=35d","--web.enable-lifecycle","--storage.tsdb.no-lockfile","--web.route-prefix=/","--storage.tsdb.min-block-duration=1h","--storage.tsdb.max-block-duration=1h"],"name":"prometheus","volumeMounts":[{"mountPath":"/etc/prometheus/targets","name":"pvc-prometheus-discovery"}]}],"image":"harbor-ppe1.eniot.io/kubernetes/prometheus:v2.26.0","nodeSelector":{"kubernetes.io/os":"linux"},"podMonitorNamespaceSelector":{},"podMonitorSelector":{},"probeNamespaceSelector":{},"probeSelector":{"matchLabels":{"prometheus":"k8s"}},"replicas":2,"resources":{"limits":{"cpu":"4","memory":"32Gi"},"requests":{"cpu":"100m","memory":"4Gi"}},"retention":"35d","ruleSelector":{"matchLabels":{"prometheus":"k8s","role":"alert-rules"}},"scrapeInterval":"1m","securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"prometheus-k8s","serviceMonitorNamespaceSelector":{},"serviceMonitorSelector":{},"storage":{"volumeClaimTemplate":{"metadata":{},"spec":{"resources":{"requests":{"storage":"512Gi"}},"storageClassName":"ceph-rbd-hdd2"}}},"volumes":[{"name":"pvc-prometheus-discovery","persistentVolumeClaim":{"claimName":"prometheus-discovery"}}]}}
creationTimestamp: "2020-02-18T07:29:07Z"
generation: 25
labels:
prometheus: k8s
name: k8s
namespace: monitoring
resourceVersion: "911224029"
selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
uid: 32ae3900-c6dc-451e-86f8-e57ef96df1b1
spec:
additionalScrapeConfigs:
key: prometheus-additional.yaml
name: additional-configs
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- prometheus
topologyKey: kubernetes.io/hostname
alerting:
alertmanagers:
- name: alertmanager-main
namespace: monitoring
port: web
containers:
- args:
- --web.console.templates=/etc/prometheus/consoles
- --web.console.libraries=/etc/prometheus/console_libraries
- --config.file=/etc/prometheus/config_out/prometheus.env.yaml
- --storage.tsdb.path=/prometheus
- --storage.tsdb.retention.time=10d
- --web.enable-lifecycle
- --storage.tsdb.no-lockfile
- --web.route-prefix=/
- --storage.tsdb.min-block-duration=1h
- --storage.tsdb.max-block-duration=1h
name: prometheus
volumeMounts:
- mountPath: /etc/prometheus/targets
name: pvc-prometheus-discovery
image: harbor-ppe1.eniot.io/kubernetes/prometheus:v2.26.0
nodeSelector:
kubernetes.io/os: linux
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
probeNamespaceSelector: {}
probeSelector:
matchLabels:
prometheus: k8s
replicas: 2
resources:
limits:
cpu: "4"
memory: 64Gi
requests:
cpu: 100m
memory: 4Gi
retention: 10d
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
scrapeInterval: 1m
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
storage:
volumeClaimTemplate:
metadata: {}
spec:
resources:
requests:
storage: 512Gi
storageClassName: ceph-rbd-hdd2
volumes:
- name: pvc-prometheus-discovery
persistentVolumeClaim:
claimName: prometheus-discovery
kind: List
metadata:
resourceVersion: ""
selfLink: ""
@roidelapluie Is this what you need, or do you need something else?
Checking the webhook logs I see the following timeline (the identity of the Prometheus pod is visible by looking at the generatorURL field):
- at 00:30:27, prometheus-k8s-0 reports the alert as resolved.
- at 00:31:22, prometheus-k8s-1 reports the alert as firing.
- at 00:32:27, prometheus-k8s-1 reports the alert as resolved.
So prometheus-k8s-0 considered the alert to be resolved before prometheus-k8s-1 did. The reason might be that prometheus-k8s-1 evaluated the rule later than prometheus-k8s-0, based on older data? One way to alleviate the issue would be to increase the group_interval value.
Also, do you run several Alertmanager pods? If yes, it might be an issue in the replication of notification logs.
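As an illustration only (not the reporter's actual configuration), raising group_interval on the affected route might look like this:

```yaml
route:
  receiver: common-alert-webhook
  group_wait: 30s
  group_interval: 5m       # raised from 2m so both Prometheus replicas have time to report the same state
  repeat_interval: 10m
```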
Heya,
I have the same type of problem. Alerts are generated by Grafana unified alerting running on the local system, and I have Alertmanager (version 0.23) running in a Docker container.
When I check the times, I see that Grafana is using the local system time, but Alertmanager is using UTC inside the container. Could this be the cause of the firing and resolved messages I am receiving? If so, how can I change the timezone in the Alertmanager container?
Thanks for the help.
Hi, if an alert is active for a continuous 30 minutes and I set repeat_interval to 2m and group_wait to 1m, the repeated alerts all have the same startsAt. Is there any way to modify that? I cannot set send_resolved to true, since we resolve alerts based on our own resolve conditions. Any help would be greatly appreciated.
@Jessimon
If so, how can I change the timezone in the Alertmanager container?
You can't, but it shouldn't matter as long as Grafana serializes the timestamps with timezone information. Having said that, you need to make sure that the clocks are synchronized between Alertmanager and Grafana.
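As a small illustration of that point (the values are made up), the following two RFC 3339 timestamps denote the same instant, so a timezone-aware serialization is unambiguous regardless of the container's local timezone:

```yaml
# Both entries refer to the same moment in time.
startsAt_local: "2021-11-15T02:00:00.000+02:00"
startsAt_utc: "2021-11-15T00:00:00.000Z"
```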
One question: how did you get this log? Is it the Alertmanager pod log?
From the description, it looks like the issue I encountered and fixed in https://github.com/prometheus/alertmanager/pull/3283 (there is a testbench with how to reproduce it). The gist of the problem is that when group_interval is less than peer_timeout times the number of nodes, the last nodes in the ring will compare incorrect states in DedupStage.
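To make that relationship concrete, here is a sketch (values illustrative, not taken from this issue's exact setup) of keeping group_interval above peer_timeout times the number of replicas, assuming three Alertmanager replicas and the default --cluster.peer-timeout of 15s:

```yaml
# Each Alertmanager replica is started with the default peer timeout:
#   alertmanager --cluster.peer-timeout=15s ...
# With 3 replicas, keep group_interval comfortably above 15s x 3 = 45s so the
# last peers in the ring compare up-to-date notification-log state in DedupStage.
route:
  receiver: common-alert-webhook
  group_wait: 30s
  group_interval: 2m       # >= peer_timeout x number_of_replicas (45s here)
  repeat_interval: 10m
```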