awesome-prometheus-alerts changed Kernel info breaks querie(s)

Open roock opened this issue 1 year ago • 1 comments

This issue can be observed with at least the following alerts, but it might also affect other alerts using node_uname_info:

HostOutOfMemory
HostUnusualDiskReadRate
HostUnusualDiskWriteRate
HostHighCpuLoad
HostCpuHighIowait
HostPhysicalComponentTooHot

This happens under the following conditions:

the server is rebooted with a new(er) kernel, so the version/release information changes
the server has multiple monitored partitions, e.g. rootfs and /srv, for HostUnusualDiskReadRate and HostUnusualDiskWriteRate

execution: found duplicate series for the match group {instance="monitor.localdomain:9100"} on the right hand-side of the
operation:
[{__name__="node_uname_info", domainname="(none)", group="infra", instance="monitor.localdomain:9100",
job="node", machine="x86_64", nodename="monitor", release="5.10.0-27-amd64", sysname="Linux",
version="#1 SMP Debian 5.10.205-2 (2023-12-31)"},
{__name__="node_uname_info", domainname="(none)", group="infra", instance="monitor.localdomain:9100",
job="node", machine="x86_64", nodename="monitor", release="5.10.0-26-amd64", sysname="Linux",
version="#1 SMP Debian 5.10.197-1 (2023-09-29)"}];
many-to-many matching not allowed: matching labels must be unique on one side

Jan 09 '24 11:01 roock
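
The duplicate-series condition named in the error above can be checked directly. As a rough sketch (not taken from the alert rules themselves), counting node_uname_info series per instance should list the hosts that currently expose more than one:

```promql
# Instances that currently expose more than one node_uname_info series,
# for example an old and a new kernel release label side by side.
count by (instance) (node_uname_info) > 1
```

An empty result would suggest the duplicates have already aged out of the queried time window.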

I believe the reason is this part of the query: on(instance) group_left (nodename) node_uname_info{nodename=~".+"}

It is probably unnecessary and should be handled by relabeling in Prometheus or using a regexp.

Feb 25 '24 01:02 guruevi
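
One possible workaround, sketched under assumptions: collapse node_uname_info to a single series per instance and nodename before the join, so that differing release/version labels no longer produce duplicates on the right-hand side. The left-hand side below is a placeholder memory expression standing in for whichever alert expression is affected; it is not necessarily the rule as shipped in the repository.

```promql
# Placeholder left-hand side; substitute the alert's own expression.
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10)
  * on(instance) group_left(nodename)
    # Aggregate away release/version so one series per instance remains.
    max by (instance, nodename) (node_uname_info{nodename=~".+"})
```

Since node_uname_info always has the value 1, max keeps the multiplication neutral. This sketch would still hit the same error if nodename itself differed between the duplicate series.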
