node_exporter
CPU collector keeps reporting metrics of offline CPUs
Host operating system: output of uname -a
Linux debianvm 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
node_exporter version: output of node_exporter --version
node_exporter, version 1.5.0 (branch: master, revision: 7f05e312acd828f0c4ddcd194e275c05ffa35992)
build user: hsun@debianvm
build date: 20230217-16:48:57
go version: go1.20.1
platform: linux/amd64
node_exporter command line flags
None.
node_exporter log output
ts=2023-02-17T16:53:10.662Z caller=node_exporter.go:180 level=info msg="Starting node_exporter" version="(version=1.5.0, branch=master, revision=7f05e312acd828f0c4ddcd194e275c05ffa35992)"
ts=2023-02-17T16:53:10.663Z caller=node_exporter.go:181 level=info msg="Build context" build_context="(go=go1.20.1, platform=linux/amd64, user=hsun@debianvm, date=20230217-16:48:57)"
ts=2023-02-17T16:53:10.663Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
ts=2023-02-17T16:53:10.664Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2023-02-17T16:53:10.665Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:110 level=info msg="Enabled collectors"
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:117 level=info collector=arp
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:117 level=info collector=bcache
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:117 level=info collector=bonding
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:117 level=info collector=btrfs
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:117 level=info collector=conntrack
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:117 level=info collector=cpu
ts=2023-02-17T16:53:10.665Z caller=node_exporter.go:117 level=info collector=cpufreq
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=diskstats
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=dmi
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=edac
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=entropy
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=fibrechannel
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=filefd
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=filesystem
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=hwmon
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=infiniband
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=ipvs
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=loadavg
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=mdadm
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=meminfo
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=netclass
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=netdev
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=netstat
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=nfs
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=nfsd
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=nvme
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=os
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=powersupplyclass
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=pressure
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=rapl
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=schedstat
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=selinux
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=sockstat
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=softnet
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=stat
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=tapestats
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=textfile
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=thermal_zone
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=time
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=timex
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=udp_queues
ts=2023-02-17T16:53:10.666Z caller=node_exporter.go:117 level=info collector=uname
ts=2023-02-17T16:53:10.667Z caller=node_exporter.go:117 level=info collector=vmstat
ts=2023-02-17T16:53:10.667Z caller=node_exporter.go:117 level=info collector=xfs
ts=2023-02-17T16:53:10.667Z caller=node_exporter.go:117 level=info collector=zfs
ts=2023-02-17T16:53:10.667Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9100
ts=2023-02-17T16:53:10.667Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9100
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
- Start Node Exporter.
- Take several CPUs offline (e.g. via hyper-threading/SMT changes).
- Metrics from the offline CPUs persist.
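For reference, CPUs can be taken offline through the standard Linux sysfs CPU-hotplug interface. This is a hedged sketch of the reproduction, not taken from the report itself; the cpu5 path matches the CPU discussed below, and the writes require root, so they are shown as comments rather than executed:

```shell
# List which CPUs the kernel currently reports online (e.g. "0-5").
cat /sys/devices/system/cpu/online

# To reproduce, take CPU 5 offline (requires root):
#   echo 0 > /sys/devices/system/cpu/cpu5/online
# and bring it back afterwards:
#   echo 1 > /sys/devices/system/cpu/cpu5/online
```

Once a CPU is offline, its cpuN line disappears from /proc/stat, as shown below.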
What did you expect to see?
Metrics from offline CPUs should disappear.
What did you see instead?
Metrics from the offline CPUs persist at their last values.
More Details
node_cpu_seconds_total metrics before CPU 5 went offline:
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12295.95
node_cpu_seconds_total{cpu="0",mode="iowait"} 11.01
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 30.25
node_cpu_seconds_total{cpu="0",mode="softirq"} 26.99
node_cpu_seconds_total{cpu="0",mode="steal"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 256.7
node_cpu_seconds_total{cpu="0",mode="user"} 173.35
node_cpu_seconds_total{cpu="1",mode="idle"} 12099.1
node_cpu_seconds_total{cpu="1",mode="iowait"} 4.24
node_cpu_seconds_total{cpu="1",mode="irq"} 0
node_cpu_seconds_total{cpu="1",mode="nice"} 94.82
node_cpu_seconds_total{cpu="1",mode="softirq"} 28.62
node_cpu_seconds_total{cpu="1",mode="steal"} 0
node_cpu_seconds_total{cpu="1",mode="system"} 280.99
node_cpu_seconds_total{cpu="1",mode="user"} 193.84
node_cpu_seconds_total{cpu="2",mode="idle"} 12168.34
node_cpu_seconds_total{cpu="2",mode="iowait"} 5.68
node_cpu_seconds_total{cpu="2",mode="irq"} 0
node_cpu_seconds_total{cpu="2",mode="nice"} 67
node_cpu_seconds_total{cpu="2",mode="softirq"} 25.54
node_cpu_seconds_total{cpu="2",mode="steal"} 0
node_cpu_seconds_total{cpu="2",mode="system"} 275.96
node_cpu_seconds_total{cpu="2",mode="user"} 193.49
node_cpu_seconds_total{cpu="3",mode="idle"} 12237.18
node_cpu_seconds_total{cpu="3",mode="iowait"} 2.33
node_cpu_seconds_total{cpu="3",mode="irq"} 0
node_cpu_seconds_total{cpu="3",mode="nice"} 40.56
node_cpu_seconds_total{cpu="3",mode="softirq"} 37.14
node_cpu_seconds_total{cpu="3",mode="steal"} 0
node_cpu_seconds_total{cpu="3",mode="system"} 246.59
node_cpu_seconds_total{cpu="3",mode="user"} 173.91
node_cpu_seconds_total{cpu="4",mode="idle"} 12184.39
node_cpu_seconds_total{cpu="4",mode="iowait"} 5.22
node_cpu_seconds_total{cpu="4",mode="irq"} 0
node_cpu_seconds_total{cpu="4",mode="nice"} 46.28
node_cpu_seconds_total{cpu="4",mode="softirq"} 26.59
node_cpu_seconds_total{cpu="4",mode="steal"} 0
node_cpu_seconds_total{cpu="4",mode="system"} 279.66
node_cpu_seconds_total{cpu="4",mode="user"} 188.91
node_cpu_seconds_total{cpu="5",mode="idle"} 1318.18
node_cpu_seconds_total{cpu="5",mode="iowait"} 0.26
node_cpu_seconds_total{cpu="5",mode="irq"} 0
node_cpu_seconds_total{cpu="5",mode="nice"} 0.01
node_cpu_seconds_total{cpu="5",mode="softirq"} 51.1
node_cpu_seconds_total{cpu="5",mode="steal"} 0
node_cpu_seconds_total{cpu="5",mode="system"} 235.27
node_cpu_seconds_total{cpu="5",mode="user"} 66.04
/proc/stat before CPU 5 went offline:
cpu 99263 27893 157903 6283804 2883 0 19624 0 0 0
cpu0 17396 3025 25736 1238523 1106 0 2704 0 0 0
cpu1 19434 9482 28148 1218751 425 0 2872 0 0 0
cpu2 19410 6700 27672 1225730 568 0 2557 0 0 0
cpu3 17441 4056 24731 1232606 234 0 3714 0 0 0
cpu4 18935 4628 28022 1227396 522 0 2662 0 0 0
cpu5 6644 1 23591 140796 26 0 5113 0 0 0
intr 6330554 36 4281 0 0 0 0 0 0 0 0 0 0 7776 0 0 12083 0 0 100661 24266 140115 37165 56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 10356386
btime 1676640236
processes 29007
procs_running 1
procs_blocked 0
softirq 2398665 11 525797 2354 24836 44301 2 4406 816272 0 980686
node_cpu_seconds_total metrics after CPU 5 went offline.
The CPU 5 series keep the same values every time the metrics endpoint is scraped.
node_cpu_seconds_total{cpu="0",mode="idle"} 12489.89
node_cpu_seconds_total{cpu="0",mode="iowait"} 11.1
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 30.25
node_cpu_seconds_total{cpu="0",mode="softirq"} 27.04
node_cpu_seconds_total{cpu="0",mode="steal"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 257.56
node_cpu_seconds_total{cpu="0",mode="user"} 174.16
node_cpu_seconds_total{cpu="1",mode="idle"} 12290.9
node_cpu_seconds_total{cpu="1",mode="iowait"} 4.3
node_cpu_seconds_total{cpu="1",mode="irq"} 0
node_cpu_seconds_total{cpu="1",mode="nice"} 94.82
node_cpu_seconds_total{cpu="1",mode="softirq"} 28.74
node_cpu_seconds_total{cpu="1",mode="steal"} 0
node_cpu_seconds_total{cpu="1",mode="system"} 281.86
node_cpu_seconds_total{cpu="1",mode="user"} 194.54
node_cpu_seconds_total{cpu="2",mode="idle"} 12361.52
node_cpu_seconds_total{cpu="2",mode="iowait"} 5.68
node_cpu_seconds_total{cpu="2",mode="irq"} 0
node_cpu_seconds_total{cpu="2",mode="nice"} 67
node_cpu_seconds_total{cpu="2",mode="softirq"} 25.58
node_cpu_seconds_total{cpu="2",mode="steal"} 0
node_cpu_seconds_total{cpu="2",mode="system"} 276.98
node_cpu_seconds_total{cpu="2",mode="user"} 194.38
node_cpu_seconds_total{cpu="3",mode="idle"} 12430.5
node_cpu_seconds_total{cpu="3",mode="iowait"} 2.34
node_cpu_seconds_total{cpu="3",mode="irq"} 0
node_cpu_seconds_total{cpu="3",mode="nice"} 40.56
node_cpu_seconds_total{cpu="3",mode="softirq"} 37.14
node_cpu_seconds_total{cpu="3",mode="steal"} 0
node_cpu_seconds_total{cpu="3",mode="system"} 247.52
node_cpu_seconds_total{cpu="3",mode="user"} 174.7
node_cpu_seconds_total{cpu="4",mode="idle"} 12378.86
node_cpu_seconds_total{cpu="4",mode="iowait"} 5.22
node_cpu_seconds_total{cpu="4",mode="irq"} 0
node_cpu_seconds_total{cpu="4",mode="nice"} 46.28
node_cpu_seconds_total{cpu="4",mode="softirq"} 26.63
node_cpu_seconds_total{cpu="4",mode="steal"} 0
node_cpu_seconds_total{cpu="4",mode="system"} 280.47
node_cpu_seconds_total{cpu="4",mode="user"} 189.53
node_cpu_seconds_total{cpu="5",mode="idle"} 1318.18
node_cpu_seconds_total{cpu="5",mode="iowait"} 0.26
node_cpu_seconds_total{cpu="5",mode="irq"} 0
node_cpu_seconds_total{cpu="5",mode="nice"} 0.01
node_cpu_seconds_total{cpu="5",mode="softirq"} 51.1
node_cpu_seconds_total{cpu="5",mode="steal"} 0
node_cpu_seconds_total{cpu="5",mode="system"} 235.27
node_cpu_seconds_total{cpu="5",mode="user"} 66.04
/proc/stat after CPU 5 went offline. The cpu5 line has disappeared:
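One way a collector could drop offline CPUs would be to consult /sys/devices/system/cpu/online, which holds a kernel range list such as "0-4" or "0-2,4,6-7". The sketch below is not node_exporter's actual code; parseOnlineList is a hypothetical helper, shown only to illustrate the format:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOnlineList parses a kernel CPU range list such as "0-4" or
// "0-2,4,6-7" (the format of /sys/devices/system/cpu/online) into a
// set of online CPU numbers. Hypothetical helper, not node_exporter code.
func parseOnlineList(s string) (map[int]bool, error) {
	online := make(map[int]bool)
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, err
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(hi); err != nil {
				return nil, err
			}
		}
		for cpu := start; cpu <= end; cpu++ {
			online[cpu] = true
		}
	}
	return online, nil
}

func main() {
	// With the machine from this report after the hotplug event,
	// the online list would read "0-4": CPU 5 is absent.
	online, err := parseOnlineList("0-4")
	if err != nil {
		panic(err)
	}
	fmt.Println(online[4], online[5]) // true false
}
```

A collector holding such a set could skip emitting node_cpu_seconds_total series for any cpu label not present in it, so the stale CPU 5 samples above would stop being exported.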
cpu 99348 27893 158013 6650156 3173 0 19627 0 0 0
cpu0 17409 3025 25754 1248489 1110 0 2704 0 0 0
cpu1 19446 9482 28180 1228625 430 0 2873 0 0 0
cpu2 19435 6700 27683 1235686 568 0 2558 0 0 0
cpu3 17461 4056 24749 1242583 234 0 3714 0 0 0
cpu4 18949 4628 28042 1237398 522 0 2662 0 0 0
intr 6357674 36 4385 0 0 0 0 0 0 0 0 0 0 7848 0 0 12183 0 0 100819 24381 143630 37230 56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 10418558
btime 1676640236
processes 29024
procs_running 1
procs_blocked 0
softirq 2408562 12 528701 2375 24957 44417 3 4439 819905 0 983753