Node exporter has high memory usage on some nodes

Open · guymeron opened this issue · 4 comments

Host operating system: output of uname -a

Linux ip-XX-XX-XX-XX.ap-northeast-1.compute.internal 6.6.35-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.6.35-0gardenlinux1~bp1443 (2024- x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.8.1 (branch: HEAD, revision: 400c3979931613db930ea035f39ce7b377cdbb5b)
  build user:       root@7afbff271a3f
  build date:       20240521-18:36:22
  go version:       go1.22.3
  platform:         linux/amd64
  tags:             unknown

node_exporter command line flags

Args:
  --path.procfs=/host/proc
  --path.sysfs=/host/sys
  --path.rootfs=/host/root
  --path.udev.data=/host/root/run/udev/data
  --web.listen-address=[$(HOST_IP)]:9100
  --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
  --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
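
Both exclude flags are Go regular expressions, so their effect can be checked offline. The sketch below copies the two patterns verbatim from the flags above and matches sample inputs with Go's standard `regexp` package; it assumes the filesystem collector applies each pattern as a plain `MatchString` on the mount point or filesystem type, and the sample paths and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied verbatim from the command line flags in this report.
var (
	mountPointsExclude = regexp.MustCompile(`^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)`)
	fsTypesExclude     = regexp.MustCompile(`^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$`)
)

// excludedMount reports whether a mount point would be filtered out,
// assuming a plain MatchString against the flag value.
func excludedMount(path string) bool { return mountPointsExclude.MatchString(path) }

// excludedFSType reports whether a filesystem type would be filtered out.
func excludedFSType(fsType string) bool { return fsTypesExclude.MatchString(fsType) }

func main() {
	// Hypothetical sample mount points, for illustration only.
	for _, p := range []string{"/", "/dev/shm", "/var/lib/kubelet/pods/abc/volumes", "/var/lib/kubelet", "/home"} {
		fmt.Printf("mount  %-36s excluded=%v\n", p, excludedMount(p))
	}
	// Hypothetical sample filesystem types.
	for _, t := range []string{"ext4", "xfs", "overlay", "cgroup2", "tmpfs"} {
		fmt.Printf("fstype %-8s excluded=%v\n", t, excludedFSType(t))
	}
}
```

Note that the mount-point pattern requires at least one character after `/var/lib/kubelet/`, so the kubelet root directory itself is still collected while everything below it is dropped.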

node_exporter log output

ts=2024-07-24T10:09:35.767Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.1, branch=HEAD, revision=400c3979931613db930ea035f39ce7b377cdbb5b)"
ts=2024-07-24T10:09:35.767Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.22.3, platform=linux/amd64, user=root@7afbff271a3f, date=20240521-18:36:22, tags=unknown)"
ts=2024-07-24T10:09:35.767Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2024-07-24T10:09:35.863Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
ts=2024-07-24T10:09:35.863Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:111 level=info msg="Enabled collectors"
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=arp
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=bcache
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=bonding
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=btrfs
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=conntrack
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=cpu
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=cpufreq
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=diskstats
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=dmi
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=edac
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=entropy
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=fibrechannel
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=filefd
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=filesystem
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=hwmon
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=infiniband
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=ipvs
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=loadavg
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=mdadm
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=meminfo
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=netclass
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=netdev
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=netstat
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=nfs
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=nfsd
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=nvme
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=os
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=powersupplyclass
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=pressure
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=rapl
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=schedstat
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=selinux
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=sockstat
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=softnet
ts=2024-07-24T10:09:35.863Z caller=node_exporter.go:118 level=info collector=stat
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=tapestats
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=textfile
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=thermal_zone
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=time
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=timex
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=udp_queues
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=uname
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=vmstat
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=watchdog
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=xfs
ts=2024-07-24T10:09:35.864Z caller=node_exporter.go:118 level=info collector=zfs
ts=2024-07-24T10:09:35.864Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9100
ts=2024-07-24T10:09:35.864Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9100
ts=2024-07-25T00:30:04.465Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.258062288 err="could not get net class info: failed to read file \"/host/sys/class/net/cali7b9ce739791/threaded\": no such device"
ts=2024-07-25T02:15:04.662Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.455484629 err="could not get net class info: failed to read file \"/host/sys/class/net/califf257bc2536/ifalias\": open /host/sys/class/net/califf257bc2536/ifalias: no such device"
ts=2024-07-25T10:15:04.563Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.353718454 err="could not get net class info: failed to read file \"/host/sys/class/net/calie3498b6b174/carrier_changes\": no such device"
ts=2024-07-27T10:30:04.562Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.296004707 err="could not get net class info: failed to read file \"/host/sys/class/net/calic75c59c4569/carrier_up_count\": no such device"
ts=2024-07-27T14:59:04.366Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.143897883 err="could not get net class info: failed to read file \"/host/sys/class/net/cali25d3d338fcb/dev_port\": open /host/sys/class/net/cali25d3d338fcb/dev_port: no such device"
ts=2024-07-28T22:12:04.379Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.113815084 err="could not get net class info: failed to read file \"/host/sys/class/net/cali54e5a64fb2f/testing\": open /host/sys/class/net/cali54e5a64fb2f/testing: no such device"
ts=2024-07-28T23:51:04.562Z caller=collector.go:169 level=error msg="collector failed" name=netclass duration_seconds=0.340009443 err="could not get net class info: failed to read file \"/host/sys/class/net/cali9bb535b8912/ifindex\": no such device"
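
Every error in this log comes from the `netclass` collector and names a `cali*` interface (Calico's naming for pod interfaces) whose sysfs attribute returned "no such device", which typically means the interface was being removed while it was read. To check whether that pattern holds across a longer log, the failures can be counted per interface. The sketch below is written for the exact message format shown here (other versions may word the error differently); the helper `netclassFailures` is hypothetical, and the `kubectl logs` pipeline in the comment is only an assumed way to feed it.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"sort"
)

// Matches a netclass failure line and captures the interface name from
// the first ".../sys/class/net/<iface>/" path in the error text.
var netclassErr = regexp.MustCompile(`name=netclass .*?/sys/class/net/([^/]+)/`)

// netclassFailures counts netclass collector failures per interface.
// Hypothetical helper for illustration.
func netclassFailures(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		if m := netclassErr.FindStringSubmatch(l); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Assumed usage: kubectl logs <node-exporter-pod> | go run main.go
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	counts := netclassFailures(lines)
	names := make([]string, 0, len(counts))
	for n := range counts {
		names = append(names, n)
	}
	sort.Strings(names) // stable output order
	for _, n := range names {
		fmt.Printf("%s\t%d\n", n, counts[n])
	}
}
```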

Are you running node_exporter in Docker?

Yes, using the container image on Kubernetes (k8s).

What did you do that produced an error?

Nothing in particular. The problem seems to always appear on one of the k8s nodes.

What did you expect to see?

Stable memory usage, without continuous growth.

What did you see instead?

Memory usage increases steadily over time.

Memory usage graph: [image not available]

pprof file: node_exporter [attachment not available]
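
Because the growth is gradual, two heap profiles taken hours apart can be more telling than a single snapshot. The sketch below downloads one, assuming node_exporter serves Go's standard `net/http/pprof` handlers at `/debug/pprof/heap` on its listen address (here the host IP on port 9100, per `--web.listen-address`); the program and its `heapProfileURL` helper are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"time"
)

// heapProfileURL builds the heap-profile URL for an exporter on host:port.
// The /debug/pprof/heap path is an assumption (standard net/http/pprof).
// net.JoinHostPort adds brackets around IPv6 hosts.
func heapProfileURL(host, port string) string {
	return "http://" + net.JoinHostPort(host, port) + "/debug/pprof/heap"
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: heapdump <host> [port]")
		os.Exit(2)
	}
	port := "9100"
	if len(os.Args) > 2 {
		port = os.Args[2]
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(heapProfileURL(os.Args[1], port))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintln(os.Stderr, "unexpected status:", resp.Status)
		os.Exit(1)
	}
	// Timestamped file name so successive snapshots can be compared.
	name := fmt.Sprintf("heap-%s.pprof", time.Now().UTC().Format("20060102T150405Z"))
	f, err := os.Create(name)
	if err != nil {
		fmt.Fprintln(os.Stderr, "create failed:", err)
		os.Exit(1)
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		fmt.Fprintln(os.Stderr, "download failed:", err)
		os.Exit(1)
	}
	if err := f.Close(); err != nil {
		fmt.Fprintln(os.Stderr, "close failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote", name)
}
```

Two such snapshots can then be compared with `go tool pprof -top -base <older>.pprof <newer>.pprof`, which reports only the allocations that grew between them.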

guymeron, Jul 30 '24 12:07