node_exporter

Scrape performance with qdisc

Open innovorichy opened this issue 4 years ago • 5 comments

Host operating system: output of uname -a

Linux <#hostname>-4 4.15.0-88-generic #88~16.04.1-Ubuntu SMP Wed Feb 12 04:19:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.0.1 (branch: HEAD, revision: 3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f) build user: root@1f76dbbcfa55 build date: 20200616-12:44:12 go version: go1.14.4

node_exporter command line flags

--collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|debugfs|devpts|devtmpfs|efivarfs|fusectl|hugetlbfs|mqueue|nsfs|proc|pstore|securityfs|sysfs) --collector.filesystem.ignored-mount-points=^(/rootfs)?/((var/lib/docker/)|(run/docker/netns/).|(sys/).|(run/user/).|(var/lib/kubelet/).|(data/nova).|(data/cephfs).) --collector.ntp.ip-ttl=8 --collector.ntp.local-offset-tolerance=1s --collector.ntp.server=XX.xx.XX.xx --collector.ntp.server-is-local --collector.textfile.directory=/etc/node_exporter/textfile_collector --web.listen-address=XX.xx.XX.xx:9100 --collector.arp --collector.conntrack --collector.cpu --collector.diskstats --collector.edac --collector.entropy --collector.filefd --collector.filesystem --collector.hwmon --collector.interrupts --collector.ipvs --collector.ksmd --collector.loadavg --collector.logind --collector.mdadm --collector.meminfo --collector.meminfo_numa --collector.mountstats --collector.netdev --collector.netstat --collector.ntp --collector.sockstat --collector.systemd --collector.qdisc --collector.tcpstat --collector.textfile --collector.time --collector.timex --collector.uname --collector.vmstat --collector.xfs --no-collector.bcache --no-collector.drbd --no-collector.infiniband --no-collector.wifi --no-collector.zfs

Are you running node_exporter in Docker?

no docker

What did you do that produced an error?

Updated node_exporter from v0.18 to v1.0.1.

What did you expect to see?

node_scrape_collector_duration_seconds{collector="qdisc"} 0.832762764

For node_scrape_collector_duration_seconds{collector="qdisc"} we saw scrape times of about 0.5s in v0.18. With v1.0.1 we see three times that and hit the node_exporter scrape_timeout of 10s, although the OS didn't change and running time tc -s qdisc show takes real 0m0.026s / user 0m0.005s / sys 0m0.021s.

What did you see instead?

node_scrape_collector_duration_seconds{collector="qdisc"} 13.465078496

Increased scrape times for collector="qdisc", which results in a 15-20s scrape_timeout for node_exporter.
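To put a number on the regression, the two samples above can be compared directly. The following is a minimal Python sketch, not part of node_exporter: the helper names `collector_durations` and `slowdown` are made up for illustration, and it assumes the /metrics text has been saved from each version.

```python
import re

# Matches one exposition line such as:
#   node_scrape_collector_duration_seconds{collector="qdisc"} 0.832762764
_LINE = re.compile(
    r'^node_scrape_collector_duration_seconds\{collector="([^"]+)"\}\s+(\S+)'
)


def collector_durations(metrics_text):
    """Return {collector_name: seconds} parsed from /metrics text."""
    durations = {}
    for line in metrics_text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            durations[match.group(1)] = float(match.group(2))
    return durations


def slowdown(expected_text, observed_text, collector="qdisc"):
    """Ratio of the observed collector duration to the expected one."""
    expected = collector_durations(expected_text)[collector]
    observed = collector_durations(observed_text)[collector]
    return observed / expected
```

Fed with the expected sample (0.832762764) and the observed sample (13.465078496) quoted in this report, `slowdown` returns a ratio of roughly 16 for the qdisc collector.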

innovorichy avatar Jul 06 '20 06:07 innovorichy

Interesting, thanks for the report. Very few changes have been made. Only the addition of a couple new metrics.

It would be useful to gather some CPU profiles with pprof while the exporter is being scraped.

go tool pprof -svg http://localhost:9100/debug/pprof/profile > node_exporter.svg

It would be useful to have a profile for both the old and new versions.
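Since the profile is only meaningful while the exporter is busy, one way to capture it is to scrape /metrics in a loop for as long as the profile is being recorded. This is a rough Python sketch under a few assumptions: the exporter listens on the base URL you pass in, the helper names `profile_url` and `capture_profile` are hypothetical, and the `seconds` query parameter is the sampling duration accepted by Go's standard pprof HTTP handler.

```python
import threading
import urllib.request
from urllib.parse import urlencode


def profile_url(base_url, seconds=30):
    """Build the CPU profile URL; `seconds` is the sampling duration."""
    query = urlencode({"seconds": seconds})
    return base_url.rstrip("/") + "/debug/pprof/profile?" + query


def capture_profile(base_url, out_path, seconds=30):
    """Record a CPU profile while scraping /metrics in the background."""
    stop = threading.Event()
    metrics_url = base_url.rstrip("/") + "/metrics"

    def scrape_loop():
        while not stop.is_set():
            try:
                with urllib.request.urlopen(metrics_url, timeout=60) as resp:
                    resp.read()
            except OSError:
                # Slow or failed scrapes are expected here; back off briefly.
                stop.wait(1)

    scraper = threading.Thread(target=scrape_loop, daemon=True)
    scraper.start()
    try:
        url = profile_url(base_url, seconds)
        with urllib.request.urlopen(url, timeout=seconds + 30) as resp:
            data = resp.read()
    finally:
        stop.set()
        scraper.join(timeout=5)

    # The response is a binary profile that go tool pprof can read from disk.
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path
```

Running `capture_profile("http://localhost:9100", "node_exporter_1.0.1.prof")` once per version would give two files that can be passed to go tool pprof in place of the URL.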

SuperQ avatar Jul 06 '20 07:07 SuperQ

Thanks for the quick response. Here are the profiles: node_exporter_0.18.zip, node_exporter_1.0.1.zip

innovorichy avatar Jul 06 '20 09:07 innovorichy

Interesting, I don't see a qdisc call at all in the 0.18 version. But the 1.0.1 is spending a huge amount of time in netlink syscalls.

Can you describe your node setup a bit? What kind of networking/tc setup is this?

SuperQ avatar Jul 06 '20 12:07 SuperQ

Oh, the 0.18 profile is my fault, it was not in sync :-). I can downgrade a node again tomorrow, if needed. The node is used as an OpenStack compute node with ~100 tap interfaces that are attached to bridges, which in turn are attached to an OVS.

innovorichy avatar Jul 06 '20 13:07 innovorichy

@innovorichy Is this still an issue?

discordianfish avatar Feb 28 '22 09:02 discordianfish

Friendly ping, @innovorichy. Does this still persist for you?

rexagod avatar Mar 19 '24 14:03 rexagod

going to assume this is fixed, otherwise please reopen

discordianfish avatar Mar 21 '24 14:03 discordianfish