zabbix-module-sockets

Bad performance under load

Open voron opened this issue 7 years ago • 1 comments

Hello. We started using zabbix-module-sockets-1.1.0-1.x86_64.rpm and hit the following problem. When the OS has a lot of sockets, reading /proc/net/tcp becomes too slow. The Zabbix agents use 100% CPU and become unreachable. As a result, during a load spike we get holes in all graphs for the heavily loaded host instead of stats. [screenshot 2018-06-12 17 15 49]

What do you think about implementing some internal caching of the /proc/net/ files inside zabbix-module-sockets?
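To make the caching idea concrete, here is a rough sketch in Python rather than the module's own C, so it is an illustration only and not the module's actual code. `ProcFileCache`, `count_by_state`, and the 30-second TTL are hypothetical names and values chosen for the example. The point is that several item checks arriving within the TTL would share one read of /proc/net/tcp instead of each re-reading it.

```python
import time


class ProcFileCache:
    """Hypothetical helper: keep a file's contents for `ttl` seconds."""

    def __init__(self, path, ttl=30.0, clock=time.monotonic):
        self.path = path
        self.ttl = ttl            # assumed value, tune to the item interval
        self._clock = clock       # injectable so the cache can be tested
        self._data = None
        self._expires = 0.0
        self.reads = 0            # number of real reads of the file

    def read(self):
        now = self._clock()
        # Re-read only when nothing is cached or the entry has expired.
        if self._data is None or now >= self._expires:
            with open(self.path, "r") as f:
                self._data = f.read()
            self._expires = now + self.ttl
            self.reads += 1
        return self._data


def count_by_state(text):
    """Count sockets per state code (the `st` column, e.g. 0A = LISTEN)."""
    counts = {}
    for line in text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) > 3:
            state = fields[3]
            counts[state] = counts.get(state, 0) + 1
    return counts
```

With something like `cache = ProcFileCache("/proc/net/tcp")`, every call to `count_by_state(cache.read())` inside the TTL window reuses the same snapshot. The trade-off is that values can be up to one TTL stale.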

voron, Jun 12 '18 14:06

How frequently are you sampling the data? It looks very granular. The recommended granularity for Zabbix is typically 60-300 seconds. In most cases there is very little benefit and a high cost to sampling intervals shorter than 60 seconds.

It also looks like you are experiencing a transient spike in TCP sockets that might correlate to a load increase and Zabbix having less scheduled time on-CPU.

This module has not been profiled or optimized because the intention is that it takes only a small fraction of compute time, due to the wider Zabbix monitoring intervals.

If you'd like it optimized to run at this granularity, can you please provide some profile data under load, so I can see where the code is spending most of its time in your case? The Linux perf profiling tool is a great option.

cavaliercoder, Jun 12 '18 17:06