zabbix-module-sockets
Bad performance under load
Hello. We started using zabbix-module-sockets-1.1.0-1.x86_64.rpm and hit the following problem. When the OS has a lot of sockets, reading /proc/net/tcp becomes too slow. The Zabbix agents use 100% CPU and become unreachable. As a result, we get holes in all graphs from the heavily loaded host instead of stats during the load spike.

What do you think about implementing some internal caching of the /proc/net/ files inside zabbix-module-sockets?
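To make the caching idea concrete, here is a minimal sketch of what is meant: parse /proc/net/tcp at most once per TTL and answer every item check in between from the cached counts. It is written in Python only to show the logic; Zabbix loadable modules are C shared libraries, so this is not the module's code. The class name `ProcNetCache`, the 30-second default TTL, and the `reads` counter are all hypothetical.

```python
import time
from collections import Counter


class ProcNetCache:
    """Cache per-state socket counts from a /proc/net file for `ttl` seconds."""

    def __init__(self, path="/proc/net/tcp", ttl=30.0, clock=time.monotonic):
        self.path = path
        self.ttl = ttl            # hypothetical default; tune to the item interval
        self.clock = clock        # injectable so the expiry logic can be tested
        self._expires = None      # None means "nothing cached yet"
        self._counts = Counter()
        self.reads = 0            # how many times the file was actually parsed

    def _parse(self):
        counts = Counter()
        with open(self.path) as f:
            next(f, None)  # skip the column header line
            for line in f:
                fields = line.split()
                if len(fields) > 3:
                    # Fourth column ("st") is the hex socket state,
                    # e.g. 01 = ESTABLISHED, 0A = LISTEN.
                    counts[fields[3]] += 1
        return counts

    def counts(self):
        """Return cached counts, re-reading the file only after the TTL expires."""
        now = self.clock()
        if self._expires is None or now >= self._expires:
            self._counts = self._parse()
            self._expires = now + self.ttl
            self.reads += 1
        return self._counts
```

With something like this, ten items that each ask for a different socket state would trigger one read of the file per TTL rather than ten reads per polling cycle. The cost is that values can be up to `ttl` seconds stale.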
How frequently are you sampling the data? It looks very granular. The recommended granularity for Zabbix is typically 60-300 seconds. There is (in most cases) very little benefit and a high cost to < 60s sampling intervals.
It also looks like you are experiencing a transient spike in TCP sockets that might correlate to a load increase and Zabbix having less scheduled time on-CPU.
This module has not been profiled or optimized because the intention is that it takes only a small fraction of compute time, due to the wider Zabbix monitoring intervals.
If you'd like it optimized to run at this granularity, can you please provide some profile data under load, so I can see where the code is spending most of its time in your case? The Linux perf profiling tool is a great option.