check_multi
Timeouts problem / Parallelize
Hi,
Schematically: Icinga -----active check---> gearman ----> check_nrpe -c exec_passive ----> check_multi -f /etc/check_multi -----> send_multi --> gearman-perfdata
check_multi is triggered by an active check via nrpe.
command[exec_passive]=LC_ALL=C /usr/lib/nagios/plugins/check_multi -f /etc/check_multi -r 256 | send_multi --server=192.168.51.4 --encryption=yes --key=should_be_changed --host=m2m_client.com
check_multi then runs the checks sequentially, one by one.
If a check_tcp can't connect to its target, it waits 10s before failing. send_multi then fails too, because it expects to receive the data within 10s.
My first idea was to increase the send_multi timeout, but then check_nrpe became the problem: it also fails if it doesn't receive data within 10s.
I could also increase the check_nrpe timeout, but if a second check_tcp fails, I would need a timeout of more than 20s.
Another approach is to lower the check_tcp timeout, but that has the same problem: if I set the check_tcp timeout to 4s and 3 checks fail, that adds up to 12s, which is again over the 10s timeout.
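To make the timeout arithmetic explicit, here is a small back-of-the-envelope sketch in Python. It assumes that every failing check uses up its full timeout, that passing checks take negligible time, and that the budget is the 10s after which check_nrpe and send_multi give up. The function names are hypothetical and not part of check_multi.

```python
def sequential_worst_case(timeouts):
    """Worst case when checks run one after another: every timeout adds up."""
    return sum(timeouts)


def parallel_worst_case(timeouts):
    """Worst case when all checks run at the same time: only the slowest one counts."""
    return max(timeouts, default=0)


# Budget before check_nrpe / send_multi give up (10s in the setup described above).
BUDGET_S = 10

# Scenario from above: check_tcp limited to 4s, three unreachable targets.
failing = [4, 4, 4]
print(sequential_worst_case(failing), sequential_worst_case(failing) <= BUDGET_S)  # 12 False
print(parallel_worst_case(failing), parallel_worst_case(failing) <= BUDGET_S)      # 4 True
```

Run sequentially, the worst case grows with the number of failing checks. Run in parallel, it stays at the largest single timeout, no matter how many checks fail.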
The only solution I can think of is to parallelize (fork) the execution of the checks. That way, the overall runtime is bounded by the timeout of the slowest check rather than by the sum of all timeouts.
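A minimal Python sketch of the parallel idea, shown outside check_multi. It starts every plugin command at once and applies a per-check timeout, so total wall time is roughly that of the slowest check. This is only an illustration of the technique and not check_multi's own mechanism. `run_check`, `run_parallel` and `fake_check` are hypothetical names. `fake_check` builds a stand-in command that sleeps and exits with a given code, taking the place of a real plugin such as check_tcp. Exit code 3 follows the Nagios plugin convention for UNKNOWN.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_check(cmd, timeout_s):
    """Run one plugin command; return (exit_code, first line of stdout)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        lines = proc.stdout.strip().splitlines()
        return proc.returncode, lines[0] if lines else ""
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child on timeout; report UNKNOWN (3).
        return 3, f"UNKNOWN: timed out after {timeout_s}s"


def run_parallel(checks, timeout_s):
    """Start all checks at once; wall time is about the slowest check, not the sum."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(run_check, cmd, timeout_s)
                   for name, cmd in checks.items()}
        return {name: fut.result() for name, fut in futures.items()}


def fake_check(secs, exit_code):
    """Hypothetical stand-in for a plugin: sleep, print a status line, exit."""
    code = (f"import sys, time; time.sleep({secs}); "
            f"print('slept {secs}s'); sys.exit({exit_code})")
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    # Three 1s checks finish in about 1s in total instead of about 3s.
    print(run_parallel({"a": fake_check(1, 0), "b": fake_check(1, 2),
                        "c": fake_check(1, 0)}, timeout_s=4))
```

With real plugins, each per-check timeout (for example check_tcp's `-t` option) would still need to stay below the check_nrpe and send_multi timeouts, with some margin for the checks' own overhead.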
Thanks!