Timeouts problem / Parallelize

Open adrianlzt opened this issue 11 years ago • 0 comments

Hi,

Schematically: Icinga -----active check---> gearman ----> check_nrpe -c exec-passive ----> check_multi -f /etc/check_multi -----> send_multi --> gearman-perfdata

check_multi is triggered by an active check via nrpe.

command[exec_passive]=LC_ALL=C /usr/lib/nagios/plugins/check_multi -f /etc/check_multi -r 256 | send_multi --server=192.168.51.4 --encryption=yes --key=should_be_changed --host=m2m_client.com

check_multi then runs its checks one by one.

If a check_tcp can't connect to its target, it waits 10s before failing. send_multi then fails because it expects to receive the data in less than 10s.

My first idea was to increase the send_multi timeout, but then the problem moves to check_nrpe, which also fails if it doesn't receive data within 10s.

I can also increase the check_nrpe timeout, but if a second check_tcp fails, I would need to set the timeout to more than 20s.

Another approach is to lower the check_tcp timeout, but that leads to the same problem: if I set the check_tcp timeout to 4s and 3 checks fail, we are over the 10s timeout again (3 x 4s = 12s).
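To make the arithmetic concrete, here is a small Python sketch. The function name and the numbers are only illustrative (they are not part of check_multi); it just assumes that checks run strictly one after another, so every failing check adds its full timeout to the total.

```python
def worst_case_sequential(check_timeouts):
    """Worst-case runtime when checks run one by one and all of them time out."""
    return sum(check_timeouts)


NRPE_TIMEOUT = 10  # seconds, the check_nrpe limit described above

# Three failing check_tcp calls, each limited to 4s.
total = worst_case_sequential([4, 4, 4])
print(total, "s total, over the limit:", total > NRPE_TIMEOUT)
```

However low the per-check timeout is set, enough failing checks in a row will always exceed the outer limit.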

The only solution I can think of is to parallelize/fork the execution of the checks. That way, the global timeout would be the timeout of the slowest check.
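A rough sketch of what I mean, written in Python only to illustrate the idea; it is not how check_multi works internally. The helper names (run_check, run_checks_parallel, slow_check) are made up for this example, and slow_check is just a stand-in for a check_tcp call that hangs for a given time.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_check(command, timeout):
    """Run one check command and return (exit_code, output).

    On timeout the child is killed and 3 (UNKNOWN in the plugin
    exit-code convention) is returned.
    """
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout)
        return proc.returncode, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return 3, "UNKNOWN - check timed out after %ss" % timeout


def run_checks_parallel(commands, timeout=10):
    """Start all checks at once; results come back in the input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(lambda cmd: run_check(cmd, timeout), commands))


def slow_check(seconds):
    """Hypothetical stand-in for a check that blocks for `seconds`."""
    code = "import time; time.sleep(%s); print('OK')" % seconds
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    start = time.monotonic()
    results = run_checks_parallel([slow_check(1)] * 3, timeout=5)
    elapsed = time.monotonic() - start
    # Three 1s checks finish in roughly 1s instead of 3s.
    print(results, "elapsed: %.1fs" % elapsed)
```

With this layout the total runtime is bounded by the slowest single check (or by its timeout), not by the sum of all of them, so the check_nrpe and send_multi timeouts only need to cover one check.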

Thanks!

adrianlzt, Jan 30 '14 12:01