Thruk
Thruk crashes Nagios running ndo3 when passing commands
Describe the bug
Whenever an acknowledgement is sent to a backend running NDO3 instead of ndo2db, it kills the Nagios service, with the following error in the log:
[1627565739] NDO-3: ndo_return = 1 (Commands out of sync; you can't run this command now)
[1627565739] NDO-3: ndo_get_object_id_name2(ndo.c:1312): Unable to store results
[1627565739] Caught SIGSEGV, shutting down...
[1627565744] Caught SIGTERM, shutting down...
Thruk Version
Version 2.38-2, NagiosXI 5.8.5
To Reproduce
Steps to reproduce the behavior:
1. Connect Thruk to a Nagios server running NDO3 via livestatus.
2. Acknowledge any alert.
How does Thruk actually send acknowledgements to the backend Nagios servers? Does it write directly to the nagios.cmd file, or is there another method?
Thruk sends commands via livestatus, the same way it fetches all its hosts and services. You should probably file a bug with Nagios; I guess ndo3 is part of their product?
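For anyone following along, here is a rough sketch of what submitting an acknowledgement over a livestatus UNIX socket can look like. It is not Thruk's actual code. The socket path, host name, service name, author and comment are all placeholders, and the example assumes the usual `COMMAND [timestamp] NAME;args` request line that livestatus accepts for external commands.

```python
import socket
import time


def build_livestatus_command(command, args, timestamp=None):
    """Format a Nagios external command as a livestatus COMMAND request line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    fields = [command] + [str(a) for a in args]
    # One line per command: COMMAND [<epoch>] <NAME>;<arg1>;<arg2>;...
    return "COMMAND [{}] {}\n".format(ts, ";".join(fields))


def send_livestatus(request, socket_path="/usr/local/nagios/var/rw/live"):
    """Send a request to the livestatus socket.

    socket_path is an assumption; use whatever path your broker module
    is configured with.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(request.encode("utf-8"))
        # Signal that nothing more will be sent on this connection.
        sock.shutdown(socket.SHUT_WR)


if __name__ == "__main__":
    # Hypothetical host/service; args are sticky, notify, persistent,
    # author, comment (the ACKNOWLEDGE_SVC_PROBLEM argument order).
    line = build_livestatus_command(
        "ACKNOWLEDGE_SVC_PROBLEM",
        ["web01", "HTTP", 2, 1, 0, "admin", "working on it"],
    )
    print(line, end="")
    # send_livestatus(line)  # uncomment on a host that has the socket
```

The point is that the command text itself is the same external command Nagios would read from its command file; only the transport differs.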
Thanks for the swift reply
To quote the NagiosXI team directly:
ndo2db is our older technology that basically listens on a UNIX socket for database inserts, then handles the actual insertion into the database. It has limits, being that it runs into issues when it tries to insert more than the database can handle. In newer versions (Nagios XI 5.7.0 and later), this was replaced by just writing directly to the database from the Nagios worker threads. In addition to being able to handle more database inserts, this resulted in an overall performance boost, too.
I can definitely see the advantage of that - since currently if we issue more than a few hundred downtimes in Thruk it can kill the Nagios service on some of our boxes
I've filed a bug with NagiosXI and they are asking for the command used so they can look at it further from their side
The submitted commands should be logged in your thruk.log logfile.
ok cool, will send through to them
As a side note: have you any plans to find a replacement for livestatus? The last version was released a very long time ago and I'm pretty sure there is no support or interest in maintaining it since the creator released checkmk
Right now there are no plans to add support for other connections besides livestatus. NDO is not bidirectional: it does not allow you to send commands, and it's also super slow compared to livestatus. In fact, livestatus was invented because NDO performs so poorly.
Done some more work on this recently trying to get around the bug with NDO3
Running NagiosXI 5.9.3 and Thruk 3.04
On this server if I submit 20 acknowledgments it works fine, but if I go as high as 25 it crashes the backend with the same NDO3 error noted above
One thing I tried this time was taking the commands out of the thruk.log and manually adding them all to the nagios.cmd file - this works successfully, it is able to acknowledge 27 alerts at once
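For reference, the manual test described above can be scripted roughly like this. It is only a sketch: the command file path in the usage comment is an assumption (it depends on the `command_file` setting in nagios.cfg), and the host and service names are made up. Each line follows the standard external command format, `[timestamp] NAME;args`.

```python
import time


def format_external_command(command, args, timestamp=None):
    """Format one line for the Nagios external command file."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    fields = [command] + [str(a) for a in args]
    return "[{}] {}\n".format(ts, ";".join(fields))


def submit_to_command_file(lines, cmd_file):
    """Write pre-formatted command lines to the command file.

    On a real system cmd_file is a named pipe read by Nagios, so opening
    it in write mode does not truncate anything.
    """
    with open(cmd_file, "w") as fh:
        for line in lines:
            fh.write(line)


if __name__ == "__main__":
    # Hypothetical services on a hypothetical host.
    batch = [
        format_external_command(
            "ACKNOWLEDGE_SVC_PROBLEM",
            ["web01", "svc{}".format(i), 2, 1, 0, "admin", "bulk ack"],
        )
        for i in range(27)
    ]
    print("".join(batch), end="")
    # Assumed path; check command_file in nagios.cfg before using:
    # submit_to_command_file(batch, "/usr/local/nagios/var/rw/nagios.cmd")
```

Writing the same batch through both paths makes it easier to show that the command text is identical and only the submission route changes.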
So the command structure and the time stamps etc are all fine, but for some reason when Thruk submits the same commands it gets "NDO-3: ndo_return = 1 (Commands out of sync; you can't run this command now)"
(Is this how livestatus does it? Just by writing to the nagios.cmd file?)
livestatus writes the commands into the query handler to avoid such side effects.