naemon-core icon indicating copy to clipboard operation
naemon-core copied to clipboard

launch command worker earlier

Open sni opened this issue 1 year ago • 6 comments

since the command worker forks from the main naemon process, it inherits all open files like ex.: pidfile, logfiles, etc... It will keep those references open, even if the main process rotates and reopens those files.

This patch closes query handler and pid file references after starting the command worker and also moves starting the command worker before initializing the neb modules, so it won't inherit open logfiles from neb modules.

references:

  • https://github.com/ConSol-Monitoring/omd/issues/146

sni avatar Jul 04 '24 13:07 sni

The only issue I see with this is that the command file worker completely loses the ability to communicate. Currently, the command file worker is able to write to the naemon.log until this line: https://github.com/naemon/naemon-core/blob/22f0fb66fc0d7370ebbc5d2e6bcc66d8b9311392/src/naemon/commands.c#L392

After that, it can only write to stdout to log any errors. Not optimal, but better than nothing, I guess.

Due to the change, the command file worker can't even write to stdout anymore, so any error messages are completely gone. The static int command_file_worker method makes some calls to nm_log, which now get completely blackholed.

This change also introduces a sort of "breaking change". Currently, whenever Naemon logs the message Successfully launched command file worker with pid 12135, we know that Naemon has done all initial processing and is up and running.

I personal only use this when working or debugging Naemon directly, but it turns out that some scripts rely on this. For example, this one which is part of our repository: https://github.com/naemon/naemon-core/blob/22f0fb66fc0d7370ebbc5d2e6bcc66d8b9311392/features/steps/naemon.py#L138-L155

I think we should add a new message to the log that clearly indicates that Naemon is ready. Something like

Naemon initialization complete. Monitoring initiated.

nook24 avatar Jul 07 '24 12:07 nook24

i am not to happy yet either, maybe we keep the order as it was but send a USR1 signal to the command worker so it can rotate/reopen the logfile as well.

sni avatar Jul 07 '24 16:07 sni

btw, the message Successfully launched command file worker with is logged from the main naemon process and should still be printed.

sni avatar Jul 08 '24 08:07 sni

maybe we keep the order as it was but send a USR1 signal to the command worker so it can rotate/reopen the logfile as well.

Currently the command file worker can only write to stdout. I guess this is due to it's a fork() and could otherwise break the log file if multiple processes wring to the log at the same time.

I think this is also the reason why the nm_core_worker is using printf instead of nm_log

btw, the message Successfully launched command file worker with is logged from the main naemon process and should still be printed.

Yes, this message gets still logged as before, but due due it is now before the broker modules get loaded, it does not indicate that Naemon is ready anymore. To be fair, this message should never indicate that Naemon is ready so I think we should not be too concerned about this. But I would still recommend to add a new log message to make it possible to detect if Naemon is ready.

nook24 avatar Jul 08 '24 08:07 nook24

i added a new log entry [1738920412] Naemon successfully initialized (PID=740899) right before starting the event loop. If that message is logged, you know naemon is up and running.

sni avatar Feb 07 '25 09:02 sni

This PR also fixes an issue where the command handler process uses the same amount of memory as the main process. This can be quite a bit on larger setups and is totally unnecessary.

sni avatar Feb 07 '25 09:02 sni