Memory leak after each arbiter reload on many processes
After doing an arbiter reload, I always get an increase in the memory of scheduler, receiver, broker, reactionner processes and they don't decrease over time. They increase at each reload.
The first graph below shows 2 reloads over a 5-minute period.
The next one shows 5 reloads over a 3-hour period. As you can see, the memory is still not freed after 20 hours.
Shinken version 2.4
Hi, sorry for this late answer. Could you test with this PR (https://github.com/naparuba/shinken/pull/1828), setting `max_q_size` to 1024 and `results_batch` to 2048 on the pollers and reactionners, and `broks_batch` to 2048 on the broker in the Shinken configuration?
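For reference, a minimal sketch of what that would look like in the daemon definitions (parameter names come from the PR, values from the suggestion above; the `...` stands for your existing settings):

```cfg
define poller {
    ...
    max_q_size      1024    ; cap on the internal check queue (from PR 1828)
    results_batch   2048    ; results returned per send batch
}

define broker {
    ...
    broks_batch     2048    ; broks fetched per request
}
```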
Hi again. The PR I mentioned above also introduces a new service startup option in the service ini files. If you set `graceful_enabled=1` in `schedulerd.ini`, it will trigger an automatic service restart when a new configuration is received, which should fix this issue.
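Concretely, that would be a one-line change (sketch; the `[daemon]` section name is assumed from the stock daemon ini layout):

```ini
; /etc/shinken/daemons/schedulerd.ini (sketch)
[daemon]
graceful_enabled=1
```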
I still have to work a little more on this feature, because from time to time the live scheduler daemon did not release its TCP port, preventing the new one from starting. This is presumably due to a threading issue somewhere between cherrypy and shinken. If this happens, the scheduler has to be killed with `kill -9`.
So if you decide to test it (which would be great), take care to enable this feature on only one of your schedulers (the master one), and monitor them carefully so you are alerted if such a situation happens.
I'd be glad to have your feedback on this feature.
The PR has been merged. Could you test with the latest Shinken sources?
I personally use the following parameters in poller and reactionner configuration:
define poller|reactionner {
...
processes_by_worker 128 ; Each worker manages N checks
q_factor 2 ; Maximum number of checks to enqueue =
; q_factor * processes_by_worker * cores
results_batch 2048 ; How many results are returned in a send batch
...
}
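To illustrate the queue-size formula in the comment above, here is the arithmetic for an assumed 4-core host (the core count is an example, not from the thread):

```python
# Maximum number of checks a poller will enqueue, per the comment above:
#   q_factor * processes_by_worker * cores
q_factor = 2
processes_by_worker = 128
cores = 4  # assumed example; use your actual core count

max_checks_enqueued = q_factor * processes_by_worker * cores
print(max_checks_enqueued)  # 1024
```

Raising `q_factor` therefore scales the queue linearly, at the cost of more memory held in flight between reloads.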
And in broker configuration:
define broker {
...
broks_batch 4096 ; The maximum number of broks per request
...
}