
Memory leak after each arbiter reload on many processes

efficks opened this issue 8 years ago

After an arbiter reload, the memory usage of the scheduler, receiver, broker, and reactionner processes always increases and never decreases over time. It grows further with each reload.

The following graph shows 2 reloads over a 5-minute period. [image]

The following shows 5 reloads over a 3-hour period. You can see the memory is still not freed after 20 hours. [image]

Shinken version 2.4

efficks avatar Aug 31 '16 15:08 efficks

Hi, sorry for the late answer. Could you test with this PR (https://github.com/naparuba/shinken/pull/1828), setting max_q_size to 1024 and results_batch to 2048 on the pollers and reactionners, and broks_batch to 2048 on the broker in the Shinken configuration?
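For reference, a minimal sketch of those settings, assuming they go in the daemon define blocks the same way as the parameters in the example further down (the placement is an assumption, not taken from the PR itself):

define poller|reactionner {
    ...
    max_q_size      1024 ; Cap on the internal check queue (from PR 1828)
    results_batch   2048 ; How many results are returned in a send batch
    ...
}

define broker {
    ...
    broks_batch     2048 ; The maximum number of broks per request
    ...
}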

geektophe avatar Sep 01 '16 13:09 geektophe

Hi again. The PR I mentioned above also introduces new service startup options in the service ini files.

If you set graceful_enabled=1 in schedulerd.ini, it will trigger an automatic service restart when a new configuration is received, and should fix this issue.
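A minimal sketch of the corresponding ini entry, assuming the option sits in the [daemon] section of schedulerd.ini alongside the other daemon startup settings (the section placement is an assumption):

[daemon]
; Assumed placement: restart the scheduler automatically
; when a new configuration is received
graceful_enabled=1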

I have to work a little more on this feature because, from time to time, I hit an error where the live scheduler daemon did not release its TCP port, preventing the new one from starting. This is presumably due to a threading issue somewhere between cherrypy and shinken. If this happens, the scheduler has to be killed with kill -9.

Thus, if you decide to test it (which would be great), take care to enable this feature on only one of your schedulers (the master one), and monitor them carefully so you are alerted if such a situation happens.

I'd be glad to have your feedback on this feature.

geektophe avatar Sep 29 '16 07:09 geektophe

The PR has been merged. Could you test with the latest Shinken sources?

I personally use the following parameters in poller and reactionner configuration:

define poller|reactionner {
    ...
    processes_by_worker 128  ; Each worker manages N checks
    q_factor            2    ; Maximum number of checks to enqueue =
                             ; q_factor * processes_by_worker * cores
    results_batch       2048 ; How many results are returned in a send batch
    ...
}
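To make the q_factor comment concrete: on a hypothetical 8-core host (the core count is picked purely for illustration), these values cap the check queue at 2 × 128 × 8 = 2048 checks per daemon.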

And in broker configuration:

define broker {
    ...
    broks_batch         4096 ; The maximum number of broks per request
    ...
}

geektophe avatar Jun 27 '17 16:06 geektophe