worker_pool icon indicating copy to clipboard operation
worker_pool copied to clipboard

Asynchronous shutdown of workers

Open sirihansen opened this issue 4 years ago • 4 comments

This is a slightly related to PR #171 . I wonder if you have considered making it possible to set the supervisor strategy for wpool_process_sup to simple_one_for_one. I mean, it is possible to set this value with the strategy option, but as far as I can understand it will only work for other strategies that simple_one_for_one.

The reason I would like this is mainly to obtain asynchronous shutdown of the workers in order to reduce shutdown time, so other ways of achieving this would also be interesting.

Any thoughts?

sirihansen avatar May 15 '20 09:05 sirihansen

Well… the idea for these pools of processes is to have a static number of workers and simple_one_for_one is meant to have a dynamic number of children.

Can you clarify why simple_one_for_one is needed to obtain asynchronous shutdown of workers?

elbrujohalcon avatar May 15 '20 09:05 elbrujohalcon

Any strategy except simple_one_for_one shuts down children synchronously, so the supervisor waits for the DOWN message from each child before shutting down the next (reverse order of how they are started). For simple_one_for_one, all children are shut down "at the same time", and then the supervisor waits for all DOWNs.

From supervisor documentation: "As a simple_one_for_one supervisor can have many children, it shuts them all down asynchronously. This means that the children do their cleanup in parallel, and therefore the order in which they are stopped is not defined." https://erlang.org/doc/man/supervisor.html

sirihansen avatar May 15 '20 10:05 sirihansen

WOW! I didn't know that… I'm not sure that is enough reason to implement the multiple changes needed to handle simple_one_for_one as a strategy for wpool_process_sup… but to be honest, I'm not sure how hard that change would be, but I'm assuming it won't be simple (for starters, we need to add the children workers after initialization and not during initialization).

Two seconds of thinking lead to this terrible terrible hack…

rpc:pmap(
  {gen_server, stop},
  [],
  [P || {_, P, _, _} <- supervisor:which_children(YourWpoolSup)]
).

I'm pretty sure that this is not a good idea, either because of the restarts (workers are probably not transient and the supervisor restart intensity might be high enough)… or because they might be working and the stop/1 calls may not get acted upon quick enough… or because the lack of timeout/backup measures (where we brutal-kill the workers if they don't stop in # of secs)… etc. etc.

I'll keep thinking about this…

elbrujohalcon avatar May 15 '20 11:05 elbrujohalcon

Yeah, given that the workers are permanent, the supervisor will restart them if terminated by someone else. So that would at least require to be able to set them to transient. I'll keep pondering a bit to see if I can come up with something.

sirihansen avatar May 15 '20 11:05 sirihansen