
Number of processes started by IPyParallel clusters

sahil1105 opened this issue 2 years ago · 8 comments

A lot of users seem to be surprised by the number of processes started as part of an IPyParallel cluster. In my testing, we start 2 processes per engine (the engine itself and the nanny), an mpiexec process in the case of the MPI launcher, and 9 processes for the controller. The controller process count is higher than the 5 suggested by https://ipyparallel.readthedocs.io/en/latest/reference/connections.html#all-connections.

  1. Is this in line with what you expect (9 processes)? If so, is this a documentation issue that we could fix? If not, is this a bug that needs to be fixed?
  2. Is there a way to reduce the number of processes required in general across the board, in particular for the controller?
  3. Can we improve the documentation on this in general?

sahil1105 avatar Apr 19 '22 15:04 sahil1105

The connection/process diagram hasn't been updated to include the broadcast scheduler, which is itself multi-process for the tree, and accounts for $2^{depth+1}-1$ processes. The default is a depth of 1, which makes 3 processes. So that's a documentation issue.
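The tree arithmetic above can be sketched as a small helper (a hypothetical function name, just to illustrate the formula):

```python
def broadcast_scheduler_processes(depth: int) -> int:
    """Processes used by the broadcast scheduler's binary tree.

    A full binary tree of the given depth has 2**(depth + 1) - 1 nodes,
    one process per node.
    """
    return 2 ** (depth + 1) - 1

for depth in range(3):
    print(depth, broadcast_scheduler_processes(depth))
# depth 0 -> 1 process, depth 1 (the default) -> 3, depth 2 -> 7
```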

You can run engines without the nanny, but then you lose the nanny's benefits (remote engine signalling, prompt crash events, etc.). With MPI you get a reduced-functionality version of this even without the nanny, so that trade-off may make sense for you.

The controller can be started with --usethreads, which runs the schedulers in threads instead of processes. This saves memory, but at the expense of parallelism, since a single Python process is limited by the GIL: the Hub will slow way down if you have lots of executions going on, and the broadcast scheduler loses all parallelism, so it probably only makes sense to combine this with a broadcast depth of 0.

This config should minimize processes, with just one process for the controller and one per engine:

```python
# ipengine_config.py
c.IPEngine.use_nanny = False
```

```python
# ipcontroller_config.py
c.IPController.broadcast_scheduler_depth = 0
c.IPController.use_threads = True
```

It's worth profiling the performance with threads to understand when the memory trade-off may be worth it.

minrk avatar Apr 21 '22 07:04 minrk

Thanks for the detailed explanation @minrk! That makes complete sense. I agree, for small setups, e.g. on laptops, where you may only need a few engines, the minimized configuration could make sense. I think just documenting these details explicitly might be sufficient.

sahil1105 avatar Apr 23 '22 23:04 sahil1105

```python
# ipengine_config.py
c.IPEngine.use_nanny = False
```

Quick correction: the trait should be enable_nanny instead of use_nanny.

sahil1105 avatar May 01 '22 15:05 sahil1105

@minrk I was able to get this working. Thanks again. Is there a way to specify these options in ipcluster_config.py or set them in user-code using cluster.config?

sahil1105 avatar May 01 '22 15:05 sahil1105

```python
# ipcluster_config.py (or cluster.config)
c.EngineLauncher.engine_args = ["--IPEngine.enable_nanny=False"]
c.ControllerLauncher.controller_args = ["--usethreads"]
```

minrk avatar May 02 '22 07:05 minrk

Great, tysm @minrk! This worked for me:

```python
c = ipp.Cluster(engines='mpi', n=4)
c.config.EngineLauncher.engine_args = ["--IPEngine.enable_nanny=False"]
c.config.ControllerLauncher.controller_args = [
    "--IPController.broadcast_scheduler_depth=0",
    "--IPController.use_threads=True",
]
```

Would be great if we can update the docs with the updated diagram and this example.
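For completeness, a minimal end-to-end sketch of the configuration above (assuming ipyparallel with a working mpiexec on PATH; it needs a real MPI environment to run, so treat it as configuration guidance rather than a tested example):

```python
import ipyparallel as ipp

# Minimal-process cluster: no engine nannies, single controller process.
c = ipp.Cluster(engines="mpi", n=4)
c.config.EngineLauncher.engine_args = ["--IPEngine.enable_nanny=False"]
c.config.ControllerLauncher.controller_args = [
    "--IPController.broadcast_scheduler_depth=0",
    "--IPController.use_threads=True",
]

# Using the cluster as a context manager starts the controller and
# engines, and tears them down on exit.
with c as rc:
    results = rc[:].apply_sync(lambda: 1)
```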

sahil1105 avatar May 06 '22 03:05 sahil1105

Hi @minrk, any update on when the documentation can be updated with these details? We're planning to add it to our docs as well, but would be good to reference the official docs.

sahil1105 avatar Jun 07 '22 21:06 sahil1105

I don't have time to work on this right now, but if you wanted to have a stab, I'm happy to review.

minrk avatar Jun 08 '22 07:06 minrk