
The "pool" argument is a little confusing

Open shoyer opened this issue 5 years ago • 6 comments

It actually needs the concurrent.futures.Executor API. I tried to pass in a thread pool from multiprocessing and it didn't work.

shoyer avatar Nov 28 '18 00:11 shoyer

Thanks for picking this up. It looks like the multiprocessing interface requires an iterable of arguments, whereas the current apply_async call (which was originally written for ipyparallel) expects unpacked arguments.

I'll add an explicit check for multiprocessing.pool.Pool instances, and maybe add a note to the docs that the pool should match the API of one of:

  • concurrent.futures
  • multiprocessing.pool
  • ipyparallel
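For reference, a minimal sketch of the API mismatch being described, using a toy `add` function (the function and arguments here are just illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool

def add(x, y):
    return x + y

# concurrent.futures: arguments are passed unpacked to submit(),
# which returns a Future
with ThreadPoolExecutor(max_workers=2) as ex:
    fut = ex.submit(add, 1, 2)
    print(fut.result())  # 3

# multiprocessing.pool: apply_async takes a packed args tuple instead,
# and returns an AsyncResult rather than a Future
with ThreadPool(2) as pool:
    res = pool.apply_async(add, args=(1, 2))
    print(res.get())  # 3
```

So code written against one interface won't work with the other without an explicit check or adapter.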

jcmgray avatar Nov 28 '18 13:11 jcmgray

Sounds good!

executor might also be a good name for this argument going forward.

You might also consider just deferring all of this to joblib under the covers. Between joblib's built-in backends (loky and threads) and its plugin support (e.g. for dask and ipyparallel), I think it would cover all the bases.

shoyer avatar Nov 28 '18 23:11 shoyer

For the moment I will just implement the 'quick-fix' version, and maybe deprecate the pool argument in favour of executor, which I agree is a bit more explicit.

There definitely seem to be some advantages to joblib and loky in the long term though. As I see it:

  • parallelisation deferred to a mature library
  • handles setting the correct num_threads in worker processes, e.g. for BLAS
  • cloudpickle for interactively defined function support
  • more efficient handling of large numpy arrays?
  • joblib is already a xyzpy dependency

The main effort (and I haven't considered it much) would just be converting the nested submits to the flat Parallel()(delayed(fn)(*args, **kwargs) for ...) syntax and back, and possibly making sure the progress bar tracks results as they finish rather than as they are submitted.
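A rough sketch of what that flattened syntax looks like (the `fn` and the list of argument combinations here are hypothetical stand-ins for xyzpy's nested combos):

```python
from joblib import Parallel, delayed

def fn(a, b):  # stand-in for the user's combo function
    return a * b

# hypothetical flat list of argument combinations
cases = [(1, 2), (3, 4), (5, 6)]

# joblib wants a single flat generator of delayed calls ...
results = Parallel(n_jobs=2)(delayed(fn)(a, b) for a, b in cases)
print(results)  # [2, 12, 30]

# ... and the flat results list would then need reshaping back
# into the nested combos structure
```

Parallel preserves input order, so the reshape back is at least deterministic; the progress-bar question is trickier since Parallel consumes the whole generator itself.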

It is also quite convenient to be able to supply other executors, e.g. mpi4py.futures.MPIPoolExecutor, directly without implementing a full joblib backend.
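For illustration, a sketch of the duck typing that makes this convenient: any object with a concurrent.futures-style submit() is enough (the `run_with` helper and toy cases are hypothetical; MPIPoolExecutor itself needs an MPI runtime, so a ThreadPoolExecutor stands in here):

```python
from concurrent.futures import Executor, ThreadPoolExecutor

def run_with(executor: Executor, fn, cases):
    """Submit fn over cases on any concurrent.futures-style executor."""
    futures = [executor.submit(fn, *args) for args in cases]
    return [f.result() for f in futures]

def add(a, b):
    return a + b

# The same code works unchanged with ThreadPoolExecutor,
# ProcessPoolExecutor, or e.g. mpi4py.futures.MPIPoolExecutor,
# with no custom joblib backend required.
with ThreadPoolExecutor(max_workers=2) as ex:
    print(run_with(ex, add, [(1, 2), (3, 4)]))  # [3, 7]
```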

jcmgray avatar Nov 29 '18 13:11 jcmgray

Quick-fix in this commit by the way, but might leave this open as a reminder to consider joblib!

jcmgray avatar Nov 29 '18 16:11 jcmgray

Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

You could also just use loky directly instead of rewriting things to use the joblib interface -- it looks like it provides an Executor interface with most of these features: https://pypi.org/project/loky/. I don't think there are major advantages to using joblib's new interface from a library.

shoyer avatar Nov 29 '18 18:11 shoyer

> Awesome, thanks @jcmgray! Did I mention how handy I found your package to be? :)

That's good to hear! All very much enabled by xarray of course ;)

I will investigate changing the default executor to loky, which it seems is shipped within joblib.externals already. Unless it is drastically slower or something, I expect it will make sense to change.


With regard to joblib proper, I guess I can see how it might be nice to do something like

with joblib.parallel_backend('dask'):
    h.harvest_combos(combos, parallel='joblib')

if this is (or becomes) a widespread pattern for specifying parallelisation.

jcmgray avatar Dec 01 '18 14:12 jcmgray