Use multiprocess if available
Windows doesn't have a true fork implementation, so multiprocessing and pickling are limited compared to other platforms. There are also some situations in Linux / Mac OS X where we would like to pickle something, but it does not work without some code changes. This PR makes pickling for mulitprocessing work in more situations by using the multiprocess library, if it's available.multiprocess is a fork of multiprocessing that uses dill instead of pickle.
This PR is set up such that we do not require the user to have multiprocess installed, but if it is installed we'll use it -- except in Windows, where pickling is so limited that it will make things easier to require it. There are a few other routes we could go:
-
Don't require it in Windows: This makes things simpler for Windows users who don't want to use multiprocessing. Installing the
multiprocesspackage should work on all platforms, but if there's a platform where there are likely to be issues, it's Windows. -
Require it for all platforms: This will likely be a net benefit to everyone, but it introduces a new dependency. Also,
dillwill always be a little bit slower thanpicklesince it's able to pickle more things. However, if we're not pickling anything too large anyway, performance shouldn't be an issue, and it would be possible to simplify some parts of the codebase by usingdillinstead ofcloudpickleand removing some of the hacks that we currently use to get Windows to pickle things that it normally can't.
I'm happy to modify this PR to implement either option if you want! I've been using doit for a relatively large data science project and it's been a huge help.
Thanks.
-
On runner.py
MRunnerthere is a check ifmultiprocessingis available. That should be taken into account... -
In that spirit I guess it would be better to NOT make
multiprocessrequired on Windows. -
Also it would be nice to consolidate try/catch of the imports. Maybe on "compat.py" file.
-
Thanks for taking care of CHANGES file. Please also mention this in the "install.rst" file.
-
Finally, need to think how to deal this on CI.
Require it for all platforms: This will likely be a net benefit to everyone
could you please elaborate
It will also be useful on macOS, where spawn became the default in 3.8, and fork doesn’t always work properly.
https://github.com/uqfoundation/multiprocess/issues/65
It seems multiprocess package forced a "regression" because one of the features added (Pool) does not work well with "spawn" method. This leads me to think that multiprocess could be accepted as a multiprocessing replacement but should not be installed by default even for Windows/MAC.
Hi @schettino72, since I was touching the doit code for #377 I updated this one also to take your feedback into account.
On runner.py
MRunnerthere is a check ifmultiprocessingis available. That should be taken into account...
Done, I implemented the check in compat.py and call it from MRunner now.
In that spirit I guess it would be better to NOT make
multiprocessrequired on Windows.
Yes, that makes sense. I instead made it an optional dependency that you can install with pip install doit[multiprocess].
Also it would be nice to consolidate try/catch of the imports. Maybe on "compat.py" file.
Done, the imports are done in compat.py now.
Thanks for taking care of CHANGES file. Please also mention this in the "install.rst" file.
Done!
Finally, need to think how to deal this on CI.
I didn't add anything to CI, but you could either modify one of the existing builds in your build matrix to install with multiprocess, by changing
- run: pip install . -r dev_requirements.txt
to
- run: pip install .[multiprocess] -r dev_requirements.txt
for that build (might need quotes, like pip install '.[multiprocess]', not sure what kind of shell Github actions uses). Or you could add a new build to the matrix that does the above with the latest version of Python. Those are my ideas anyhow; running the tests locally with multiprocess works correctly.
If you like, I can try to add a test that would fail with multiprocessing but succeed with multiprocess.
If you like, I can try to add a test that would fail with multiprocessing but succeed with multiprocess.
That would be great.