[conflict] Multiprocessing start method on macOS
Hello,
On macOS, importing PyCBC conflicts with several of my other imports because of this code: https://github.com/gwastro/pycbc/blob/master/pycbc/__init__.py#L211C1-L216C49
# MacosX after python3.7 switched to 'spawn', however, this does not
# preserve common state information which we have relied on when using
# multiprocessing based pools.
import multiprocessing
if hasattr(multiprocessing, 'set_start_method'):
    multiprocessing.set_start_method('fork')
Here is a typical error message:
File "/Users/marcomeyer/.conda/envs/myenv/lib/python3.11/multiprocessing/context.py", line 248, in set_start_method
raise RuntimeError('context has already been set')
RuntimeError: context has already been set
I assume 'fork' is forced for performance reasons, but in my case I don't use PyCBC for heavy computation. Would it be possible to allow the 'spawn' method, or to disable this call through a custom variable?
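For readers who want to reproduce the error outside PyCBC, here is a minimal sketch of the conflict. It assumes only the standard library; the helper name `second_call_fails` is made up for illustration, and the first call stands in for whichever package set the start method before PyCBC was imported.

```python
import multiprocessing


def second_call_fails():
    """Set the start method twice and return the error raised by the second call."""
    # First caller (e.g. another package or the user's own script).
    # force=True is used here only so the sketch works even if a context already exists.
    multiprocessing.set_start_method("spawn", force=True)
    try:
        # Second caller (e.g. a library doing this at import time), without force.
        multiprocessing.set_start_method("spawn")
    except RuntimeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(second_call_fails())
```

The second call fails even when it asks for the same method, which is why an unconditional `set_start_method` at import time collides with any code that has already chosen a context.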
@xkzl Does this PR https://github.com/gwastro/pycbc/pull/4620 address the issue in your use case? If not, we are happy to accept PRs here to help improve this behavior for everyone. Or suggestions for how you'd like this to work.
@ahnitz Yes, I see the updates from #4620; they concern the same issue.
I would recommend using the following wherever a Pool is created, rather than imposing a global start method on everyone from a shared library such as PyCBC:
from multiprocessing import get_context
get_context("fork").Pool()
instead of:
set_start_method('fork')
mp.Pool()
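A slightly fuller sketch of this suggestion, as an illustration rather than the code PyCBC adopted: the helper name `make_pool` is hypothetical, and the fallback to "spawn" where "fork" is unavailable (e.g. Windows) is an added assumption.

```python
import multiprocessing


def make_pool(processes=2, method="fork"):
    """Create a Pool from a local context, leaving the global start method untouched."""
    if method not in multiprocessing.get_all_start_methods():
        # Assumption: fall back to "spawn", which exists on every platform.
        method = "spawn"
    ctx = multiprocessing.get_context(method)
    return ctx.Pool(processes)


if __name__ == "__main__":
    with make_pool() as pool:
        # A builtin is used as the worker so the example has no pickling caveats.
        print(pool.map(abs, [-1, 2, -3]))
```

Because `get_context` with an explicit method returns a separate context object, other packages remain free to call `set_start_method` themselves.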
@xkzl Thank you for that suggestion. That seems like a very straightforward change, so I've created PR #4890 to correct this.
@xkzl When you get the chance, let us know if this issue is now resolved. If you are satisfied, please close this issue; otherwise, an update on what further problems you experience would be helpful.
An additional fix may be needed, because of changes Apple made to fork() in the High Sierra (macOS 10.13) release for security purposes.
It would consist of checking whether the macOS version is >= 10.13 and, if so, considering "spawn" instead. In my own use of multiprocessing, I use "spawn" on macOS >= 10.13 and "fork" otherwise.
Roughly speaking, "fork" is faster because the child process reuses the parent's program state, while "spawn" starts a fresh interpreter and reloads all imports.
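The version check described above could look roughly like this. It is a sketch under stated assumptions: `pick_start_method` is a hypothetical helper, the `system` and `mac_version` parameters exist only to make it testable, and the Windows branch is an addition since "fork" is not available there.

```python
import multiprocessing
import platform


def pick_start_method(system=None, mac_version=None):
    """Return "spawn" on macOS >= 10.13 (and on Windows), "fork" otherwise."""
    system = platform.system() if system is None else system
    if system == "Windows":
        return "spawn"  # assumption: "fork" does not exist on Windows
    if system == "Darwin":
        if mac_version is None:
            mac_version = platform.mac_ver()[0]  # e.g. "10.13.6"; "" if unknown
        # Compare only major.minor as integers, so "10.9" sorts before "10.13".
        parts = tuple(int(p) for p in mac_version.split(".")[:2] if p.isdigit())
        if parts >= (10, 13):
            return "spawn"
    return "fork"


if __name__ == "__main__":
    ctx = multiprocessing.get_context(pick_start_method())
    with ctx.Pool(2) as pool:
        print(pool.map(abs, [-1, 2, -3]))
```

Combining the check with `get_context` keeps the choice local to the pool that needs it.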