
[conflict] Multiprocessing start method on macOS

Open meiyasan opened this issue 1 year ago • 5 comments

Hello,

On macOS, PyCBC conflicts with several of my other imports because of this code: https://github.com/gwastro/pycbc/blob/master/pycbc/init.py#L211C1-L216C49

    # MacosX after python3.7 switched to 'spawn', however, this does not
    # preserve common state information which we have relied on when using
    # multiprocessing based pools.
    import multiprocessing
    if hasattr(multiprocessing, 'set_start_method'):
        multiprocessing.set_start_method('fork')

Here is a typical error message:

  File "/Users/marcomeyer/.conda/envs/myenv/lib/python3.11/multiprocessing/context.py", line 248, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
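The same error can be reproduced without PyCBC. Below is a minimal sketch, where `reproduce_conflict` is a hypothetical helper name and the first call stands in for whichever library happens to set the start method before the second, unconditional call runs:

```python
import multiprocessing


def reproduce_conflict():
    """Return the message raised by a second set_start_method call."""
    # Stand-in for another library fixing the start method first.
    # force=True is only used so this sketch can be re-run in one interpreter.
    multiprocessing.set_start_method("spawn", force=True)
    try:
        # Stand-in for an unconditional call made at import time.
        multiprocessing.set_start_method("fork")
    except RuntimeError as err:
        return str(err)
    return None
```

Calling `reproduce_conflict()` returns the message from the traceback above, because the global start method can only be set once unless `force=True` is passed.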

I assume this is done for performance, but in my case I don't use PyCBC for heavy computation. Would it be possible to use the spawn method instead, or to disable this call through a custom variable?

meiyasan avatar Aug 16 '24 15:08 meiyasan

@xkzl Does this PR https://github.com/gwastro/pycbc/pull/4620 address the issue in your use case? If not, we are happy to accept PRs here to help improve this behavior for everyone. Or suggestions for how you'd like this to work.

ahnitz avatar Aug 20 '24 21:08 ahnitz

@ahnitz Yes, it is the same issue, and I see some updates from #4620.

I would recommend using the following piece of code wherever a Pool is created, instead of imposing a global start method on everyone from a shared library such as PyCBC:

from multiprocessing import get_context
get_context("fork").Pool()

instead of:

import multiprocessing as mp
mp.set_start_method('fork')
mp.Pool()
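To illustrate the difference, here is a small sketch of the context-based approach. The helper name `parallel_abs` is made up for this example and is not part of PyCBC; it assumes a platform where "fork" is available (Linux or macOS):

```python
import multiprocessing


def parallel_abs(values, method="fork"):
    """Map abs() over values in a pool that uses its own start method."""
    # get_context returns a context object scoped to this call site;
    # the process-wide start method is left untouched.
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(abs, values)
```

Because nothing here calls `set_start_method`, other libraries imported before or after this code remain free to choose their own start method.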

meiyasan avatar Sep 24 '24 01:09 meiyasan

@xkzl Thank you for that suggestion. That seems like a very straightforward change, so I've created PR #4890 to correct this.

ahnitz avatar Sep 24 '24 03:09 ahnitz

@xkzl When you get the chance, let us know if this issue is now resolved. If you are satisfied, please close this issue; otherwise, an update on what further problems you experience would be helpful.

ahnitz avatar Sep 24 '24 15:09 ahnitz

There might be an additional fix to consider, because Apple changed the behavior of 'fork()' in the High Sierra release for security purposes.

It would consist of checking whether the macOS version is >= 10.13 and, if so, considering "spawn" instead. In my own use of multiprocessing, I use "spawn" on macOS >= 10.13 and "fork" otherwise.

Roughly speaking, "fork" is faster because the child process reuses the parent's program state, while "spawn" starts a fresh interpreter and reloads all imports.
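A rough sketch of that version check follows. `pick_start_method` is a hypothetical helper, not something PyCBC provides. Two choices in it are my own assumptions: it returns "spawn" when the macOS version cannot be read, and it returns "spawn" on Windows, where "fork" does not exist.

```python
import platform
import sys


def pick_start_method(system=None, mac_release=None):
    """Return "spawn" on macOS >= 10.13 (and Windows), otherwise "fork"."""
    system = sys.platform if system is None else system
    if system == "win32":
        return "spawn"  # fork is not available on Windows
    if system != "darwin":
        return "fork"
    if mac_release is None:
        # Release string such as "10.13.6"; empty if it cannot be determined.
        mac_release = platform.mac_ver()[0]
    numbers = [int(p) for p in mac_release.split(".")[:2] if p.isdigit()]
    if not numbers:
        return "spawn"  # assumption: unknown version, take the cautious option
    major = numbers[0]
    minor = numbers[1] if len(numbers) > 1 else 0
    return "spawn" if (major, minor) >= (10, 13) else "fork"
```

The result can be passed to `multiprocessing.get_context(pick_start_method())`, so the choice stays local to the pool being created rather than being set globally.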

meiyasan avatar Oct 06 '24 21:10 meiyasan