Multiprocessing Tutorial Example Not Working
General information:
- emcee version: 3.0.2
- platform: Python, Jupyter notebook
- installation method (pip/conda/source/other?): pip
Problem description:
Expected behavior: The multiprocessing example from the tutorial, which passes a "pool" to the sampler, should run in a Jupyter notebook.
Actual behavior: With "pool", the sampler never finishes and keeps running indefinitely, whereas the tutorial shows it completing, and faster than the serial code.
What have you tried so far?: Modified the code, tried different notebooks, and tried calling pool.close().
Minimal example:
```python
import time
from multiprocessing import Pool

import emcee

# nwalkers, ndim, log_prob, initial, nsteps and serial_time are defined
# earlier in the tutorial.
with Pool() as pool:
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
    start = time.time()
    sampler.run_mcmc(initial, nsteps, progress=True)
    end = time.time()
    multi_time = end - start
    print("Multiprocessing took {0:.1f} seconds".format(multi_time))
    print("{0:.1f} times faster than serial".format(serial_time / multi_time))
```
Thanks! Are you on a Mac? Which version of Python?
The default start method for multiprocessing on macOS changed from 'fork' to 'spawn' in Python 3.8, so that could be the cause. You could try running
```python
import multiprocessing as mp
mp.set_start_method('fork')
```
before executing the above code to see if that works.
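A related option, sketched here as an assumption rather than something tested in this thread, is to request a 'fork' context for one pool via `multiprocessing.get_context("fork")` instead of changing the process-wide default. Python's documentation notes that 'fork' should be considered unsafe on macOS (it can crash child processes), which is why 3.8 switched the default, so scoping it to a single pool keeps the change contained. The `log_prob` below is a hypothetical toy Gaussian standing in for the tutorial's function.

```python
import multiprocessing as mp

import numpy as np


def log_prob(theta):
    # Toy isotropic Gaussian log-probability; a hypothetical stand-in for
    # the tutorial's log_prob.
    return -0.5 * np.sum(theta**2)


# Request a "fork" context for this pool only, instead of changing the
# process-wide default with mp.set_start_method().
ctx = mp.get_context("fork")

coords = np.random.default_rng(42).normal(size=(32, 5))  # (nwalkers, ndim)
with ctx.Pool(processes=4) as pool:
    # emcee accepts any object with a map() method as pool=...; map() is
    # called directly here to keep the sketch independent of emcee.
    results = pool.map(log_prob, coords)

print(ctx.get_start_method(), len(results))
```

Inside the `with` block, the same `pool` could be passed to `emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)` as in the minimal example.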
That worked! I am on a Mac with Python 3.8, so that must have been the issue.
Thank you very much!
Follow-up question: When I use "pool" in my own MCMC fitting function to speed it up, the sampler has trouble finishing, even for simple jobs with only a few steps. I am running it in a Jupyter notebook, and once the sampler cell "finishes," the notebook freezes and no new cells can be run. Is this something that can be fixed? It only happens when I pass pool=Pool() to the EnsembleSampler.
I'm not too sure. This probably has something to do with your specific model rather than emcee itself. The best bet is to try to make a simple example that reproduces this issue. You can try it without emcee by using something like:
```python
from multiprocessing import Pool

import numpy as np

coords = np.array....  # shape (nwalkers, ndim)
with Pool() as pool:
    list(pool.map(log_prob_func, coords))
```
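A self-contained version of that check might look like the sketch below. Here `log_prob_func` is a hypothetical toy Gaussian and `pool_matches_serial` is a hypothetical helper; swap in your own function. It uses `map_async(...).get(timeout=...)` instead of `map` so that a hang raises `multiprocessing.TimeoutError` after a bounded wait rather than freezing the notebook, and it takes the start method as a parameter so 'fork' and 'spawn' can be compared.

```python
import multiprocessing as mp

import numpy as np


def log_prob_func(theta):
    # Hypothetical stand-in for your own model's log-probability.
    return -0.5 * np.sum(theta**2)


def pool_matches_serial(coords, start_method="fork", timeout=60):
    """Evaluate log_prob_func serially and with a pool, then compare.

    Returns None if the pool does not finish within `timeout` seconds.
    """
    serial = [log_prob_func(c) for c in coords]
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=2) as pool:
        async_result = pool.map_async(log_prob_func, coords)
        try:
            # Wait a bounded time instead of blocking forever.
            parallel = async_result.get(timeout=timeout)
        except mp.TimeoutError:
            # Leaving the with-block terminates the stuck workers.
            return None
    return bool(np.allclose(parallel, serial))


coords = np.random.default_rng(0).normal(size=(32, 5))  # (nwalkers, ndim)
print(pool_matches_serial(coords))
```

Passing `start_method="spawn"` mimics the Python 3.8+ macOS default. Under 'spawn', a plain script also needs an `if __name__ == "__main__":` guard, and workers can fail to find functions defined directly in a notebook, which is one way a pool can appear to hang.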
Hi, I was having a similar problem, and this seems to have fixed it. I had to use the following line instead:
```python
mp.set_start_method('fork', force=True)
```
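The reason `force=True` can be necessary is that `multiprocessing.set_start_method()` raises `RuntimeError` once the start method has been fixed in the current process. In a notebook this presumably happens when the cell is re-run, or when a `Pool` was already created with the default method in the same kernel. The sketch below demonstrates that behavior in isolation.

```python
import multiprocessing as mp

# force=True overrides any start method already fixed in this process.
mp.set_start_method("fork", force=True)

try:
    # The method is now fixed, so a second call without force fails,
    # even though it asks for the same method.
    mp.set_start_method("fork")
    second_call_failed = False
except RuntimeError:
    second_call_failed = True

print(mp.get_start_method(), second_call_failed)
```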
Have you considered updating the documentation?
I don't plan on updating the docs since this is just a fundamental issue with Python multiprocessing on Macs and others might have more intelligent fixes, but I could be convinced!