emcee Multiprocessing Tutorial Example Not Working

General information:

emcee version: 3.0.2
platform: python, jupyter notebook
installation method (pip/conda/source/other?): pip

Problem description:

Expected behavior: The "pool" example from the multiprocessing tutorial should run to completion in a Jupyter notebook.

Actual behavior: With "pool" passed to the sampler, the run never finishes and continues indefinitely. The tutorial shows it finishing, and running faster than the serial code.

What have you tried so far?: Modified the code, tried different notebooks, and tried calling pool.close().

Minimal example:

import time
from multiprocessing import Pool

import emcee

# nwalkers, ndim, log_prob, initial, nsteps, and serial_time are defined
# earlier in the multiprocessing tutorial.
with Pool() as pool:
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
    start = time.time()
    sampler.run_mcmc(initial, nsteps, progress=True)
    end = time.time()
    multi_time = end - start
    print("Multiprocessing took {0:.1f} seconds".format(multi_time))
    print("{0:.1f} times faster than serial".format(serial_time / multi_time))

Nov 10 '20 19:11 wynnjacobson-galan

Thanks - are you on a mac? What version of Python?

The default start method for multiprocessing on macOS changed in Python 3.8 (from "fork" to "spawn"), so that could be the cause. You could try running

import multiprocessing as mp
mp.set_start_method('fork')

before executing the above code to see if that works.
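
A variation on the suggestion above, sketched as an assumption rather than anything emcee requires: instead of changing the process-wide default with set_start_method, you can request a "fork" context only for the pool that needs it. The helper names fork_context and parallel_abs are hypothetical, and the built-in abs stands in for a real log-probability function.

```python
import multiprocessing as mp


def fork_context():
    """Return a 'fork' multiprocessing context, or None where fork is unavailable."""
    if "fork" in mp.get_all_start_methods():
        return mp.get_context("fork")
    return None  # e.g. on Windows, which has no fork


def parallel_abs(values, processes=2):
    """Map abs over values using a fork-based pool when one is available."""
    # Fall back to the platform default context if fork is not supported.
    ctx = fork_context() or mp.get_context()
    with ctx.Pool(processes) as pool:
        return pool.map(abs, values)


if __name__ == "__main__":
    print(parallel_abs([-1, 2, -3]))
```

A pool created with ctx.Pool() exposes the same map interface as multiprocessing.Pool(), so it should be usable as the pool argument in the example above. Note that Python moved macOS to "spawn" because forking can be unsafe there, so treat this as a workaround rather than a guaranteed fix.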

Nov 10 '20 19:11 dfm

That worked! I am working on a mac with Python 3.8 so that must have been the issue.

Thank you very much!

Nov 10 '20 19:11 wynnjacobson-galan

Follow-up question: When I use "pool" in my own MCMC fitting function to speed it up, the sampler has trouble finishing, even for simple jobs, e.g. a small number of steps. I am running it in a Jupyter notebook, and once the sampler cell "finishes," the notebook freezes and no new cells can be run. Can this be fixed? It only happens when I pass pool=Pool() to EnsembleSampler.

Nov 11 '20 17:11 wynnjacobson-galan

I'm not too sure. This probably has something to do with your specific model rather than emcee itself. The best bet is to try to make a simple example that reproduces this issue. You can try it without emcee by using something like:

coords = np.array.... # nwalkers, ndim
with Pool() as pool:
    list(pool.map(log_prob_func, coords))
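
A more self-contained version of that check might look like the sketch below. Everything in it is illustrative: toy_log_prob is a hypothetical stand-in for your model, check_pool and make_coords are made-up helper names, and plain Python lists are used instead of NumPy arrays so the check has no dependencies. It uses map_async with a timeout so that a stuck pool raises multiprocessing.TimeoutError instead of freezing the notebook.

```python
import math
import multiprocessing as mp
import random


def toy_log_prob(theta):
    """Hypothetical stand-in for a model: an unnormalized Gaussian log-density."""
    return -0.5 * sum(x * x for x in theta)


def make_coords(nwalkers=8, ndim=3, seed=0):
    """Build nwalkers starting points, each a list of ndim floats."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(ndim)] for _ in range(nwalkers)]


def check_pool(log_prob_func, coords, timeout=30.0, processes=2, start_method=None):
    """Compare serial and pooled evaluations of log_prob_func.

    start_method=None uses the platform default. If the workers never
    return, .get() raises multiprocessing.TimeoutError after `timeout`
    seconds, and leaving the `with` block terminates the pool.
    """
    serial = [log_prob_func(c) for c in coords]
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes) as pool:
        parallel = pool.map_async(log_prob_func, coords).get(timeout=timeout)
    return all(math.isclose(s, p) for s, p in zip(serial, parallel))


if __name__ == "__main__":
    print(check_pool(toy_log_prob, make_coords()))
```

If this check times out with the default start method but passes with start_method="fork", the start method is a likely culprit. One known cause is that "spawn" workers must import the target function, which can fail for functions defined only inside an interactive session.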

Nov 11 '20 17:11 dfm

Hi, I was having a similar problem and this seems to have fixed the issue. I had to use the following line:

mp.set_start_method('fork', force=True)
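
For context on why force=True can be necessary: set_start_method raises RuntimeError ("context has already been set") once the start method has been fixed, which can easily happen when a notebook cell is re-run. The sketch below wraps the call so it is safe to execute repeatedly; ensure_start_method is a hypothetical helper name, not part of emcee or the standard library.

```python
import multiprocessing as mp


def ensure_start_method(method="fork"):
    """Set the global start method if needed and return the one in effect.

    Safe to call repeatedly, e.g. from a notebook cell that gets re-run.
    """
    if method not in mp.get_all_start_methods():
        # The requested method does not exist on this platform; leave the default.
        return mp.get_start_method()
    if mp.get_start_method(allow_none=True) != method:
        # force=True overrides a start method that was already set.
        mp.set_start_method(method, force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    print(ensure_start_method("fork"))
```

Because force=True changes a process-wide setting, it also affects any other code in the same session that creates pools afterwards.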

Have you considered updating the documentation?

May 20 '22 15:05 jorgenorena

I don't plan on updating the docs since this is just a fundamental issue with Python multiprocessing on Macs and others might have more intelligent fixes, but I could be convinced!

May 20 '22 15:05 dfm