emcee gives different results when using multiprocessing Pool() (sometimes)

General information:

emcee version: 2.2.1
platform: Windows
installation method (pip/conda/source/other?): pip

Problem description: While doing some sanity checks with the line.py example, I ran into different results depending on whether or not I passed a multiprocessing Pool() when initializing EnsembleSampler (or PTSampler, for that matter). Under other circumstances the results are in fact the same.

Minimal example:

These are the modifications I made to line.py:

from multiprocessing import Pool
### same code
# widen the parameter ranges
def lnprior(theta):
    m, b, lnf = theta
    if -50.0 < m < 50 and -50.0 < b < 50 and -50.0 < lnf < 50:
        return 0.0
    return -np.inf
### same code
# less optimal initial positions and sample more times
pos = [result["x"] + 5*np.random.randn(ndim) for i in range(nwalkers)]
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob, args=(x, y, yerr), pool=None)  # or pool=Pool()
sampler.run_mcmc(pos, 1500, rstate0=np.random.get_state())

A rough example based on the code above.

Aug 04 '20 09:08 dalbabur

Thanks for reporting.

Are you setting your seed to be the same? In other words, are you sure that this is really being caused by the pool argument?

If you upgrade to emcee>=3 it will provide identical results (as long as you set your seed). I would expect that to be true with earlier versions, but it wasn't carefully tested.

Aug 04 '20 12:08 dfm

That could be it. Right now the seed is set after the imports and the behavior is the same across runs. How should I be setting the seed across processes?

Aug 04 '20 22:08 dalbabur
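
On the question of seeding across processes, here is a minimal sketch of why a single seed in the parent process may be enough. It assumes (as far as I understand emcee's design, not verified against 2.2.1) that the sampler draws all of its random numbers in the parent and uses the pool only to map the log-probability function over walker positions. The names log_prob, run_chain and map_fn are hypothetical and are not part of emcee; the toy sampler uses the standard-library random module so that it runs without any dependencies.

```python
import math
import random
from multiprocessing import Pool


def log_prob(x):
    """Deterministic toy target: standard normal log-density up to a constant."""
    return -0.5 * x * x


def run_chain(seed, n_walkers=8, n_steps=200, map_fn=map):
    """Toy Metropolis ensemble (hypothetical, not emcee's algorithm).

    Every random draw happens in the calling process; map_fn is used only
    to evaluate log_prob, which is the role assumed for the pool.
    """
    rng = random.Random(seed)  # single seeded generator owned by the parent
    pos = [rng.gauss(0.0, 5.0) for _ in range(n_walkers)]
    lp = list(map_fn(log_prob, pos))
    for _ in range(n_steps):
        proposal = [p + rng.gauss(0.0, 1.0) for p in pos]
        new_lp = list(map_fn(log_prob, proposal))  # the only farmed-out work
        for i in range(n_walkers):
            # Accept/reject decision is also drawn in the parent process.
            if math.log(1.0 - rng.random()) < new_lp[i] - lp[i]:
                pos[i], lp[i] = proposal[i], new_lp[i]
    return pos


if __name__ == "__main__":  # guard is required for Pool on Windows
    serial = run_chain(seed=42)
    with Pool(2) as pool:
        pooled = run_chain(seed=42, map_fn=pool.map)
    print("identical:", serial == pooled)
```

If that assumption holds for your version, the workers never touch the random state, so seeding once in the parent (for example calling np.random.seed before building pos, and passing the state through rstate0 as in the report) should be sufficient. A log-probability function that itself draws random numbers would break this, since each worker process would then have its own generator state.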