
Theano compilelock issues with MultiPool

Open adrn opened this issue 4 years ago • 7 comments

See email by Song Wang:

with schwimmbad.MultiPool() as pool:
    joker_mcmc = tj.TheJoker(prior_mcmc, pool=pool, random_state=rnd)
    mcmc_init = joker_mcmc.setup_mcmc(data, samples)

with schwimmbad.MultiPool() as pool:
    joker = tj.TheJoker(prior, pool=pool, random_state=rnd)
    prior_samples = prior.sample(size=10000, random_state=rnd)
    samples = joker.rejection_sample(data, prior_samples, max_posterior_samples=256)

throws a warning:

INFO (theano.gof.compilelock): Waiting for existing lock by process '' (I am process '')
INFO (theano.gof.compilelock): To manually release the lock, delete ***/lock_dir

adrn avatar Jul 15 '20 11:07 adrn

Perhaps you already know this, but I normally use a hack to set the compiledir using os.getpid(). It's possible that this could also be handled by pre-compiling the required Theano functions and then passing those around.

dfm avatar Jul 15 '20 15:07 dfm

Oh right (and for context, this was an email I got from a user). Do you have an example you could share?

adrn avatar Jul 15 '20 18:07 adrn

Something like the following can work:

import os
from multiprocessing import Pool

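# Set before importing theano so the flag takes effect;
# the compile directory is named after this process's PID.
os.environ["THEANO_FLAGS"] = f"compiledir={os.getpid()}"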

import theano
import theano.tensor as tt


def func(x):
    x_ = tt.dscalar()
    return theano.function([x_], [x_ * x_])(x)


if __name__ == "__main__":
    with Pool(4) as pool:
        print(list(pool.map(func, range(10))))

dfm avatar Jul 15 '20 19:07 dfm

Or...

from multiprocessing import Pool
import theano
import theano.tensor as tt


if __name__ == "__main__":
    x_ = tt.dscalar()
    func = theano.function([x_], [x_ * x_])
    with Pool(4) as pool:
        print(list(pool.map(func, range(10))))

dfm avatar Jul 15 '20 19:07 dfm

Thanks @adrn @dfm

When I add this "os.environ" line to my script, the warning stops flooding the screen and appears only a fixed number of times (equal to the number of processes set in the Pool). However, the code appears to stall, although the CPU is busy. I waited for more than 20 minutes, and none of the processes completed the rejection sampling step. It seems to need a very long time to move to the next step. Is it still not running in parallel?

Another strange case: when I set the number of processes to 2, the code runs, but it skips the MCMC part.

My computer has 10 cores. Is it OK if I set the number of processes to 4?

I also tried opening two or three terminals, each running a single-process version of the code. It works, but does not save much time. It seems that the different terminals are not fully running in parallel.

AstroSong avatar Jul 17 '20 11:07 AstroSong

@AstroSong Strange! Could you share a minimal working example script, and send the versions of schwimmbad & thejoker that you are using? What platform are you on? Thanks!

python -c "import schwimmbad; print(schwimmbad.__version__)"
python -c "import thejoker; print(thejoker.__version__)"

adrn avatar Jul 28 '20 13:07 adrn

@adrn If I use the second code snippet above from @dfm, it works without any warning. But if I use the first one, the warning is still there, as follows:

INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '37974')
INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '37975')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/song/k2_4/joker/test/37878/lock_dir
INFO (theano.gof.compilelock): To manually release the lock, delete /home/song/k2_4/joker/test/37878/lock_dir
INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '37974')
INFO (theano.gof.compilelock): To manually release the lock, delete /home/song/k2_4/joker/test/37878/lock_dir

python -c "import schwimmbad; print(schwimmbad.__version__)" => 0.3.1
python -c "import thejoker; print(thejoker.__version__)" => 1.1

In addition, I previously updated thejoker by installing from git+https://github.com/adrn/thejoker, but I got this error:

  File "/usr/local/python3/lib/python3.8/site-packages/thejoker/prior.py", line 320, in sample
    with random_state_context(random_state):
  File "/usr/local/python3/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/usr/local/python3/lib/python3.8/site-packages/thejoker/utils.py", line 299, in random_state_context
    np.random.seed(integers(random_state, 2**32-1))  # HACK
  File "/usr/local/python3/lib/python3.8/site-packages/thejoker/utils.py", line 30, in <lambda>
    integers = lambda obj, *args, **kwargs: obj.integers(*args, **kwargs)
AttributeError: 'numpy.random.mtrand.RandomState' object has no attribute 'integers'

My numpy version is 1.19.0.
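
Note on the traceback: it ends with .integers() being called on a legacy numpy.random.RandomState, which provides .randint() but not .integers(); .integers() belongs to the newer numpy.random.Generator (as returned by numpy.random.default_rng). Below is a minimal sketch of a helper that accepts either kind of object. The name draw_integer is hypothetical and this is not thejoker's own code.

```python
def draw_integer(random_state, high):
    """Draw one integer in [0, high) from a NumPy random object.

    Hypothetical helper: works with numpy.random.Generator (which has
    .integers) and with legacy numpy.random.RandomState (which has .randint).
    """
    draw = getattr(random_state, "integers", None)
    if draw is None:
        # Legacy RandomState has no .integers(); fall back to .randint().
        draw = random_state.randint
    # Request int64 explicitly so bounds near 2**32 fit on every platform.
    return int(draw(0, high, dtype="int64"))
```

With NumPy installed, both draw_integer(np.random.RandomState(42), 2**32 - 1) and draw_integer(np.random.default_rng(42), 2**32 - 1) should return an integer. Passing a Generator as random_state may also avoid the error, assuming the installed thejoker version accepts one; that is not confirmed in this thread.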

AstroSong avatar Jul 29 '20 08:07 AstroSong
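
Note on the Jul 29 log: the waiting processes are 37974 and 37975, but the lock directory is /home/song/k2_4/joker/test/37878/lock_dir. One possible reading is that the os.environ line ran once in the parent process, and the pool workers, created by fork, inherited that value instead of computing their own. The sketch below illustrates this behaviour with the standard library only (no Theano involved); COMPILEDIR_FLAG, report, and collect_worker_views are hypothetical names, and it assumes a platform where the "fork" start method is available.

```python
import multiprocessing as mp
import os

# Evaluated once, in the process that first runs this module-level code.
COMPILEDIR_FLAG = f"compiledir={os.getpid()}"


def report(queue):
    # Runs in a worker: send back the worker's own PID and the flag it sees.
    queue.put((os.getpid(), COMPILEDIR_FLAG))


def collect_worker_views(n_workers=2):
    # "fork" copies the parent's memory; module-level code is not re-run.
    # (Not available on Windows, where get_context("fork") raises ValueError.)
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    workers = [ctx.Process(target=report, args=(queue,)) for _ in range(n_workers)]
    for w in workers:
        w.start()
    results = [queue.get() for _ in workers]
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for pid, flag in collect_worker_views():
        print(f"worker {pid} sees {flag}")
```

Every worker reports a different PID but the same flag, built from the parent's PID, which matches the pattern in the log. Whether a different start method or a per-worker setup removes the warning for thejoker is not established in this thread.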