BayesianTools

Parallelization of nrChains command

Open: florianhartig opened this issue 7 years ago (4 comments)

Currently, only the internal calculations of the MCMC samplers are parallelized; running multiple chains via the nrChains setting is not automatically parallelized.

The current workaround for running 3 chains of a slow model is:

  • start 3 independent R instances
  • if using a sampler such as DEzs that can use parallelization, turn on parallelization in each R instance (see the help; e.g., DEzs in its standard settings can make use of 3 cores per sampler)
  • after all 3 samplers have finished, save the results, read them into one R session, and combine them with createMcmcSamplerList()
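A minimal sketch of this workaround, assuming a toy `likelihood` that stands in for a slow model. `createBayesianSetup()`, `runMCMC()` and `createMcmcSamplerList()` are BayesianTools functions; `run_one_chain()`, `combine_chains()`, `outdir` and the `chain_<id>.rds` file names are hypothetical helpers chosen for illustration. In practice `run_one_chain()` would be called once per R instance with a directory all instances can write to; the loop at the end only runs the three "instances" one after another for a quick local check.

```r
library(BayesianTools)

# Hypothetical toy likelihood standing in for a slow model
likelihood <- function(par) sum(dnorm(par, mean = 0, sd = 1, log = TRUE))

# Step 1: run this in each of the 3 R instances, with id = 1, 2, 3.
# outdir is assumed to be a directory shared by all instances.
run_one_chain <- function(id, outdir) {
  # Set parallel = TRUE to use the sampler's internal parallelization
  setup <- createBayesianSetup(likelihood, lower = rep(-10, 3), upper = rep(10, 3),
                               parallel = FALSE)
  out <- runMCMC(setup, sampler = "DEzs", settings = list(iterations = 3000))
  saveRDS(out, file.path(outdir, paste0("chain_", id, ".rds")))
}

# Step 2: after all instances have finished, read the saved runs and combine them
combine_chains <- function(ids, outdir) {
  chains <- lapply(ids, function(id) {
    readRDS(file.path(outdir, paste0("chain_", id, ".rds")))
  })
  createMcmcSamplerList(chains)
}

# Quick local check only: the three "instances" run sequentially in one session
outdir <- tempdir()
for (id in 1:3) run_one_chain(id, outdir)
combined <- combine_chains(1:3, outdir)
```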

However, when parallel = T, we could also add an option to run multiple chains in parallel instead of using the internal parallelization. This may be useful in some situations.
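Until such an option exists, one user-side approximation (not a BayesianTools feature) is to leave the internal parallelization off and distribute complete `runMCMC()` calls over a cluster from base R's parallel package. A minimal sketch, assuming a toy `likelihood` in place of a real model; `makeCluster()`, `clusterSetRNGStream()`, `clusterEvalQ()`, `parLapply()` and `stopCluster()` are standard parallel functions, while the cluster size, seed and iteration count are arbitrary choices.

```r
library(BayesianTools)
library(parallel)

# Hypothetical toy likelihood standing in for an expensive model
likelihood <- function(par) sum(dnorm(par, log = TRUE))

# Internal parallelization is left off, so each worker runs one sampler on one core
setup <- createBayesianSetup(likelihood, lower = rep(-10, 3), upper = rep(10, 3),
                             parallel = FALSE)

cl <- makeCluster(3)                   # one worker process per chain
clusterSetRNGStream(cl, iseed = 123)   # independent, reproducible RNG streams
invisible(clusterEvalQ(cl, library(BayesianTools)))

# The setup is passed as an argument so it is serialized to every worker
chains <- parLapply(cl, 1:3, function(i, setup) {
  runMCMC(setup, sampler = "DEzs", settings = list(iterations = 3000))
}, setup = setup)
stopCluster(cl)

combined <- createMcmcSamplerList(chains)
```

Because each worker runs its sampler with parallel = FALSE, the two levels of parallelization are never nested; running both at once is the open design question in this issue.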

Alternatively, we could try to support both internal and between-chain parallelization at the same time, but this needs some thinking about how to do it.

See also http://www.win-vector.com/blog/2016/08/can-you-nest-parallel-operations-in-r/

florianhartig, Dec 20 '16

This is of high interest to me (my model runs are quite expensive)!

dleutnant, May 10 '17

Because our samplers also do internal parallelization, parallelizing the chains as well requires some changes to the code structure, so this won't happen in the coming months.

For now, the best solution is the workaround described above: start 3 independent R instances.

florianhartig, May 12 '17

That's exactly what I'm doing. Thanks again for your feedback!

dleutnant, May 12 '17

Tankred, can you have a look at this? Not to implement it, but just to think about whether we can do this in a stable, not-too-hacky way. If it doesn't seem safe, I'd prefer to let users do this by hand.

I guess the problem is that we currently attach the parallelization to the BayesianSetup, which also seems important in some way, because only this ensures that people with heavy models have full control over things like hard-disk I/O. I'm not sure whether runMCMC could use this to parallelize the chains; I guess not.

So we would have to create a second parallelization option in runMCMC. We could do that, but it doesn't feel stable/safe.

Maybe we need to rethink the entire parallelization concept. In any case, at the moment, I just want to brainstorm about what the options are.

florianhartig, Jan 10 '18