BayesianTools
Parallelization of nrChains command
Currently, only the internal calculations of the MCMC samplers use parallelization; the nrChains call is not automatically parallelized.
The current workaround for running 3 chains of a slow model is:
- start 3 independent R instances
- if using a sampler such as DEzs that supports parallelization, turn on parallelization in each R instance (see the help; e.g. DEzs in its standard settings can make use of 3 cores per sampler)
- after all 3 samplers have finished, save the results, read them back in, and combine them with createMcmcSamplerList()
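As a hedged sketch of the steps above (the toy likelihood, file names, and sampler settings are illustrative assumptions, not from this thread):

```r
# Run in each of the 3 independent R instances; change chainId per instance.
library(BayesianTools)

chainId <- 1  # set to 1, 2, or 3 in the respective instance
ll <- function(x) sum(dnorm(x, sd = 10, log = TRUE))  # toy likelihood

# parallel = 3: internal parallelization, which DEzs can exploit
setup <- createBayesianSetup(likelihood = ll, lower = rep(-10, 3),
                             upper = rep(10, 3), parallel = 3)
out <- runMCMC(bayesianSetup = setup, sampler = "DEzs",
               settings = list(iterations = 10000))
saveRDS(out, file = paste0("chain_", chainId, ".rds"))

# Later, in a fresh session, after all 3 instances have finished:
chains <- lapply(1:3, function(i) readRDS(paste0("chain_", i, ".rds")))
combined <- createMcmcSamplerList(chains)
summary(combined)
```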
However, if parallel = TRUE, we could also add the option to run multiple chains in parallel instead of using the internal parallelization. This may be interesting in some situations.
Alternatively, we could try to support both the internal and the between-chain parallelization, but this needs some thought about how to do it.
See also http://www.win-vector.com/blog/2016/08/can-you-nest-parallel-operations-in-r/
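For comparison, between-chain parallelization can already be sketched at the user level with the base `parallel` package; this is a hedged sketch, not package functionality, and it deliberately builds a non-parallel setup on each worker to avoid the nesting problem discussed in the linked post (likelihood and settings are again illustrative):

```r
library(parallel)
library(BayesianTools)

ll <- function(x) sum(dnorm(x, sd = 10, log = TRUE))  # toy likelihood

cl <- makeCluster(3)                       # one worker per chain
clusterEvalQ(cl, library(BayesianTools))
clusterExport(cl, "ll")

chains <- parLapply(cl, 1:3, function(i) {
  # each worker builds its own setup WITHOUT internal parallelization,
  # so that the two levels of parallelism are not nested
  setup <- createBayesianSetup(likelihood = ll, lower = rep(-10, 3),
                               upper = rep(10, 3))
  runMCMC(bayesianSetup = setup, sampler = "DEzs",
          settings = list(iterations = 10000))
})
stopCluster(cl)

combined <- createMcmcSamplerList(chains)
```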
This is of high interest to me (my model runs are quite expensive)!
Because our samplers also do internal parallelization, parallelizing the chains as well requires some changes to the code structure, so this won't happen in the coming months.
So far, the best solution is the workaround described above: starting 3 independent R instances.
That's exactly what I'm doing. Thanks again for your feedback!
Tankred, can you have a look at this and, without implementing anything yet, just think about whether we can do this in a stable, not-too-hacky way? If this doesn't seem safe, I'd prefer to let users do this by hand.
I guess the problem is that we have currently attached the parallelization to the BayesianSetup, which also seems important in some way, because only this ensures that people with heavy models have full control over hard-disk I/O. I'm not sure if this could be used by runMCMC to parallelize the chains; I guess not.
So we would have to create a second parallelization option in runMCMC ... we could do that, but it doesn't feel stable / safe.
Maybe we need to rethink the entire parallelization concept. In any case, at the moment, I just want to brainstorm about what the options are.