Multithreading in Emulation/MCMC
Issue
There are easy gains to be had in MCMC by using multithreading within each step (launching with e.g. `julia --project -t 8 script.jl`). For GP (and scalar RF) implementations (see the sketch after this list):
- the prediction stage runs a loop over the scalar-valued models,
- the training stage also runs a loop over the scalar-valued models (here it may require extra memory management).
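As a rough illustration of that structure, here is a minimal sketch with toy stand-in models (the actual loop lives in src/GaussianProcess.jl; all names below are hypothetical and only meant to mimic the shape of the computation):

```julia
# Toy stand-ins for M independent scalar-valued models; the real models are
# built in src/GaussianProcess.jl, and all names here are illustrative only.
M, N = 8, 1_000
models = [x -> sin.(i .* x) for i in 1:M]      # pretend each is a trained scalar model
new_inputs = collect(range(0, 1; length = N))

# Serial prediction: one independent pass per scalar-valued model.
predictions = Matrix{Float64}(undef, M, N)
for i in 1:M
    predictions[i, :] = models[i](new_inputs)
end
```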
Suggestion
- For MCMC, add the decorator `Threads.@threads for i=1:M` to the loop at https://github.com/CliMA/CalibrateEmulateSample.jl/blob/bf3df405753033e852b91c19d5cb11470dfdc91f/src/GaussianProcess.jl#L197-L199. This should increase the speed of prediction within MCMC by up to the thread count, e.g. 8x; a sketch of the decorated loop follows this list.
- For decorrelated problems (i.e. GP and scalar RF), one can similarly decorate the training loop over the scalar-valued models. This should increase the speed of training by e.g. 8x.
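A minimal sketch of the decorated loop, reusing the toy setup from the sketch above (the real target is the linked loop in GaussianProcess.jl):

```julia
using Base.Threads

# Threaded version of the same loop: each iteration writes to a disjoint row
# of `predictions`, so decorating with Threads.@threads is safe here. When
# launched with `julia -t 8`, the M models are distributed across 8 threads.
Threads.@threads for i in 1:M
    predictions[i, :] = models[i](new_inputs)
end
```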
Preliminary results from @szy21 show that 8 threads give only a 2x speed-up in sampling for the EDMF example; I'll continue the investigation with other examples.
Oftentimes the downstream dependencies will greedily harness all available threads, so simply launching with `-t 8` and making no code changes (i.e. without adding `Threads.@threads`) often gives a significant speedup on its own.
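For reference, a short snippet for checking what threading is actually in play. This is a general Julia note rather than package API: BLAS is assumed here as the canonical greedy downstream dependency, and `BLAS.get_num_threads`/`BLAS.set_num_threads` are standard LinearAlgebra.BLAS calls.

```julia
using LinearAlgebra

# How many Julia threads did `-t` give us, and how many threads is the BLAS
# library using? (BLAS manages its own thread pool, independent of `-t`.)
@show Threads.nthreads()
@show BLAS.get_num_threads()

# If Threads.@threads is combined with BLAS-heavy work inside each iteration,
# pinning BLAS to a single thread is a common way to avoid oversubscription.
BLAS.set_num_threads(1)
```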