montepython_public
Slow because clik likelihoods are loaded in sequence
Hi Thejs,
I have noticed that MontePython seems to check the clik likelihoods in sequence. Is that so? When running only about 8 chains with the MCMC sampler, this is still acceptable. However, when running PolyChord with, e.g., ntask=320, checking the likelihoods takes about 2 hours, and the checks are repeated every time I resume a previous run.
Before the actual sampling starts, the following is printed slowly ntask times, one after the other, which is why I get the impression the checks run in sequence. By comparison, CosmoChord prints the clik likelihood checks for all ntask tasks at once (within seconds).
/!\ PyFITS is deprecated, please use astropy.io.fits
----
clik version 6dc2a8cf3965
smica
Checking likelihood '/rds/user/lh561/hpc-work/data/PlanckData/plc-2.0/../plc-2.0/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.68346e-09)
----
BFLike Ntemp = 2876
BFLike Nq = 1407
BFLike Nu = 1407
BFLike Nside = 16
BFLike Nwrite = 32393560
cls file appears to have 5+ columns
assuming it is a CAMB file with l, TT, EE, BB, TE
info = 0
----
clik version 6dc2a8cf3965
bflike_smw
Checking likelihood '/rds/user/lh561/hpc-work/data/PlanckData/plc-2.0/../plc-2.0/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 1.77562e-07)
----
- Is the checking of the clik likelihoods indeed done in sequence?
- Where (which module, which function) in the code is this done?
- Would it be complicated to change this so that the checks are done in parallel? Or would it be enough to do the checks just once?
Best, Lukas
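As a rough back-of-the-envelope check on the numbers above (ntask=320, about 2 hours), the sketch below assumes that the checks are strictly sequential and that every task costs the same. Both are assumptions, not measurements. Under them, each task spends roughly 22 s initialising, and a fully parallel start-up would take on the order of one task's time instead of hours.

```python
# Back-of-the-envelope estimate. ASSUMPTIONS: strictly sequential
# initialisation and equal cost per task (not measured).
total_wall_time_s = 2 * 60 * 60   # ~2 hours reported for the checks
ntask = 320                       # number of PolyChord MPI tasks

per_task_s = total_wall_time_s / ntask
print(f"implied initialisation time per task: {per_task_s:.1f} s")

# If all tasks initialised concurrently and nothing else serialised them,
# the wall time would be roughly that of a single task.
parallel_wall_time_s = per_task_s
speedup = total_wall_time_s / parallel_wall_time_s
print(f"ideal speed-up from parallel initialisation: {speedup:.0f}x")
```

In practice, concurrent reads of the same .clik files on a shared file system may limit the real gain.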
Hi Lukas,
I imagine each chain starts by initialising the Planck likelihood when it first calls it, which is probably the problem here. This most likely happens in montepython/likelihood_class.py, in the class for clik-type likelihoods (starting at line 855):
class Likelihood_clik(Likelihood):
I'm not involved with Planck, so the finer details escape me. Perhaps our MontePython Planck people @dchooper or @lesgourg are able to provide some input?
Best, Thejs
Hi Lukas,
Indeed, loading clik is very time-consuming in MontePython. I think this part of the code is not parallelised, so clik is loaded for each chain individually. As Thejs said, the likelihood is initialised in the __init__ function of class Likelihood_clik(Likelihood), and MontePython always calls this function once per chain per run. I haven't yet found a good way to fix this without the code complaining somewhere else.
However, the Planck 2018 clik version is much faster. The gain in execution time is incredible; I really recommend switching to Planck 2018 (which is available in MontePython version 3.2!).
Cheers, Deanna
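To confirm that the per-chain call to Likelihood_clik.__init__ is where the start-up time goes, one option is to time that constructor. The sketch below is a hypothetical debugging helper, not part of MontePython: the class decorator time_init wraps a class's __init__ and reports how long each construction took, tagged with the process id. The FakeClikLikelihood class is an illustrative stand-in whose time.sleep mimics clik's test-data check.

```python
import functools
import os
import time


def time_init(cls):
    """Class decorator that records and prints how long __init__ takes.

    Hypothetical helper (not part of MontePython), intended for wrapping a
    class such as Likelihood_clik while debugging start-up time.
    """
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, *args, **kwargs):
        start = time.perf_counter()
        original_init(self, *args, **kwargs)
        elapsed = time.perf_counter() - start
        # Store on the instance and print with the pid so that output from
        # different chains/MPI tasks can be told apart.
        self.init_seconds = elapsed
        print(f"[pid {os.getpid()}] {cls.__name__}.__init__ took {elapsed:.2f} s")

    cls.__init__ = wrapped_init
    return cls


# Stand-in for an expensive likelihood; the sleep mimics the test-data check.
@time_init
class FakeClikLikelihood:
    def __init__(self, path, delay=0.1):
        self.path = path
        time.sleep(delay)


lkl = FakeClikLikelihood("plik_dx11dr2_HM_v18_TT.clik", delay=0.05)
```

With MontePython, one could in principle apply it as Likelihood_clik = time_init(Likelihood_clik) before any likelihood is constructed. This is an untested assumption about how the Planck likelihood subclasses reach that constructor.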
Thanks @brinckmann and @dchooper,
> However, the Planck 2018 clik version is much faster. The gain in execution time is incredible, I really recommend switching to planck 2018 (which is available in MontePython version 3.2!).
I'll give Planck 2018 a try. However, since PolyChord (PC) runs on hundreds of cores for heavy runs, any sequential likelihood initialisation that seems negligible for MCMC runs can take a long time for PC runs, and it is repeated on every resume.
> the initialisation of the likelihood is done in the __init__ function in class Likelihood_clik(Likelihood), and MontePython always calls this function once per chain per run.
Calling the function once per chain (or, in the case of PC, once per task) per run would be fine if we could manage to have it done in parallel for all tasks.
Would it simplify things to do this only when using -m PC, or only for mpirun? More concretely, might it be enough to modify montepython/PolyChord.py, or mpi_run in montepython/run.py? (PC only makes sense in combination with mpirun, and only for PC (maybe also MultiNest?) does the likelihood initialisation time get that large...)
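The sketch below illustrates the difference between sequential and parallel start-up in the mpi_run question above. It is only an illustration: it does not reproduce MontePython's actual mpi_run, whose structure is not shown in this thread. Threads stand in for MPI ranks and a sleep stands in for clik initialisation. In relay_init, each "rank" waits for the previous one before initialising, which serialises start-up. In concurrent_init, rank 0 finishes the one-off shared setup and then every rank initialises at once. In a real MPI version, the events would presumably be replaced by mpi4py calls such as comm.bcast or comm.send/comm.recv. That mapping is an assumption.

```python
import threading
import time


def fake_likelihood_init(delay):
    """Hypothetical stand-in for one task's clik initialisation."""
    time.sleep(delay)


def _run_tasks(ntask, target):
    """Start one thread per 'rank', wait for all, return the wall time."""
    threads = [threading.Thread(target=target, args=(r,)) for r in range(ntask)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def relay_init(ntask, delay):
    """Each rank waits for the previous rank to finish before initialising.
    Wall time grows roughly as ntask * delay."""
    done = [threading.Event() for _ in range(ntask)]

    def task(rank):
        if rank > 0:
            done[rank - 1].wait()   # wait for the previous rank
        fake_likelihood_init(delay)
        done[rank].set()            # hand over to the next rank

    return _run_tasks(ntask, task)


def concurrent_init(ntask, delay):
    """Rank 0 does the shared setup, then all ranks initialise together.
    Wall time stays close to a single delay."""
    setup_done = threading.Event()

    def task(rank):
        if rank == 0:
            setup_done.set()        # e.g. output folder created, settings shared
        else:
            setup_done.wait()
        fake_likelihood_init(delay)

    return _run_tasks(ntask, task)


if __name__ == "__main__":
    ntask, delay = 8, 0.05
    print(f"relay:      {relay_init(ntask, delay):.2f} s")
    print(f"concurrent: {concurrent_init(ntask, delay):.2f} s")
```

The design point is that only the genuinely shared step (such as creating the output folder) needs to be ordered. Everything after it can overlap, although contention on a shared file system may still limit the speed-up in practice.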