montepython_public

Slow because clik likelihoods are loaded in sequence

Status: Open. lukashergt opened this issue 5 years ago (3 comments).

Hi Thejs,

I have noticed that the clik likelihoods seem to be checked in sequence in MontePython. Is that so? When running only about 8 chains with the MCMC sampler, this is still acceptable. However, when running with, e.g., ntask=320 with PolyChord, this checking of likelihoods takes about 2 hours, and it is repeated every time a previous run is resumed.
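Assuming the ~2 hours are spent almost entirely on initialising the likelihoods one task after another (an assumption, not something measured here), the numbers above imply roughly 22.5 s per task. A back-of-the-envelope sketch of what that means for serial versus ideal parallel start-up:

```python
# Back-of-the-envelope estimate. Assumption: start-up time is dominated by
# per-task likelihood initialisation that runs strictly one task at a time.
ntask = 320
serial_total_s = 2 * 3600  # ~2 hours observed with ntask=320

per_task_s = serial_total_s / ntask  # implied initialisation time per task
mcmc_total_s = 8 * per_task_s        # same serial cost for ~8 MCMC chains
parallel_total_s = per_task_s        # ideal case: all tasks initialise at once

print(f"per task:          {per_task_s:.1f} s")
print(f"8 chains, serial:  {mcmc_total_s / 60:.0f} min")
print(f"320 tasks, serial: {serial_total_s / 60:.0f} min")
print(f"ideal parallel:    {parallel_total_s:.1f} s")
```

Under these assumptions, 8 chains cost about 3 minutes of start-up, which is consistent with the MCMC case still being acceptable, whereas fully parallel initialisation would bring the 320-task case down to roughly the cost of a single task.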

Before the actual sampling starts, it slowly prints the following output ntask times, one after the other, which gives me the impression that this is done in sequence. By comparison, CosmoChord prints the clik likelihood checks for all ntask tasks at once (within seconds).

 /!\ PyFITS is deprecated, please use astropy.io.fits
----
clik version 6dc2a8cf3965
  smica
Checking likelihood '/rds/user/lh561/hpc-work/data/PlanckData/plc-2.0/../plc-2.0/hi_l/plik/plik_dx11dr2_HM_v18_TT.clik' on test data. got -380.979 expected -380.979 (diff -8.68346e-09)
----
 BFLike Ntemp  =        2876
 BFLike Nq     =        1407
 BFLike Nu     =        1407
 BFLike Nside  =          16
 BFLike Nwrite =    32393560
 cls file appears to have 5+ columns
 assuming it is a CAMB file with l, TT, EE, BB, TE
 info =            0
----
clik version 6dc2a8cf3965
  bflike_smw
Checking likelihood '/rds/user/lh561/hpc-work/data/PlanckData/plc-2.0/../plc-2.0/low_l/bflike/lowl_SMW_70_dx11d_2014_10_03_v5c_Ap.clik' on test data. got -5247.87 expected -5247.87 (diff 1.77562e-07)
----
  • Is the checking of the clik likelihoods indeed done in sequence?

  • Where (which module, which function) in the code is this done?

  • Would it be complicated to change this so that the checks are done in parallel? Or, for that matter, would it be enough to do the checks only once?

Best, Lukas

lukashergt, Aug 05 '19 15:08

Hi Lukas,

I imagine each chain initialises the Planck likelihood when it first calls it, which is probably the problem here. This likely happens in montepython/likelihood_class.py, in the class for clik-type likelihoods, class Likelihood_clik(Likelihood): (starting at line 855).
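To confirm that the start-up time really goes into constructing the clik object (rather than elsewhere), one could time that single call on one task. The sketch below is library-agnostic: `timed` and `slow_loader` are hypothetical names, and `slow_loader` only stands in for the real constructor (with the clik Python wrapper installed, presumably something like `clik.clik(path)`; treat that as an assumption to check against the installed version).

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_loader(path: str) -> str:
    """Hypothetical stand-in for an expensive likelihood constructor."""
    time.sleep(0.1)
    return f"loaded {path}"


obj, elapsed = timed(slow_loader, "example.clik")
print(obj, f"{elapsed:.2f} s")
```

If a single construction already takes tens of seconds, the serial start-up cost follows directly from multiplying by the number of tasks.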

I'm not involved with Planck, so the finer details escape me. Perhaps our MontePython Planck experts @dchooper or @lesgourg can provide some input?

Best, Thejs

brinckmann, Aug 22 '19 03:08

Hi Lukas,

Indeed, loading clik is very time-consuming in MontePython. I think this part of the code is not parallelised, and it loads clik for each chain individually. As Thejs said, the initialisation of the likelihood is done in the __init__ function in class Likelihood_clik(Likelihood), and MontePython always calls this function once per chain per run. I haven't yet found a good way to fix this without the code complaining somewhere else.
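The difference between initialising one chain after another and initialising all of them at once can be illustrated with a stand-in that just sleeps. This is a hypothetical sketch, not MontePython code: `init_likelihood`, `init_serial`, `init_concurrent` and `INIT_SECONDS` are made-up names, and threads are used only to keep the example self-contained (in an MPI run each task is a separate process).

```python
import time
from concurrent.futures import ThreadPoolExecutor

INIT_SECONDS = 0.2  # hypothetical cost of one likelihood initialisation


def init_likelihood(task_id: int) -> int:
    """Stand-in for a per-chain Likelihood_clik.__init__ (it only sleeps)."""
    time.sleep(INIT_SECONDS)
    return task_id


def init_serial(ntask: int) -> float:
    """Initialise ntask likelihoods one after the other; return wall time."""
    start = time.perf_counter()
    for i in range(ntask):
        init_likelihood(i)
    return time.perf_counter() - start


def init_concurrent(ntask: int) -> float:
    """Initialise ntask likelihoods at the same time; return wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=ntask) as pool:
        list(pool.map(init_likelihood, range(ntask)))
    return time.perf_counter() - start


serial = init_serial(4)          # roughly 4 * INIT_SECONDS
concurrent = init_concurrent(4)  # roughly 1 * INIT_SECONDS
print(f"serial: {serial:.2f} s, concurrent: {concurrent:.2f} s")
```

Serial wall time grows linearly with the number of chains, while concurrent wall time stays close to a single initialisation, as long as the loads do not contend for a shared resource.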

However, the Planck 2018 clik version is much faster. The gain in execution time is incredible; I really recommend switching to Planck 2018 (which is available in MontePython version 3.2!).

Cheers, Deanna

dchooper, Aug 22 '19 07:08

Thanks @brinckmann and @dchooper,

> However, the Planck 2018 clik version is much faster. The gain in execution time is incredible; I really recommend switching to Planck 2018 (which is available in MontePython version 3.2!).

I'll give Planck 2018 a try. However, since PolyChord (PC) runs on hundreds of cores for heavy runs, any sequential likelihood initialisation that seems negligible for MCMC runs can take a long time for PC runs (and it is repeated on every resume).

> the initialisation of the likelihood is done in the __init__ function in class Likelihood_clik(Likelihood), and MontePython always calls this function once per chain per run.

Calling the function once per chain (or, in the case of PC, once per task) per run would be fine if we could manage to have it done for all tasks in parallel.

Would it simplify things to attempt this only for use with -m PC, or only with mpirun? More concretely, might it be enough to modify montepython/PolyChord.py or mpi_run in montepython/run.py? PC only makes sense in combination with mpirun, and only for PC (and maybe MultiNest?) does the likelihood initialisation time grow that large.
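This thread does not establish whether mpi_run or PolyChord.py currently serialises the initialisation, so the following is only a hypothetical sketch of the target pattern: every MPI rank builds its own likelihood immediately, without waiting for any other rank, and all ranks synchronise with a barrier before sampling. It assumes mpi4py; `init_then_sync`, `rank_and_size` and `make_likelihood` are made-up names, not MontePython functions.

```python
# Hypothetical sketch: each MPI rank builds its own likelihood at once
# (no rank-to-rank hand-off), then all ranks synchronise before sampling.
try:
    from mpi4py import MPI
    COMM = MPI.COMM_WORLD
except ImportError:  # allow running the sketch without MPI installed
    COMM = None


def rank_and_size():
    """Return (rank, size), falling back to a single process without MPI."""
    if COMM is None:
        return 0, 1
    return COMM.Get_rank(), COMM.Get_size()


def init_then_sync(make_likelihood):
    """Build this rank's likelihood, then wait until every rank is done."""
    rank, size = rank_and_size()
    likelihood = make_likelihood(rank)  # runs concurrently across ranks
    if COMM is not None:
        COMM.Barrier()  # sampling starts only once all ranks are initialised
    return likelihood, rank, size


lkl, rank, size = init_then_sync(lambda r: f"likelihood for rank {r}")
print(f"rank {rank}/{size}: {lkl}")
```

Launched with, e.g., `mpirun -n 4 python sketch.py`, each rank would print its own line after the barrier; whether a real clik load scales this way on a shared filesystem would still need to be measured.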

lukashergt, Aug 29 '19 13:08