cobaya
cobaya copied to clipboard
Tempered mcmc
[WIP]
Missing testing and getdist side of things.
Codecov Report
Merging #180 (8e8a5f4) into master (ac3ed01) will decrease coverage by
0.04%
. The diff coverage is75.40%
.
@@ Coverage Diff @@
## master #180 +/- ##
==========================================
- Coverage 87.59% 87.55% -0.05%
==========================================
Files 91 91
Lines 8345 8390 +45
==========================================
+ Hits 7310 7346 +36
- Misses 1035 1044 +9
Impacted Files | Coverage Δ | |
---|---|---|
cobaya/theories/camb/camb.py | 92.00% <ø> (ø) |
|
cobaya/collection.py | 85.39% <68.29%> (-1.54%) |
:arrow_down: |
cobaya/samplers/mcmc/mcmc.py | 91.16% <87.50%> (-0.31%) |
:arrow_down: |
cobaya/model.py | 93.92% <100.00%> (+0.03%) |
:arrow_up: |
cobaya/sampler.py | 89.36% <0.00%> (+0.42%) |
:arrow_up: |
cobaya/prior.py | 97.87% <0.00%> (+1.41%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
@cmbant Mostly done. As expected by getdist, now storing tempered weights (integer) and logpost in the collection. Statistical functions such as mean
, cov
take into account the temperature with which the sample was generated and return the statistics corresponding to the original pdf (unless otherwise requested with ignore_temperature=True
, as convergence checks in tempered MCMC do).
A couple of missing things:
-
Collection.thin_samples
: should it care about temperature? I reckon it should not at this point in the code, right? (that's the point of integer weights?) But not sure. - Handling of covmats I/O: I would say we would load/save the temperature=1 covmats, and modify them internally, since we may want to reuse them in sampling processes with different temperatures?
- What to do exactly about getdist: now one can load a tempered sample, call
MCSamples.cool()
with the temperature, and recover the original pdf. But since the temperature is known at the time of loading the sample (stored in the updated.yaml), wouldn't it make sense that getdist is smater about it, e.g. having a default value for thecool
argument inMCSamples.cool()
, or even calling it automatically before plotting, even if weights are still stored tempered/integer for chain operations purposes?
back from hols now. I agree thinning should just use the integer weights, and saving covmats with T=1 makes sense. Could certainly add some temperature meta data tag to .properties.ini and yaml equivalent, though people may also want to look at the high-temperature results (e.g. to visually check tail convergence, or see what results look like with effectively larger errors).
Thanks!
-
Then I don't think there is anything to be done then on the thinning side, since the relevant auxiliary GetDist chains are already initialised with weights and logposterior of the tempered posterior (so integer weights and exponentiated
minuslogposterior
). -
CovMat I/O is now done for the T=1 pdf.
-
About GetDist:
- The question is about default behaviour: when we plot a tempered chain, do we want to get plots/statistics for the original pdf (maybe together with a warning) if no kwarg is overriden, or the other way around? I think it should be the first one, but your call. Of course, ideally in both cases one should be able to get plots/statistics of both the tempered and original pdf.
- No need for now for metadata: the temperature is in the
updated.yaml
file, and I have already wrote a function for GetDist to extract it (just not sure where to put it, because GetDist has a few different ways/places for loading chains). As stated above, once set as an attribute of the chain, the temperature could be the default value for the arg ofMCSamples.cool()
.
I'll open a GetDist issue about this.
@cmbant do you agree with the changes in the docs here, or have anything to add? https://github.com/CobayaSampler/cobaya/pull/180/commits/89b31887357ea132a09e458441869c781cceb065
Looks good to me thanks, though not very sure what you mean by "by weighting with samples with their probability". Can be useful for evaluating small tail probabilities. In high dimensions I'm using a single higher temperature usually probably doesn't help much for importance sampling (you'd really want to flatten just in a small number of relevant directions).
Did you fix the issue with the covariance scaling?
I don't have a very strong opinion about default GetDist behaviour. Ideally temperature, cooling, thinning , and burn-in removal metadata should probably be propagated consistently (including when saving MCSamples.saveAsText, and in Cobaya importance-sampled outputs).
Thanks a lot for the review!
Looks good to me thanks, though not very sure what you mean by "by weighting with samples with their probability".
The first "with" should not be there, sorry. I mean when computing any quantity for which one weights samples with their posterior/prior/likelihood, e.g. in the GP sampler we try to do quick estimates of the covariance matrix from small high-temperature samples, too small to be fair, but sparse enough to get a decent estimation (hopefully).
Can be useful for evaluating small tail probabilities.
I'll add that.
In high dimensions I'm using a single higher temperature usually probably doesn't help much for importance sampling (you'd really want to flatten just in a small number of relevant directions).
Ok, I guess I was thinking too naively about this. Do you think it's still worth mentioning that it may be useful in low dimensions, or should we remove any mention of re-weighting there?
Did you fix the issue with the covariance scaling?
Yes, it was indeed the prior, thanks!
I don't have a very strong opinion about default GetDist behaviour.
My strongest opinion is that, if the default is plotting/using the high-temperature chain, users should get at least a warning when loading the chain with an explanation of how to cool it down. Another strong opinion: some hint/checkbox in the GUI for cooling down on the spot (even it the cooled-down chain only stays in RAM). Otherwise users would have to load the chains by hand in a Python shell/notebook and cool them to be able to plot them? Not sure how you did that before with CosmoMC.
In general, I think GetDist should work with it more like Cobaya's collections: stored as high-temperature, but cooled-down weights and posterior can be requested via methods without modifying the internal high-temperature data. This way you don't need to cool down the sample to get "cool" statistics and plots.
Ideally temperature, cooling, thinning , and burn-in removal metadata should probably be propagated consistently (including when saving MCSamples.saveAsText, and in Cobaya importance-sampled outputs).
I'll turn this into another issue and get to work on it over the following weeks.
As for post processing, the easiest way to go at the moment is to automatically cool on load, and thus not preserve temperature. Unless you strongly disagree, I'd leave it like that for now and open another issue to be worked on in the following weeks. (I need this one merged soon).
I started support for auto-cool in getdist at https://github.com/cmbant/getdist/tree/autocool. Cobaya temperature setting could be propagated to properties as other metadata settings like sampler. Does this work?
GetDist gui currently doesn't have a good way to change settings per chain (since you often operate on many at once), but this I think works analogously to ignore_rows/burn_removed, in that set globally, but chain metadata specified whether should be applied to each specific chain.
Thanks! Could work. Since this is new, we can test it ourselves and see whether it feels convenient. I am going to be using it this coming week myself.
To get the temperature from a yaml, assuming you have got the sampler name with the get_sampler
function that is already there, you can use:
def get_sample_temperature(filename_or_info, sampler):
return yaml_file_or_dict(filename_or_info).get(_sampler, {}).get(sampler, {})\
.get("temperature", 1)
So if I got it correctly, GetDistGUI will cool down on load unless the metadata requests otherwise. That would look good to me then.
Any comment on my answer above re documentation?
Doc changes sound OK, can certainly mention could be useful for importance sampling in some cases.
TODO (at least): post processing, how to store temperature/cool status (https://github.com/CobayaSampler/cobaya/issues/202), getdist setting of temperature property on cobaya load.
I pushed a few of the tempering-independent fixes in this PR to master.
@JesusTorrado: @AndreasNygaard is interested in using temperatures for training Connect - how far off do you think this is from merge? (I re-activated travis account again..)