[WIP]

Missing testing and getdist side of things.

May 20 '21 21:05 JesusTorrado

Codecov Report

Merging #180 (8e8a5f4) into master (ac3ed01) will decrease coverage by 0.04%. The diff coverage is 75.40%.

@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
- Coverage   87.59%   87.55%   -0.05%     
==========================================
  Files          91       91              
  Lines        8345     8390      +45     
==========================================
+ Hits         7310     7346      +36     
- Misses       1035     1044       +9

Impacted Files	Coverage Δ
cobaya/theories/camb/camb.py	`92.00% <ø> (ø)`
cobaya/collection.py	`85.39% <68.29%> (-1.54%)`	:arrow_down:
cobaya/samplers/mcmc/mcmc.py	`91.16% <87.50%> (-0.31%)`	:arrow_down:
cobaya/model.py	`93.92% <100.00%> (+0.03%)`	:arrow_up:
cobaya/sampler.py	`89.36% <0.00%> (+0.42%)`	:arrow_up:
cobaya/prior.py	`97.87% <0.00%> (+1.41%)`	:arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Sep 08 '21 16:09 codecov-commenter

@cmbant Mostly done. As expected by getdist, now storing tempered weights (integer) and logpost in the collection. Statistical functions such as mean, cov take into account the temperature with which the sample was generated and return the statistics corresponding to the original pdf (unless otherwise requested with ignore_temperature=True, as convergence checks in tempered MCMC do).

A couple of missing things:

Collection.thin_samples: should it care about temperature? I reckon it should not at this point in the code, right? (that's the point of integer weights?) But not sure.
Handling of covmats I/O: I would say we would load/save the temperature=1 covmats, and modify them internally, since we may want to reuse them in sampling processes with different temperatures?
What to do exactly about getdist: now one can load a tempered sample, call MCSamples.cool() with the temperature, and recover the original pdf. But since the temperature is known at the time of loading the sample (stored in the updated.yaml), wouldn't it make sense that getdist is smater about it, e.g. having a default value for the cool argument in MCSamples.cool(), or even calling it automatically before plotting, even if weights are still stored tempered/integer for chain operations purposes?

Sep 08 '21 16:09 JesusTorrado

back from hols now. I agree thinning should just use the integer weights, and saving covmats with T=1 makes sense. Could certainly add some temperature meta data tag to .properties.ini and yaml equivalent, though people may also want to look at the high-temperature results (e.g. to visually check tail convergence, or see what results look like with effectively larger errors).

Sep 17 '21 07:09 cmbant

Thanks!

Then I don't think there is anything to be done then on the thinning side, since the relevant auxiliary GetDist chains are already initialised with weights and logposterior of the tempered posterior (so integer weights and exponentiated minuslogposterior).
CovMat I/O is now done for the T=1 pdf.
About GetDist:
- The question is about default behaviour: when we plot a tempered chain, do we want to get plots/statistics for the original pdf (maybe together with a warning) if no kwarg is overriden, or the other way around? I think it should be the first one, but your call. Of course, ideally in both cases one should be able to get plots/statistics of both the tempered and original pdf.
- No need for now for metadata: the temperature is in the updated.yaml file, and I have already wrote a function for GetDist to extract it (just not sure where to put it, because GetDist has a few different ways/places for loading chains). As stated above, once set as an attribute of the chain, the temperature could be the default value for the arg of MCSamples.cool().

I'll open a GetDist issue about this.

Sep 22 '21 13:09 JesusTorrado

@cmbant do you agree with the changes in the docs here, or have anything to add? https://github.com/CobayaSampler/cobaya/pull/180/commits/89b31887357ea132a09e458441869c781cceb065

Sep 22 '21 15:09 JesusTorrado

Looks good to me thanks, though not very sure what you mean by "by weighting with samples with their probability". Can be useful for evaluating small tail probabilities. In high dimensions I'm using a single higher temperature usually probably doesn't help much for importance sampling (you'd really want to flatten just in a small number of relevant directions).

Did you fix the issue with the covariance scaling?

I don't have a very strong opinion about default GetDist behaviour. Ideally temperature, cooling, thinning , and burn-in removal metadata should probably be propagated consistently (including when saving MCSamples.saveAsText, and in Cobaya importance-sampled outputs).

Sep 22 '21 16:09 cmbant

Thanks a lot for the review!

Looks good to me thanks, though not very sure what you mean by "by weighting with samples with their probability".

The first "with" should not be there, sorry. I mean when computing any quantity for which one weights samples with their posterior/prior/likelihood, e.g. in the GP sampler we try to do quick estimates of the covariance matrix from small high-temperature samples, too small to be fair, but sparse enough to get a decent estimation (hopefully).

Can be useful for evaluating small tail probabilities.

I'll add that.

In high dimensions I'm using a single higher temperature usually probably doesn't help much for importance sampling (you'd really want to flatten just in a small number of relevant directions).

Ok, I guess I was thinking too naively about this. Do you think it's still worth mentioning that it may be useful in low dimensions, or should we remove any mention of re-weighting there?

Did you fix the issue with the covariance scaling?

Yes, it was indeed the prior, thanks!

I don't have a very strong opinion about default GetDist behaviour.

My strongest opinion is that, if the default is plotting/using the high-temperature chain, users should get at least a warning when loading the chain with an explanation of how to cool it down. Another strong opinion: some hint/checkbox in the GUI for cooling down on the spot (even it the cooled-down chain only stays in RAM). Otherwise users would have to load the chains by hand in a Python shell/notebook and cool them to be able to plot them? Not sure how you did that before with CosmoMC.

In general, I think GetDist should work with it more like Cobaya's collections: stored as high-temperature, but cooled-down weights and posterior can be requested via methods without modifying the internal high-temperature data. This way you don't need to cool down the sample to get "cool" statistics and plots.

Ideally temperature, cooling, thinning , and burn-in removal metadata should probably be propagated consistently (including when saving MCSamples.saveAsText, and in Cobaya importance-sampled outputs).

I'll turn this into another issue and get to work on it over the following weeks.

As for post processing, the easiest way to go at the moment is to automatically cool on load, and thus not preserve temperature. Unless you strongly disagree, I'd leave it like that for now and open another issue to be worked on in the following weeks. (I need this one merged soon).

Sep 22 '21 19:09 JesusTorrado

I started support for auto-cool in getdist at https://github.com/cmbant/getdist/tree/autocool. Cobaya temperature setting could be propagated to properties as other metadata settings like sampler. Does this work?

GetDist gui currently doesn't have a good way to change settings per chain (since you often operate on many at once), but this I think works analogously to ignore_rows/burn_removed, in that set globally, but chain metadata specified whether should be applied to each specific chain.

Sep 23 '21 07:09 cmbant

Thanks! Could work. Since this is new, we can test it ourselves and see whether it feels convenient. I am going to be using it this coming week myself.

To get the temperature from a yaml, assuming you have got the sampler name with the get_sampler function that is already there, you can use:

def get_sample_temperature(filename_or_info, sampler):
    return yaml_file_or_dict(filename_or_info).get(_sampler, {}).get(sampler, {})\
        .get("temperature", 1)

So if I got it correctly, GetDistGUI will cool down on load unless the metadata requests otherwise. That would look good to me then.

Any comment on my answer above re documentation?

Sep 25 '21 20:09 JesusTorrado

Doc changes sound OK, can certainly mention could be useful for importance sampling in some cases.

Sep 27 '21 07:09 cmbant

TODO (at least): post processing, how to store temperature/cool status (https://github.com/CobayaSampler/cobaya/issues/202), getdist setting of temperature property on cobaya load.

I pushed a few of the tempering-independent fixes in this PR to master.

Feb 24 '22 11:02 cmbant

@JesusTorrado: @AndreasNygaard is interested in using temperatures for training Connect - how far off do you think this is from merge? (I re-activated travis account again..)

Oct 06 '22 17:10 cmbant

cobaya
cobaya copied to clipboard

Tempered mcmc

Codecov Report

cobaya cobaya copied to clipboard

Tempered mcmc

Codecov Report

cobaya
cobaya copied to clipboard