emcee Correct way to calculate the model evidence

Open cloud182 opened this issue 4 years ago • 5 comments

Hi everyone,

I'm using an EnsembleSampler to fit a series of models to the data. For each model, I'm interested in an estimate of the best-fitting parameters, but I'd also like to calculate the odds ratios for all the models in order to understand which one describes the data better. To do so I need to recover the evidence, i.e. the integral of the un-normalized posterior. Is there a way to recover this information from the output of run_mcmc() when using an EnsembleSampler? I know that PTSampler can do it, but I saw that it was moved to another package. Since I'm working on pre-existing code, I'd prefer not to adapt it to use the new ptemcee package if there is a reasonable way to get this information using the current implementation of the algorithm.

Thanks in advance for the help,

Cheers, Enrico

Jul 10 '20 18:07 cloud182
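
For reference, the quantity asked about above can be written out as follows. The symbols are assumptions introduced here for illustration (they do not appear in the thread): D is the data, M_k a model, theta_k its parameters, and Z_k its evidence.

```latex
% Evidence (marginal likelihood) of model M_k: the integral of the
% un-normalized posterior, i.e. likelihood times prior, over the parameters.
Z_k \equiv p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, \mathrm{d}\theta_k

% Posterior odds ratio between two models: Bayes factor times prior odds.
\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{Z_1}{Z_2} \cdot \frac{p(M_1)}{p(M_2)}
```
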

Unfortunately this isn't possible with vanilla emcee. You could pin your requirements to emcee < 3.0, but there were always issues with the PTSampler so I'd probably recommend switching to something more up-to-date like dynesty (https://dynesty.readthedocs.io/) if you need to do this!

Jul 10 '20 19:07 dfm

Hi, thanks for the fast reply!

I saw that it is possible to recover the value of the posterior at the end of the process for each point in the chains. Even if it is not possible to get the exact value of the integral, wouldn't the mean of these values be an estimate of the integral, up to a multiplicative factor? I'm not an expert on this; I just want to be sure before committing to significantly changing the code.

Thanks again, Enrico

Jul 10 '20 20:07 cloud182

This will not work in general. If you look at the integral you are trying to do, and the integral that is implicitly being done by averaging over samples, you will find that they are very different integrals.

Modifying emcee to deliver a marginalized likelihood would be a big project. It's possible that you could make some iterated emcee method that would implement what's here https://arxiv.org/abs/1401.6128 but I would not recommend it.

Jul 12 '20 13:07 davidwhogg
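
As an illustration of the comment above, here is a minimal sketch on a toy problem where the evidence is known in closed form. Everything in it is an assumption made for illustration and not part of emcee: a single observation `y ~ N(theta, s^2)` with prior `theta ~ N(0, t^2)`, and exact posterior draws standing in for an MCMC chain. Averaging the un-normalized posterior over posterior samples estimates the integral of f^2 / Z, where f is likelihood times prior, which is not the same as Z, the integral of f.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def toy_model(y, s, t):
    """Hypothetical conjugate toy: y ~ N(theta, s^2), theta ~ N(0, t^2).

    Returns (exact evidence, posterior mean, posterior variance).
    """
    post_var = 1.0 / (1.0 / s**2 + 1.0 / t**2)
    post_mean = post_var * y / s**2
    evidence = normal_pdf(y, 0.0, s**2 + t**2)
    return evidence, post_mean, post_var


def mean_unnormalized_posterior(y, s, t, n_samples=20000, seed=0):
    """Average likelihood * prior over exact posterior draws.

    The exact draws stand in for a converged MCMC chain.
    """
    rng = random.Random(seed)
    _, post_mean, post_var = toy_model(y, s, t)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(post_mean, math.sqrt(post_var))
        total += normal_pdf(y, theta, s**2) * normal_pdf(theta, 0.0, t**2)
    return total / n_samples


# Compare the sample mean with the exact evidence for two likelihood widths.
for s in (1.0, 0.1):
    z, _, post_var = toy_model(y=0.5, s=s, t=2.0)
    naive = mean_unnormalized_posterior(y=0.5, s=s, t=2.0)
    print(f"s={s}: Z={z:.4f}, sample mean={naive:.4f}, ratio={naive / z:.3f}")
```

In this toy the ratio of the sample mean to the true evidence works out to 1 / (2 sqrt(pi v)), with v the posterior variance, so the "multiplicative factor" depends on the width of each model's posterior and does not cancel when two models are compared.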

OK, thank you very much for your help! I'll look into the dynesty package!

Cheers, Enrico

Jul 13 '20 13:07 cloud182

David: Sorry to reopen an old thread, and apologies if the answer to my question is obvious (I will look at dynesty). Is there some intrinsic* problem with using the samples from emcee to calculate marginal likelihoods (and then Bayes factors)? I've adapted your line-fitting tutorial to produce data from one of two error models:

yerr^2 = sigma^2 (parameters: m, b, sigma)

yerr(i)^2 = sigma^2 + sigma_f(i)^2 w/ sigma_f(i) ~ N(0, f*y(i)) (parameters: m, b, sigma, f)

and then performed MCMC with each assumption. The marginal likelihoods (calculated, e.g., by the harmonic mean of the likelihood using "flat_samples" ... although I'd like eventually to use better methods) are reasonable, at least insofar as the Bayes factors are large and favor the model hypothesis from which the data were generated.

* By intrinsic, I mean issues other than theoretical ones, like the need to specify a proper prior, and practical/numerical ones, like the difficulties of calculating marginal likelihood from samples ("harmonic mean estimator has infinite variance"). Are the samples from emcee not appropriate posterior samples for some reason?

May 08 '22 18:05 holderbp
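
For readers unfamiliar with the harmonic mean estimator mentioned above, the following is a minimal sketch of it, not the commenter's actual code. It uses a hypothetical conjugate toy (`y ~ N(theta, s^2)`, prior `theta ~ N(0, t^2)`) with exact posterior draws in place of an emcee chain. With emcee 3, the log-likelihoods would presumably come from `sampler.get_log_prob(flat=True)` minus the log-prior at each sample; that step is an assumption and is not shown here.

```python
import math
import random


def log_likelihood(theta, y, s):
    """Gaussian log-likelihood of one observation y given theta (toy model)."""
    return -0.5 * ((y - theta) / s) ** 2 - 0.5 * math.log(2.0 * math.pi * s**2)


def harmonic_mean_log_evidence(log_likes):
    """Harmonic mean estimate of log Z from log-likelihoods at posterior samples.

    Uses 1/Z ~ mean(1/L), evaluated with a log-sum-exp for numerical stability.
    """
    neg = [-ll for ll in log_likes]
    m = max(neg)
    log_mean_inv = m + math.log(sum(math.exp(v - m) for v in neg) / len(neg))
    return -log_mean_inv


def demo(y=0.5, s=1.0, t=0.5, n_samples=20000, seed=0):
    """Return (harmonic mean estimate, exact value) of log Z for the toy model."""
    rng = random.Random(seed)
    post_var = 1.0 / (1.0 / s**2 + 1.0 / t**2)
    post_mean = post_var * y / s**2
    # Exact posterior draws stand in for a converged, flattened chain.
    draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(n_samples)]
    log_z_hat = harmonic_mean_log_evidence([log_likelihood(th, y, s) for th in draws])
    total_var = s**2 + t**2
    log_z_exact = -0.5 * y**2 / total_var - 0.5 * math.log(2.0 * math.pi * total_var)
    return log_z_hat, log_z_exact


log_z_hat, log_z_exact = demo()
print(f"harmonic mean log Z = {log_z_hat:.3f}, exact log Z = {log_z_exact:.3f}")
```

The defaults deliberately use a prior narrower than the likelihood (`t < s`), the regime in which the estimator has finite variance for this toy. With a prior broader than the likelihood, which is the usual situation, the variance is infinite, the estimate is dominated by rare low-likelihood samples, and the quoted warning applies.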
