pints
Compute pointwise log-likelihood for each observation
ArviZ provides a simple API to compute the LOO or WAIC for performance assessment of models, see https://arviz-devs.github.io/arviz/api/generated/arviz.waic.html.
What this would require, however, is the pointwise log-likelihood of each observation for every parameter sample in a chain. So for N observations, M iterations and K chains, we would need to store N x M x K log-pdf values.
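To make the requirement concrete, here is a minimal NumPy sketch of how a WAIC-style estimate could be computed by hand from such an array. It is not ArviZ's implementation: the function name `waic_elpd` and the (K, M, N) array layout are assumptions made for illustration, and the formula used is the common "log pointwise predictive density minus posterior variance" form.

```python
import numpy as np


def waic_elpd(pointwise_ll):
    """Estimate elpd_waic from pointwise log-likelihoods.

    pointwise_ll is assumed to have shape (K, M, N):
    chains x iterations x observations.
    """
    k, m, n = pointwise_ll.shape
    # Pool all chains: one row per posterior sample, one column per observation.
    ll = pointwise_ll.reshape(k * m, n)

    # Log pointwise predictive density, via a numerically stable log-mean-exp.
    ll_max = ll.max(axis=0)
    lppd = ll_max + np.log(np.mean(np.exp(ll - ll_max), axis=0))

    # Penalty term: sample variance of the log-likelihood over posterior draws.
    p_waic = ll.var(axis=0, ddof=1)

    return float(np.sum(lppd - p_waic))


# Toy input (made-up numbers): 2 chains, 500 iterations, 10 observations.
rng = np.random.default_rng(1)
toy = rng.normal(loc=-1.0, scale=0.1, size=(2, 500, 10))
print(waic_elpd(toy))
```

Both terms are computed per observation, so the summed log-likelihood that a sampler normally stores is not enough.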
The computationally most efficient way to generate the pointwise log-likelihoods would probably be to store them while running the chain, before summing them up across observations. That would require some changes to our pints.LogPDF, pints.LogPosterior and pints.MCMCSampler / pints.MCMCController, though.
Alternatively, we could consider implementing a routine that takes the LogPDF of a problem and the chains and then computes the log-pdfs for the observations again. This would still require us to implement an additional method for the LogPDFs which returns the pointwise log-pdfs.
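A rough sketch of what such a post-hoc routine might look like is given here. Nothing in it is existing PINTS API: `pointwise_from_chains` is a hypothetical helper, and `pointwise_log_pdf` stands in for the additional method the LogPDFs would need. A toy Gaussian model is used so the example runs without PINTS.

```python
import numpy as np


def pointwise_from_chains(pointwise_log_pdf, chains):
    """Re-evaluate pointwise log-pdfs for every stored sample.

    pointwise_log_pdf: callable mapping a parameter vector to an array of
        N per-observation log-pdf values (hypothetical new method).
    chains: array of shape (K, M, n_parameters).
    Returns an array of shape (K, M, N), or None if chains is empty.
    """
    chains = np.asarray(chains)
    n_chains, n_iter, _ = chains.shape
    out = None
    for k in range(n_chains):
        for m in range(n_iter):
            values = np.asarray(pointwise_log_pdf(chains[k, m]))
            if out is None:
                # N is only known after the first evaluation.
                out = np.empty((n_chains, n_iter, values.size))
            out[k, m] = values.ravel()
    return out


# Toy data and model (assumptions for illustration only).
data = np.array([0.1, -0.3, 0.7])


def toy_pointwise(x):
    # Gaussian log-pdf of each observation, with parameters (mean, sigma).
    mean, sigma = x
    return (-0.5 * np.log(2 * np.pi * sigma ** 2)
            - (data - mean) ** 2 / (2 * sigma ** 2))


chains = np.ones((2, 4, 2))  # K=2 chains, M=4 iterations, 2 parameters
result = pointwise_from_chains(toy_pointwise, chains)
print(result.shape)  # (2, 4, 3)
```

The cost is one extra model evaluation per stored sample, which is the trade-off against storing the values while the chain runs.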
Discussed in meeting today:
- Pointwise log-likelihood = log likelihood of every point in a ProblemLogLikelihood, before summing
- Logical entry to add this in would be somewhere in ProblemLogLikelihood ?
I've actually realised that Stan doesn't save a point-wise log-likelihood as it runs. Instead, it computes it afterwards using each posterior sample. I think, however, that we should probably try to improve on this since our models are generally more expensive to run.
I've been looking into how to do this and this is my idea:
It requires changing the __call__ function of each ProblemLogLikelihood and adding 2 new functions so that:
    def __call__(self, x):
        pointwise = self.create_pointwise_loglikelihoods(x)
        # Keep the most recent values so they can be read back later
        self._last_pointwise_loglikelihoods = pointwise
        return np.sum(pointwise)

    def create_pointwise_loglikelihoods(self, parameters):
        """
        Returns a matrix of size nt x no containing the log-likelihood of each
        observation at each time point with the given parameters.
        """
        # To be implemented by each ProblemLogLikelihood
        raise NotImplementedError

    def get_last_pointwise_loglikelihoods(self):
        return self._last_pointwise_loglikelihoods
This keeps changes to the code already written small. If you want the pointwise log-likelihoods while using the ask-and-tell interface, you call get_last_pointwise_loglikelihoods at each step, without doing the calculations again. I believe this will also work when using the LogPosterior or similar for the telling. You can also choose to do it the Stan way if you need to, using create_pointwise_loglikelihoods.
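As a usage illustration, here is a self-contained sketch of the proposal applied to a toy Gaussian likelihood. `ToyGaussianLogLikelihood` is a made-up stand-in that does not subclass any PINTS class, and the loop over fixed points only mimics the evaluations a sampler would trigger through ask-and-tell.

```python
import numpy as np


class ToyGaussianLogLikelihood:
    """Hypothetical stand-in for a ProblemLogLikelihood."""

    def __init__(self, values, sigma):
        self._values = np.asarray(values, dtype=float)
        self._sigma = float(sigma)
        self._last_pointwise_loglikelihoods = None

    def create_pointwise_loglikelihoods(self, parameters):
        # Toy model: a constant mean given by the single parameter.
        mean = parameters[0]
        return (-0.5 * np.log(2 * np.pi * self._sigma ** 2)
                - (self._values - mean) ** 2 / (2 * self._sigma ** 2))

    def __call__(self, x):
        pointwise = self.create_pointwise_loglikelihoods(x)
        self._last_pointwise_loglikelihoods = pointwise
        return np.sum(pointwise)

    def get_last_pointwise_loglikelihoods(self):
        return self._last_pointwise_loglikelihoods


log_likelihood = ToyGaussianLogLikelihood([0.2, -0.1, 0.4], sigma=1.0)

stored = []
for x in ([0.0], [0.1], [0.2]):  # stand-in for points a sampler asks for
    total = log_likelihood(x)    # the evaluation the sampler needs anyway
    # Copy, so that later evaluations cannot overwrite the stored row.
    stored.append(log_likelihood.get_last_pointwise_loglikelihoods().copy())

stored = np.array(stored)  # shape (M, N) for this single toy chain
print(stored.shape)  # (3, 3)
```

One thing to watch in a real run: with samplers that reject proposals, the last evaluated point is not necessarily the sample that ends up in the chain, so the caller would have to keep the pointwise values of the currently accepted point as well.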
I think this looks really good and fits very nicely into the pints interface @Rebecca-Rumney !
A little bit unrelated to the API, I am wondering whether it is actually a good idea to store the pointwise log-pdfs always, as for large autocorrelations we may want to throw out a majority of the samples and the memory requirements can be quite large for larger datasets (so we might not actually save the energy needed for the computation as we need it for storage). So it's probably good to be able to switch storing of the pointwise log-pdfs off if we want. But I guess that will be a switch in the MCMCController?
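For a sense of scale, a back-of-the-envelope estimate of the storage can be sketched as follows. The helper name and all the numbers are made up for illustration, and 8 bytes per value assumes double-precision floats.

```python
def pointwise_storage_bytes(n_observations, n_iterations, n_chains,
                            bytes_per_value=8):
    """Bytes needed to hold an N x M x K array of log-pdf values."""
    return n_observations * n_iterations * n_chains * bytes_per_value


# Hypothetical run: N = 1000 observations, M = 10000 iterations, K = 4 chains.
size = pointwise_storage_bytes(1_000, 10_000, 4)
print(size / 1e6, "MB")  # 320.0 MB

# Keeping only every 10th iteration (thinning) cuts this by a factor of 10.
thinned = pointwise_storage_bytes(1_000, 10_000 // 10, 4)
print(thinned / 1e6, "MB")  # 32.0 MB
```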
@DavAug That's a good point. What I've written there only saves the last step's log-likelihoods (so of size N) rather than the whole N x M x K matrix and it is up to the user to store it somewhere. I'm personally not sure how large N is likely to get. If we have it as an option to turn on then it may make it harder to access if we are only calling for the posterior. We would then have to alter LogPosterior and anything else that calls the log posterior to have an option of saving the pointwise likelihoods.
I agree, I like the likelihood as you proposed! I guess I was more wondering whether we would want to store the 3-dimensional tensor in the MCMC controller in general, or maybe not store it by default and allow it to be switched on. But maybe this is a question for another ticket :D
Good start! But it would probably be more efficient to have __call__ just assume you don't want to save, and have some alternative method, like evaluateS1, that can be called if you really want to store each sample?
(I imagine there's some loss of performance if we do this by default, but we might want to benchmark that)
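One way this opt-in variant could look is sketched here, again on a toy Gaussian likelihood rather than a real PINTS class. The method name `evaluate_pointwise` is hypothetical. By analogy with evaluateS1, which returns the value together with its derivatives, it returns the summed log-likelihood together with the pointwise values, while `__call__` stores nothing.

```python
import numpy as np


class ToyLogLikelihood:
    """Hypothetical stand-in for a ProblemLogLikelihood."""

    def __init__(self, values, sigma):
        self._values = np.asarray(values, dtype=float)
        self._sigma = float(sigma)

    def _pointwise(self, x):
        # Toy model: a constant mean given by the single parameter.
        mean = x[0]
        return (-0.5 * np.log(2 * np.pi * self._sigma ** 2)
                - (self._values - mean) ** 2 / (2 * self._sigma ** 2))

    def __call__(self, x):
        # Default path: nothing is kept on the object.
        return float(np.sum(self._pointwise(x)))

    def evaluate_pointwise(self, x):
        # Opt-in path (hypothetical name): return the sum and its terms.
        pointwise = self._pointwise(x)
        return float(np.sum(pointwise)), pointwise


log_likelihood = ToyLogLikelihood([0.2, -0.1, 0.4], sigma=1.0)
total, pointwise = log_likelihood.evaluate_pointwise([0.0])
print(pointwise.shape)  # (3,)
```

Whether the default path is measurably faster than always storing the last array is something a benchmark would have to show.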