# Additional LOO functionality

There are several useful functions for model assessment and comparison that could be added here. In some cases these are already available in loo, or soon will be:
- [ ] `loo_expectation(data, values, kwargs...)`: compute the expectation of `values` wrt the leave-one-out posteriors using PSIS. E.g. if `values` are likelihoods, this computes pointwise ELPD, though less efficiently and stably than `loo` does. If `values` are posterior predictions, then the result is leave-one-out posterior predictive means. Exists in loo as `E_loo` and was previously proposed to be added to ArviZ in https://github.com/arviz-devs/arviz/issues/1931#issuecomment-1021834871.
- [ ] `logo(data, groups, var_name, kwargs...)`: compute leave-one-group-out cross-validation. `groups` is an array of the same shape as the log-likelihood corresponding to the var specified by `var_name`, minus sample dimensions. The unique group values define new dimensions, and log-likelihoods that share the same group are summed together. The docstring should warn that this is more likely to produce Pareto shape errors than LOO, and it will probably fail if there are group-specific parameters.
- [ ] `loo_predictive_error(data, method, kwargs...)`: use `loo_expectation` to compute point estimates of LOO posterior-predictive means, and then use the name or function `method` to compute pointwise and mean predictive error across the data points. Coming to loo: https://github.com/stan-dev/loo/pull/202
- [ ] `loo_crps(idata, scale=False, kwargs...)`: use `loo_expectation` to compute the continuous ranked probability score (CRPS) or its scaled variant SCRPS, another strictly proper scoring rule besides the log score (ELPD) to use for model comparison (see the note on the estimator after this list). Requires 2 posterior predictive draws for each posterior draw, so blocked by https://github.com/arviz-devs/arviz/issues/2048. Coming to loo: https://github.com/stan-dev/loo/pull/203
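For context on the two-draws requirement: the usual sample-based CRPS estimator relies on the identity

$$\operatorname{CRPS}(F, y) = \mathbb{E}_F\lvert X - y\rvert - \tfrac{1}{2}\,\mathbb{E}_F\lvert X - X'\rvert,$$

where $X$ and $X'$ are independent draws from the predictive distribution $F$; the paired posterior predictive draws are what make the second expectation estimable.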
Generally agree. Not so sure about the `logo` one. It would basically be

- groupby using xarray
- call `az.loo`

I personally think it would be better to document this and have users do the groupby themselves instead of adding a `logo` function. If we add a function, I expect users to assume only CV schemes directly compatible with `loo` or `logo` are supported, when in fact they can do any combination themselves, and I don't think we can support all possible cases. A draft example of already doing logo with `loo` (see also the sketch below) is https://nbviewer.org/github/OriolAbril/Exploratory-Analysis-of-Bayesian-Models/blob/multi_obs_ic/content/Section_04/Multiple_likelihoods.ipynb#Predicting-team-level-performance.
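For concreteness, a minimal sketch of that groupby-then-`az.loo` pattern on synthetic data; the dimension, coordinate, and variable names (`obs_id`, `group`, `y`) are made up for illustration:

```python
import arviz as az
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# pointwise log-likelihood: 4 chains, 500 draws, 30 observations in 5 groups
log_lik = xr.DataArray(
    rng.normal(-1.0, 0.5, size=(4, 500, 30)),
    dims=("chain", "draw", "obs_id"),
    coords={"group": ("obs_id", np.repeat(np.arange(5), 6))},
)
# sum log-likelihoods within each group -> one pseudo-observation per group
group_log_lik = log_lik.groupby("group").sum("obs_id")
idata_logo = az.from_dict(
    log_likelihood={"y": group_log_lik.transpose("chain", "draw", "group").values}
)
# reff is passed explicitly because this toy InferenceData has no posterior group
print(az.loo(idata_logo, var_name="y", reff=1.0))
```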
Good point about `logo`. I agree: if we can document it and the code only takes a few lines, then it doesn't make sense to add a `logo` implementation for this. Thanks for the example; it should be straightforward to adapt.
Hi @sethaxen, I'd like to implement the `loo_expectation` function in `arviz.stats`. From what I understand, we could implement it by calculating the weights using `az.loo` and then using the weights to calculate the weighted expectation, something like `(values * weights).sum() / weights.sum()`. I just wanted to know if I'm on the right track?
That would be one approach. `loo` both computes the log-weights using `psislw` and then estimates the ELPD. Technically this function only needs the log-weights from `loo`, so it would be more efficient to just call `psislw` as `loo` does.

> something like `(values * weights).sum() / weights.sum()`.
Yep, this is the right approach, with a few notes:

- The log-weights returned by `psislw` are already normalized, so there's no need for `weights.sum()`.
- You probably want to avoid the numerical issues that arise when `exp(log_weights[i])` underflows to 0 but `exp(log(abs(values[i])) + log_weights[i])` is actually non-negligible. Something like `(np.sign(values) * np.exp(np.log(np.abs(values)) + log_weights)).sum()` might be the right approach.
- The PSIS paper in Section E describes an approach for diagnosing an expectation computed using importance weights, described below.
Namely, in addition to using `psis(-log_likelihood)` to get the LOO log-weights, you would also compute something like

```python
# stably compute summand = np.sign(values) * np.exp(np.log(np.abs(values)) - log_likelihood)
log_abs_summand = np.log(np.abs(values)) - log_likelihood
max_log_abs_summand = log_abs_summand.max()
summand_scaled = np.sign(values) * np.exp(log_abs_summand - max_log_abs_summand)
tail_right = ...  # select right tail of summand_scaled using approach of https://github.com/arviz-devs/arviz/blob/6b1b2be60cca804d4b0ec43a3303820bfa8785c6/arviz/stats/stats.py#L974-L980
tail_left = ...  # do the same but for `-summand_scaled`
k_hat_right, _ = _gpdfit(tail_right)
k_hat_left, _ = _gpdfit(tail_left)
k_hat = max(k_hat_right, k_hat_left)
```
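Putting the first two notes together, a rough sketch of the expectation itself (a hypothetical helper, not an existing ArviZ API) could look like:

```python
import arviz as az
import numpy as np

def loo_expectation_sketch(values, log_likelihood, reff=1.0):
    """E[values] under each leave-one-out posterior via PSIS.

    Both arrays are assumed to have shape (n_datapoints, n_samples), i.e.
    samples on the last axis, as `az.psislw` expects for plain ndarrays.
    """
    # PSIS-smoothed LOO log-weights, already normalized per data point
    log_weights, khat = az.psislw(-log_likelihood, reff=reff)
    with np.errstate(divide="ignore"):  # log(0) -> -inf is harmless here
        log_abs_values = np.log(np.abs(values))
    # sign-stable weighted sum avoids underflow of exp(log_weights) * values
    return (np.sign(values) * np.exp(log_abs_values + log_weights)).sum(axis=-1)
```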
It might be better to refactor `_psislw` slightly to reduce code duplication. `k_hat` should at least be used to warn if an expectation will be poorly estimated, but a user may optionally want it returned. Some reworking is probably needed if you want to compute expectations with respect to multiple functions simultaneously. (IIRC, this is effectively what `loo_pit` does and needs.)

@avehtari does this look right?
FWIW, I would support adding `loo_expectation` in one PR and then adding the diagnostics in a later PR.
Corresponding R functions are in https://github.com/stan-dev/loo/blob/master/R/E_loo.R; see specifically `.wmean`, `.wvar`, `.wsd`, and `.wquant`. The `loo` package stores log weights, but in these computations the normalized weights are computed and used in the `.w*` functions.
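As a rough Python analogue of those helpers (hypothetical names; `w` is assumed already normalized to sum to 1 along the sample axis, and `.wvar`'s bias correction is omitted):

```python
import numpy as np

def wmean(x, w):
    # weighted mean with normalized weights
    return (w * x).sum()

def wvar(x, w):
    # weighted variance about the weighted mean (no bias correction)
    return (w * (x - wmean(x, w)) ** 2).sum()

def wquant(x, w, probs):
    # weighted quantiles via interpolation of the weighted empirical CDF
    order = np.argsort(x)
    cum_w = np.cumsum(w[order])
    return np.interp(probs, cum_w, x[order])
```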
For the diagnostics, the `loo` package now also has a minimum sample size and a sample-size-dependent khat threshold. The new diagnostic summary messages are in a PR, https://github.com/n-kall/posterior/blob/9a59fb57fbd7947a06eab908b3b2209fc60a2146/R/pareto_smooth.R#L611, and will be merged before the next release.
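If I read the linked code correctly, the sample-size-dependent threshold is roughly the following (an assumption based on the PR, not something ArviZ currently implements):

```python
import numpy as np

def khat_threshold(n_samples):
    # khat values above this suggest the importance-sampling estimate
    # is unreliable at this sample size (assumed form from the linked PR)
    return 1 - 1 / np.log10(n_samples)
```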
@sethaxen Thanks for the quick reply :+1:. I'll keep these points in mind and try to submit a draft PR implementing the `loo_expectation` function; you can review it whenever you have time.