# Additional LOO functionality

There are several useful functions for model assessment and comparison that could be added here. In some cases these are already available in loo, or soon will be:
- [ ] `loo_expectation(data, values, kwargs...)`: compute the expectation of `values` wrt the leave-one-out posteriors using PSIS. E.g. if `values` are likelihoods, this computes pointwise ELPD, though less efficiently and stably than `loo` does. If `values` are posterior predictions, then the result is leave-one-out posterior predictive means. Exists in loo as `E_loo` and was previously proposed to be added to ArviZ in https://github.com/arviz-devs/arviz/issues/1931#issuecomment-1021834871.
- [ ] `logo(data, groups, var_name, kwargs...)`: compute leave-one-group-out cross-validation. `groups` is an array of the same shape as the log-likelihood corresponding to the var specified by `var_name`, minus sample dimensions. The unique group values define new dimensions, and log-likelihoods that share the same group are summed together. The docstring should warn that this is more likely to produce Pareto shape errors than LOO, and it will probably fail if there are group-specific parameters.
- [ ] `loo_predictive_error(data, method, kwargs...)`: use `loo_expectation` to compute point estimates of LOO posterior-predictive means, and then use the name or function `method` to compute pointwise and mean predictive error across the data points. Coming to loo: https://github.com/stan-dev/loo/pull/202
- [ ] `loo_crps(idata, scale=False, kwargs...)`: use `loo_expectation` to compute the continuous ranked probability score (CRPS) or its scaled variant SCRPS, another strictly proper scoring rule besides the log score (ELPD) to use for model comparison (see the note on the estimator after this list). Requires 2 posterior predictive draws for each posterior draw, so blocked by https://github.com/arviz-devs/arviz/issues/2048. Coming to loo: https://github.com/stan-dev/loo/pull/203
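For context on the two-draws requirement: the usual sample-based CRPS estimator relies on the identity

$$\operatorname{CRPS}(F, y) = \mathbb{E}_F\lvert X - y\rvert - \tfrac{1}{2}\,\mathbb{E}_F\lvert X - X'\rvert,$$

where $X$ and $X'$ are independent draws from the predictive distribution $F$; the paired posterior predictive draws are what make the second expectation estimable.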
Generally agree. Not so sure about the `logo` one. It would basically be

- groupby using xarray
- call `az.loo`

I personally think it would be better to document this and have users do the groupby themselves instead of adding a `logo` function. If we add a function, I expect users to assume only CV schemes directly compatible with `loo` or `logo` are supported, when in fact they can do any combination themselves, and I don't think we can support all possible cases. A draft example of already doing logo with `loo` (see also the sketch below) is https://nbviewer.org/github/OriolAbril/Exploratory-Analysis-of-Bayesian-Models/blob/multi_obs_ic/content/Section_04/Multiple_likelihoods.ipynb#Predicting-team-level-performance.
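For concreteness, a minimal sketch of that groupby-then-`az.loo` pattern on synthetic data; the dimension, coordinate, and variable names (`obs_id`, `group`, `y`) are made up for illustration:

```python
import arviz as az
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# pointwise log-likelihood: 4 chains, 500 draws, 30 observations in 5 groups
log_lik = xr.DataArray(
    rng.normal(-1.0, 0.5, size=(4, 500, 30)),
    dims=("chain", "draw", "obs_id"),
    coords={"group": ("obs_id", np.repeat(np.arange(5), 6))},
)
# sum log-likelihoods within each group -> one pseudo-observation per group
group_log_lik = log_lik.groupby("group").sum("obs_id")
idata_logo = az.from_dict(
    log_likelihood={"y": group_log_lik.transpose("chain", "draw", "group").values}
)
# reff is passed explicitly because this toy InferenceData has no posterior group
print(az.loo(idata_logo, var_name="y", reff=1.0))
```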
Good point about `logo`. I agree: if we can document it and the code only takes a few lines, then it doesn't make sense to add a `logo` implementation for this. Thanks for the example; it should be straightforward to adapt.
Hi @sethaxen, I'd like to implement the `loo_expectation` function in `arviz.stats`. From what I understand, we could implement it by calculating the weights using `az.loo` and then using the weights to calculate the weighted expectation, something like `(values * weights).sum() / weights.sum()`. I just wanted to know if I'm on the right track?
That would be one approach. `loo` both computes the log-weights using `psislw` and then estimates the ELPD. Technically this function only needs the log-weights from `loo`, so it would be more efficient to just call `psislw` as `loo` does.

> something like `(values * weights).sum() / weights.sum()`.
Yep, this is the right approach, with a few notes:

- The log-weights returned by `psislw` are already normalized, so there's no need for `weights.sum()`.
- You probably want to avoid the numerical issues that arise when `exp(log_weights[i])` underflows to 0 but `exp(log(abs(values[i])) + log_weights[i])` is actually non-negligible. Something like `(np.sign(values) * np.exp(np.log(np.abs(values)) + log_weights)).sum()` might be the right approach.
- The PSIS paper in Section E describes an approach for diagnosing an expectation computed using importance weights, described below.
Namely, in addition to using `psis(-log_likelihood)` to get the LOO log-weights, you would also compute something like

```python
# stably compute summand = np.sign(values) * np.exp(np.log(np.abs(values)) - log_likelihood)
log_abs_summand = np.log(np.abs(values)) - log_likelihood
max_log_abs_summand = log_abs_summand.max()
summand_scaled = np.sign(values) * np.exp(log_abs_summand - max_log_abs_summand)
tail_right = ...  # select right tail of summand_scaled using approach of https://github.com/arviz-devs/arviz/blob/6b1b2be60cca804d4b0ec43a3303820bfa8785c6/arviz/stats/stats.py#L974-L980
tail_left = ...  # do the same but for `-summand_scaled`
k_hat_right, _ = _gpdfit(tail_right)
k_hat_left, _ = _gpdfit(tail_left)
k_hat = max(k_hat_right, k_hat_left)
```
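Putting the first two notes together, a rough sketch of the expectation itself (a hypothetical helper, not an existing ArviZ API) could look like:

```python
import arviz as az
import numpy as np

def loo_expectation_sketch(values, log_likelihood, reff=1.0):
    """E[values] under each leave-one-out posterior via PSIS.

    Both arrays are assumed to have shape (n_datapoints, n_samples), i.e.
    samples on the last axis, as `az.psislw` expects for plain ndarrays.
    """
    # PSIS-smoothed LOO log-weights, already normalized per data point
    log_weights, khat = az.psislw(-log_likelihood, reff=reff)
    with np.errstate(divide="ignore"):  # log(0) -> -inf is harmless here
        log_abs_values = np.log(np.abs(values))
    # sign-stable weighted sum avoids underflow of exp(log_weights) * values
    return (np.sign(values) * np.exp(log_abs_values + log_weights)).sum(axis=-1)
```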
It might be better to refactor `_psislw` slightly to reduce code duplication. `k_hat` should at least be used to warn if an expectation will be poorly estimated, but a user may optionally want it returned. Some reworking is probably needed if you want to compute expectations with respect to multiple functions simultaneously. (IIRC, this is effectively what `loo_pit` does and needs.)

@avehtari does this look right?
FWIW, I would support adding `loo_expectation` in one PR and then adding the diagnostics in a later PR.
Corresponding R functions are in https://github.com/stan-dev/loo/blob/master/R/E_loo.R; see specifically `.wmean`, `.wvar`, `.wsd`, and `.wquant`. The `loo` package stores log weights, but in these computations the normalized weights are computed and used in the `.w*` functions.
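As a rough Python analogue of those helpers (hypothetical names; `w` is assumed already normalized to sum to 1 along the sample axis, and `.wvar`'s bias correction is omitted):

```python
import numpy as np

def wmean(x, w):
    # weighted mean with normalized weights
    return (w * x).sum()

def wvar(x, w):
    # weighted variance about the weighted mean (no bias correction)
    return (w * (x - wmean(x, w)) ** 2).sum()

def wquant(x, w, probs):
    # weighted quantiles via interpolation of the weighted empirical CDF
    order = np.argsort(x)
    cum_w = np.cumsum(w[order])
    return np.interp(probs, cum_w, x[order])
```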
For the diagnostics, the `loo` package now also has a minimum sample size and a sample-size-dependent khat threshold. The new diagnostic summary messages are in a PR, https://github.com/n-kall/posterior/blob/9a59fb57fbd7947a06eab908b3b2209fc60a2146/R/pareto_smooth.R#L611, and will be merged before the next release.
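If I read the linked code correctly, the sample-size-dependent threshold is roughly the following (an assumption based on the PR, not something ArviZ currently implements):

```python
import numpy as np

def khat_threshold(n_samples):
    # khat values above this suggest the importance-sampling estimate
    # is unreliable at this sample size (assumed form from the linked PR)
    return 1 - 1 / np.log10(n_samples)
```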
@sethaxen Thanks for the quick reply :+1:. I'll keep these points in mind and try to submit a draft PR implementing the `loo_expectation` function; you can review it whenever you have time.