Averaging predictive distributions
This is related to #82 but should be its own issue.
Currently we leave it to users to use the model weights to create the appropriate mixture of predictive distributions themselves. We should recommend a method for doing this (and demonstrate it in a vignette with rstan and loo, and automate it in rstanarm and brms).
There are various options for doing this. Today @avehtari and I discussed it and we are leaning towards taking (approximately) S*weight_k draws from each model's posterior predictive distribution, where there are K models and S is the desired sample size. This works better when some weights are small than sampling from each posterior predictive distribution with probability weight_k, and is easier to implement than a stratified version (maybe an option at some point?).
For resampling the reference is Kitagawa, G., Monte Carlo Filter and Smoother for Non-Gaussian Nonlinear State Space Models, Journal of Computational and Graphical Statistics, 5(1):1-25, 1996.
Simple random resampling samples indices randomly according to the weights weight_k.
In deterministic resampling, indices are sampled using the deterministic numbers u_j = (j - a)/S, for a fixed a (usually 0.5) in [0, 1), where S is the number of draws. Compare this to simple random resampling, where u_j ~ U[0, 1]. Deterministic resampling has small bias and small variance.
In stratified resampling, indices are sampled using random numbers u_j ~ U[(j-1)/S, j/S]. Stratified resampling is unbiased, almost as fast as deterministic resampling, and has only a slightly larger variance.
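As an illustration (not part of the proposal above), all three schemes amount to evaluating the numbers u_j against the cumulative weights and picking the index whose interval contains u_j. A minimal sketch, assuming wts sums to 1 and using resample_indices as a made-up name:

# Illustrative sketch: map the numbers u_j onto indices by inverting the
# cumulative weight function.
resample_indices <- function(wts, S, method = c("stratified", "deterministic", "random"), a = 0.5) {
  method <- match.arg(method)
  u <- switch(method,
    random        = runif(S),                    # u_j ~ U[0, 1]
    deterministic = (seq_len(S) - a) / S,        # u_j = (j - a)/S
    stratified    = (seq_len(S) - runif(S)) / S  # u_j ~ U[(j-1)/S, j/S]
  )
  # find the index k whose cumulative weight interval contains u_j
  pmin(findInterval(u, cumsum(wts)) + 1, length(wts))
}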
We are now thinking that the stratified version is best (and actually not so difficult to implement). For the stratified version we need a function like the following to sample the model indices (or the posterior draw indices if doing LOO predictive checks):
# @param wts Vector of model weights.
# @param n_draws Number of draws to take from the mixture distribution.
# @return Integer vector of model indices.
sample_indices <- function(wts, n_draws) {
  K <- length(wts)
  w <- n_draws * wts # expected number of draws from each model
  idx <- rep(NA, n_draws)
  c <- 0
  j <- 0
  for (k in 1:K) {
    c <- c + w[k]
    if (c >= 1) {
      # take the integer part of the expected count deterministically
      a <- floor(c)
      c <- c - a
      idx[j + 1:a] <- k
      j <- j + a
    }
    if (j < n_draws && c >= runif(1)) {
      # take one extra draw with probability equal to the remaining fraction
      c <- c - 1
      j <- j + 1
      idx[j] <- k
    }
  }
  return(idx)
}
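As a quick sanity check (illustrative numbers, not from the discussion above): with weights 0.6, 0.3, 0.1 and 10 draws the expected counts are whole numbers, so the allocation is exact; with non-integer expected counts the leftover fraction is allocated randomly.

idx <- sample_indices(c(0.6, 0.3, 0.1), n_draws = 10)
table(idx) # 6 draws from model 1, 3 from model 2, 1 from model 3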
Following up on today's discussion: for prediction averaging we then have something like
idx <- sample_indices(wts, n_draws = S) # sample_indices defined above
ypred <- list()
for (k in seq_along(wts)) {
  ypred[[k]] <- posterior_predict(fits[[k]], draws = sum(idx == k), ...)
}
ypred <- do.call(rbind, ypred)
Separately, for an improvement over the LOO-PIT diagnostic, we can use sample_indices() to select which posterior draws to use (and how many times each), and then use these resampled posterior draws to draw yrep from the predictive distribution and make the SBC-style histogram. In this case the wts are the weights from PSIS, not the model weights. The number of draws to use is a somewhat open question.
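A minimal sketch of that idea for a single observation i, assuming a psis object from loo (whose weights() method returns the importance weights) and an rstanarm/brms-style posterior_predict(); the object names psis_obj, fit, and y are illustrative:

# Normalized PSIS weights for observation i (a draws-by-observations matrix).
w_i <- weights(psis_obj, log = FALSE)[, i]
# Resample posterior draw indices in proportion to the PSIS weights.
draw_idx <- sample_indices(w_i, n_draws = 1023)
# Predictive draws for observation i using the resampled posterior draws.
yrep_i <- posterior_predict(fit)[draw_idx, i]
# Rank of the observed y[i] among the yrep draws, for the SBC-style histogram.
rank_i <- sum(yrep_i < y[i])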
Is there much progress on this? I talked with @jgabry about maybe implementing it if no one else is working on it.
I don't think anyone else has done anything for this.
Cool! Was a decision made about the number of draws?
I think we can go with a default of 4000 for combining predictions, and for LOO-PIT, if it's ranks, then 1023 as in the SBC paper.
I just got an email asking about how to do this and it reminded me of this open issue. Until we add a function to automate this, should we just add an example of doing it in the doc? Or @avehtari is there an example somewhere in one of your model comparison case studies / notebooks of doing this that we can point to?
The posterior package now has support for weighted draws and importance resampling, so it would be good to use that implementation. Then what needs to be added is a function that sets the weights for the predictive draws based on the model weights and then calls posterior to do the rest.
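A rough sketch of what that glue code might look like (mixture_predict is a hypothetical name; posterior::resample_draws() does the stratified resampling), assuming the per-model predictive draws are already available as matrices:

library(posterior)

# Hypothetical helper: stack per-model predictive draws and resample rows
# according to the model weights using posterior's stratified resampling.
# 'ypred_list' is a list of (draws x N) predictive matrices, one per model,
# and 'wts' are the model weights (e.g. from loo_model_weights()).
mixture_predict <- function(ypred_list, wts, n_out = 4000) {
  # per-draw weight = model weight spread evenly over that model's draws
  draw_wts <- unlist(lapply(seq_along(ypred_list), function(k) {
    rep(wts[k] / nrow(ypred_list[[k]]), nrow(ypred_list[[k]]))
  }))
  # stack all models' predictive draws into one draws_matrix
  combined <- do.call(rbind, ypred_list)
  colnames(combined) <- paste0("ypred[", seq_len(ncol(combined)), "]")
  resample_draws(as_draws_matrix(combined), weights = draw_wts,
                 ndraws = n_out, method = "stratified")
}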