posterior
posterior copied to clipboard
Don't drop weights when subsetting variables?
I noticed that subsetting the variables of a draws object will drop weights:
# Based on the "Examples" section in `?posterior::weight_draws`:
suppressPackageStartupMessages(library(posterior))
x <- as_draws_matrix(example_draws())
wts <- rexp(ndraws(x))
( x <- weight_draws(x, weights = wts) )
#> Loading required namespace: testthat
#> # A draws_matrix: 100 iterations, 4 chains, and 10 variables
#> variable
#> draw mu tau theta[1] theta[2] theta[3] theta[4] theta[5] theta[6]
#> 1 2.01 2.8 3.96 0.271 -0.74 2.1 0.923 1.7
#> 2 1.46 7.0 0.12 -0.069 0.95 7.3 -0.062 11.3
#> 3 5.81 9.7 21.25 14.931 1.83 1.4 0.531 7.2
#> 4 6.85 4.8 14.70 8.586 2.67 4.4 4.758 8.1
#> 5 1.81 2.8 5.96 1.156 3.11 2.0 0.769 4.7
#> 6 3.84 4.1 5.76 9.909 -1.00 5.3 5.889 -1.7
#> 7 5.47 4.0 4.03 4.151 10.15 6.6 3.741 -2.2
#> 8 1.20 1.5 -0.28 1.846 0.47 4.3 1.467 3.3
#> 9 0.15 3.9 1.81 0.661 0.86 4.5 -1.025 1.1
#> 10 7.17 1.8 6.08 8.102 7.68 5.6 7.106 8.5
#> # ... with 390 more draws, and 2 more variables
#> # ... hidden reserved variables {'.log_weight'}
( x_sub <- x[, 1:2] )
#> # A draws_matrix: 100 iterations, 4 chains, and 2 variables
#> variable
#> draw mu tau
#> 1 2.01 2.8
#> 2 1.46 7.0
#> 3 5.81 9.7
#> 4 6.85 4.8
#> 5 1.81 2.8
#> 6 3.84 4.1
#> 7 5.47 4.0
#> 8 1.20 1.5
#> 9 0.15 3.9
#> 10 7.17 1.8
#> # ... with 390 more draws
Created on 2023-08-15 with reprex v2.0.2
As you can see, x_sub lacks the hidden reserved variable .log_weight.
I'm not sure if this is intended behavior or not, but I think subsetting variables (variables only, not the draws or chains) should keep weights.
I forgot to mention that other hidden reserved variables (other than .log_weight) might be affected as well, so perhaps this needs to be resolved by a more general approach which ensures that subsetting columns keeps any hidden reserved variables.
I think, we need to make an explicit design decision on whether we want to retain hidden variables or not when subsetting via []. I can imagine it can become very annoying for users if they try to get rid of all variables except for a view via [] but the hidden variables refuse to leave. @mjskay and @jgabry what do you think?
That said, you can always use subset_draws(., variable = ...) to subset while retaining hidden variables.
Hmmm, that's a tough one. It does seem like the default behavior of [ should be to keep reserved variables (or at least .log_weight --- which is currently equivalent since it's the only reserved variable for that format?) to avoid silent errors caused by unexpected dropping of weights. Maybe [ should gain an argument indicating whether reserved variables are subject to the subset (defaulting to not dropping them)? (Perhaps subset() should too?)
I agree. So both [] and subset(_draws) should keep reserved variables by default but both gain an argument to turn this off? Sounds like a good approach to me.
I agree. So both [] and subset(_draws) should keep reserved variables by default but both gain an argument to turn this off? Sounds like a good approach to me.
Sounds good to me too.
One challenge towards this will be that we cannot just use NextMethod for [] in some cases. For example, in the corresponding .draws_matrix method I see:
`[.draws_matrix` <- function(x, i, j, ..., drop = FALSE) {
# TODO: allow for argument 'reserved' as in '[.draws_df'
# right now this fails because NextMethod() cannot ignore arguments
out <- NextMethod("[", drop = drop)
if (length(dim(out)) == length(dim(x))) {
class(out) <- class(x)
.nchains <- nchains(x)
if (missing(i)) {
attr(out, "nchains") <- .nchains
} else if (.nchains > 1L) {
warn_merge_chains("index")
}
}
out
}
Just leaving this here for furture reference because I will surely forget otherwise.