SBC
SBC copied to clipboard
Error in `dplyr::select()`
I'm not sure how to proceed after getting this error
Simulation 1 resulted in error when post-processing the fit.
Calling `recompute_SBC_statistics` after you've found and fixed the problem could let you move further without refitting
Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `variable` doesn't exist.
With this code
library(SBC) # remotes::install_github("hyunjimoon/SBC")
library(cmdstanr)
library(MCMCpack)
library(ggplot2)
library(future)
plan(multisession)
options(SBC.min_chunk_size = 5)
m <- cmdstan_model("softmax.stan")
backend <- SBC_backend_cmdstan_sample(m, chains = 2)
generate_one_dataset <- function(N, K, prior_alpha = 1) {
x <- rdirichlet(1, alpha = rep(prior_alpha, K))
observed <- as.integer(rmultinom(1, size = N, prob = x))
list(
variables = list(x = x),
generated = list(K = K, observed = observed, prior_alpha = prior_alpha)
)
}
y <- generate_one_dataset(N = 100, K = 20, prior_alpha = 4)
y_out <- m$sample(
data = list(
K = y$generated$K,
observed = y$generated$observed,
prior_alpha = y$generated$prior_alpha
),
parallel_chains = 4
)
datasets_flat <- generate_datasets(
SBC_generator_function(generate_one_dataset, N = 100, K = 20),
n_sims = 10)
res_flat <- compute_SBC(datasets_flat, backend)
and Stan model
data {
int K;
array[K] int<lower=0> observed;
real<lower=0> prior_alpha;
}
parameters {
vector[K - 1] y;
}
transformed parameters {
simplex[K] x = softmax(append_row(y, 0.));
}
model {
x ~ dirichlet(rep_vector(prior_alpha, K));
observed ~ multinomial(x);
}
Thanks for reporting!
The underlying problem is that if you look at datasets_flat$variables
, you'll notice that it treats x
as 2D array, so the variables are called x[1,1], x[1,2]
... , while the variables in Stan fit are called x[1], x[2]
. SBC then finds no shared variables and this leads to problems. This deserves a better error message.
However, you can get the code to work by changing to:
generate_one_dataset <- function(N, K, prior_alpha = 1) {
x <- as.numeric(rdirichlet(1, alpha = rep(prior_alpha, K)))
observed <- as.integer(rmultinom(1, size = N, prob = x))
list(
variables = list(x = x),
generated = list(K = K, observed = observed, prior_alpha = prior_alpha)
)
}
Also note that the $generated
element of datasets is already in format to be pushed to Stan, so you can simplify your test with:
y_out <- m$sample(
data = y$generated,
parallel_chains = 4
)
I'll try to inform messages around this issue...
Thanks for the quick response @martinmodrak!