posteriordb
Store the actual posterior dimension
One measure of how difficult a posterior is, is its number of dimensions.
For example, posterior_database/posteriors/GLM_Poisson_Data-GLM_Poisson_model.json reports the following "dimensions":
{
"keywords": ["bpa book", "Poisson model"],
"urls": "https://github.com/stan-dev/example-models/tree/master/BPA/Ch.03",
"references": "kery2011population",
"dimensions": {
"alpha": 1,
"beta1": 1,
"beta2": 1,
"beta3": 1,
"log_lambda": 40,
"lambda": 40
},
"reference_posterior_name": null,
"added_date": "2021-07-01",
"added_by": "Kane Lindsay",
"name": "GLM_Poisson_Data-GLM_Poisson_model",
"model_name": "GLM_Poisson_model",
"data_name": "GLM_Poisson_Data"
}
However, looking at the Stan code, these "dimensions" include transformed parameters and generated quantities, which have many dimensions but do not influence how difficult the posterior is:
parameters {
real<lower=-20, upper=20> alpha;
real<lower=-10, upper=10> beta1;
real<lower=-10, upper=10> beta2;
real<lower=-10, upper=10> beta3;
}
transformed parameters {
vector[n] log_lambda;
log_lambda = alpha + beta1 * year + beta2 * year_squared
+ beta3 * year_cubed;
}
generated quantities {
vector[n] lambda;
lambda = exp(log_lambda);
}
It would be good to report the actual posterior dimensionality.
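To make the gap concrete, here is a minimal Python sketch that sums the reported "dimensions" entries and compares the total with the entries that belong to the parameters block. The `parameter_names` list is written by hand from the Stan code above; it is an assumption of this sketch, not something posteriordb currently stores.

```python
import json

# The "dimensions" slot copied from the posterior JSON above.
posterior_info = json.loads("""
{
  "dimensions": {
    "alpha": 1, "beta1": 1, "beta2": 1, "beta3": 1,
    "log_lambda": 40, "lambda": 40
  }
}
""")

# Assumption: names taken by hand from the Stan `parameters` block.
parameter_names = ["alpha", "beta1", "beta2", "beta3"]

dims = posterior_info["dimensions"]

# Total over everything that is currently reported.
reported_total = sum(dims.values())

# Total over the entries declared in the parameters block only.
posterior_dimension = sum(dims[name] for name in parameter_names)

print(reported_total)       # 84
print(posterior_dimension)  # 4
```

The 80 extra dimensions come entirely from log_lambda and lambda, which are deterministic functions of the four parameters.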
I agree. What would be the best way to do this?
Maybe add "posterior_dimensions" as a slot and only include the parameters that are in the parameters block. We would also need to handle the parameter types (simplex, covariance matrix, etc.). However, this would probably not be that difficult.
Something like:
"posterior_dimensions": {
"alpha": {"real": 1},
"beta1": {"real": 1},
"beta2": {"real": 1},
"beta3": {"real": 1}
}
It could also be just a single number matching the dimensionality of the unconstrained space.
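As a rough illustration of why the parameter types matter, the Python sketch below maps a few Stan types to the size of their unconstrained representation and sums them. The slot layout (`{name: {type: size}}`) follows the proposal above, with size read as the declared length K (or K rows for a K x K matrix); that reading, the `UNCONSTRAINED_SIZE` table and the `hypothetical` example are assumptions made for illustration, and the table covers only a handful of types.

```python
# Unconstrained size as a function of the declared size k.
# Bounded scalars (e.g. real<lower=-20, upper=20>) still count as 1.
UNCONSTRAINED_SIZE = {
    "real": lambda k: k,
    "vector": lambda k: k,
    "simplex": lambda k: k - 1,                # K-simplex has K - 1 free values
    "cov_matrix": lambda k: k * (k + 1) // 2,  # K x K symmetric positive definite
    "corr_matrix": lambda k: k * (k - 1) // 2, # unit diagonal removes K more
}


def unconstrained_dimension(posterior_dimensions):
    """Sum the unconstrained sizes over all parameters in the slot."""
    total = 0
    for spec in posterior_dimensions.values():
        for stan_type, k in spec.items():
            total += UNCONSTRAINED_SIZE[stan_type](k)
    return total


# The slot proposed above for the GLM Poisson posterior.
glm_poisson = {
    "alpha": {"real": 1},
    "beta1": {"real": 1},
    "beta2": {"real": 1},
    "beta3": {"real": 1},
}

# Hypothetical slot with constrained types, for illustration only.
hypothetical = {
    "theta": {"simplex": 3},
    "Sigma": {"cov_matrix": 3},
}

print(unconstrained_dimension(glm_poisson))   # 4
print(unconstrained_dimension(hypothetical))  # 2 + 6 = 8
```

For the hypothetical slot the constrained representation would have 3 + 9 = 12 values, so the two counts can differ considerably once simplexes or matrices are involved.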
In R, we can first create a fit object and get some valid values for the parameters:
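library(posteriordb)
library(cmdstanr)
library(posterior)
# pdb is assumed to be an existing posterior database connection, e.g. pdb <- pdb_local()
po <- posterior("GLM_Poisson_Data-GLM_Poisson_model", pdb)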
mod <- cmdstan_model(stan_file = stan_code_file_path(po))
fit <- mod$sample(data=pdb_data(po), init=0.01, iter_warmup=1, iter_sampling=1, chains=1, refresh=0, show_messages=FALSE, show_exceptions=FALSE, diagnostics=NULL, sig_figs = 12)
pars <- names(fit$variable_skeleton(transformed_parameters = FALSE, generated_quantities = FALSE))
drs <- fit$draws()
vars <- sapply(pars, \(par) as.numeric(subset_draws(drs, variable=par)), simplify=FALSE, USE.NAMES=TRUE)
and then get the dimension of the constrained space:
length(unlist(vars))
and the dimension of the unconstrained space:
length(fit$unconstrain_variables(vars))