Store the actual posterior dimension

Open avehtari opened this issue 5 months ago • 3 comments

One measure of a posterior's difficulty is its number of dimensions. For example, posterior_database/posteriors/GLM_Poisson_Data-GLM_Poisson_model.json reports the following "dimensions":

{
  "keywords": ["bpa book", "Poisson model"],
  "urls": "https://github.com/stan-dev/example-models/tree/master/BPA/Ch.03",
  "references": "kery2011population",
  "dimensions": {
    "alpha": 1,
    "beta1": 1,
    "beta2": 1,
    "beta3": 1,
    "log_lambda": 40,
    "lambda": 40
  },
  "reference_posterior_name": null,
  "added_date": "2021-07-01",
  "added_by": "Kane Lindsay",
  "name": "GLM_Poisson_Data-GLM_Poisson_model",
  "model_name": "GLM_Poisson_model",
  "data_name": "GLM_Poisson_Data"
}

But looking at the Stan code, these "dimensions" include transformed parameters and generated quantities, which are high-dimensional but do not influence how difficult the posterior is:

parameters {
  real<lower=-20, upper=20> alpha;
  real<lower=-10, upper=10> beta1;
  real<lower=-10, upper=10> beta2;
  real<lower=-10, upper=10> beta3;
}
transformed parameters {
  vector[n] log_lambda;
  
  log_lambda = alpha + beta1 * year + beta2 * year_squared
               + beta3 * year_cubed;
}
generated quantities {
  vector[n] lambda;
  
  lambda = exp(log_lambda);
}

It would be good to report the actual posterior dimensionality.
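As a rough illustration of the gap, the following base R sketch compares the total currently reported under "dimensions" with the count restricted to the parameters block. The values are copied by hand from the JSON and Stan code above; the variable names are only illustrative and not part of posteriordb.

```r
# "dimensions" entry copied by hand from the JSON above
dimensions <- list(alpha = 1, beta1 = 1, beta2 = 1, beta3 = 1,
                   log_lambda = 40, lambda = 40)

# Names declared in the Stan parameters block above
parameter_names <- c("alpha", "beta1", "beta2", "beta3")

# Total currently reported, including transformed parameters
# and generated quantities
reported_dim <- sum(unlist(dimensions))

# Total restricted to the parameters block
actual_dim <- sum(unlist(dimensions[parameter_names]))

cat("reported:", reported_dim, " actual:", actual_dim, "\n")
```

For this posterior the reported total is 84, while only 4 dimensions come from the parameters block.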

avehtari avatar Jun 03 '25 19:06 avehtari

I agree. What would be the best way to do this?

Maybe add "posterior dimension" as a slot and only include the parameters that are in the parameters block. We would also need to handle the parameter types (simplex, covariance matrix, etc.). However, this would probably not be that difficult.

Something like:

  "posterior_dimensions": {
    "alpha": {"real": 1},
    "beta1": {"real": 1},
    "beta2": {"real": 1},
    "beta3": {"real": 1}
  }

MansMeg avatar Jun 04 '25 05:06 MansMeg

It could also be just one number matching the dimensionality of the unconstrained space.
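To show how parameter types would feed into such a single number, here is a minimal base R sketch. The helper `unconstrained_dim()` is hypothetical (it is not part of posteriordb or cmdstanr) and covers only a few types, using the standard counts of free parameters in Stan's unconstraining transforms: bounds on a scalar do not change the count, a simplex of size k has k - 1 free parameters, a k x k covariance matrix has k(k + 1)/2, and a k x k correlation matrix has k(k - 1)/2.

```r
# Hypothetical helper (not part of posteriordb or cmdstanr): number of
# unconstrained dimensions for one Stan parameter, given its declared type
# and size k (vector length, or number of rows of a square matrix).
unconstrained_dim <- function(type, k = 1) {
  switch(type,
    real        = 1,                # lower/upper bounds do not change the count
    vector      = k,
    simplex     = k - 1,            # one degree of freedom lost to sum-to-one
    cov_matrix  = k * (k + 1) / 2,  # free entries of the Cholesky factor
    corr_matrix = k * (k - 1) / 2,  # strictly lower-triangular entries
    stop("unhandled type: ", type)
  )
}

# The GLM Poisson example has four bounded reals in its parameters block
glm_dim <- sum(sapply(rep("real", 4), unconstrained_dim))
```

For the example posterior the constrained and unconstrained counts coincide; they would differ for models with simplex, covariance or correlation parameters.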

avehtari avatar Jun 04 '25 06:06 avehtari

In R, we can first create a fit object and get some valid values for the parameters:

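library(posteriordb)
library(cmdstanr)
library(posterior)
# pdb is assumed to be an existing posterior database connection, e.g. pdb <- pdb_local()
po <- posterior("GLM_Poisson_Data-GLM_Poisson_model", pdb)
mod <- cmdstan_model(stan_file = stan_code_file_path(po))
# One warmup and one sampling iteration are enough to obtain valid parameter values
fit <- mod$sample(data = pdb_data(po), init = 0.01, iter_warmup = 1, iter_sampling = 1, chains = 1, refresh = 0, show_messages = FALSE, show_exceptions = FALSE, diagnostics = NULL, sig_figs = 12)
# Keep only the variables declared in the parameters block
pars <- names(fit$variable_skeleton(transformed_parameters = FALSE, generated_quantities = FALSE))
drs <- fit$draws()
vars <- sapply(pars, \(par) as.numeric(subset_draws(drs, variable = par)), simplify = FALSE, USE.NAMES = TRUE)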

and then get the dimension of the constrained space:

length(unlist(vars))

and the dimension of the unconstrained space:

length(fit$unconstrain_variables(vars))

avehtari avatar Jun 04 '25 07:06 avehtari