SBC
SBC copied to clipboard
Vignette for large/complex model workflow
Main ideas:
- Can start with ADVI/optimization
- Can (sometimes) use reduced data/model (but then reduced validity, still better than nothing)
- I.e. just one random effect, just three states in a HMM, ...
- How to learn something even from a single fit? (will need some additional code/support)
- Merge all the ranks together
- z-scores (or equivalent)
I would be excited to push this forward if you think one of the following (preferably their intersection) can be considered as complex model from your point of view. Links are related issues for each module.
@Dashadower will first attack hierarchical and @hyunjimoon will attack high dimension.
Building on @martinmodrak's small workflow, @tomfid and I will make a document for Bayesian workflow on ODE model for urban dynamics (mdl file).
This satisfies at least two categories above (ODE, high dimension). Moreover, Vicky's research on urban scaling, regressing the log of city index such as congestion or income inequality with log of city population opens up the path toward hierarchical Bayesian modeling. In short, hierarchical Bayesian start from viewing $\beta$ as random variable (hence the name Bayesian regression or Random effect model) and giving a prior distribution. Then, we replace the scalar value of prior parameter to another random variable. This allows multi-level effect of learning from data, placing itself as partial pooling between no-pooling and complete pooling model. This document is a good introduction as it has both Stan code and similar model structure (log-log).
Considering $\beta$ is (averaged) elasticity at fixed time from eq.2.2. in the first paper above, Tom's writing on challenges for informative parameter setting in dynamic models is relevant. Following the excerpt from Bayesian workflow ("a pragmatic idea is to keep the priors and compute reasonable parameter values using the real data. This can be done either through rough estimates or by computing the actual posterior. We then suggest widening out the estimates slightly and using these as a prior for the SBC.") I recommend starting from a tight prior. Absence of prior knowledge or data is an elephant in the room which I aim to address with priordb project where realistic values of parameter that could be assumed are collectively learned.
We will include:
Must
- Classifying parameters into three: assumed parameter, assumed parameter time-series, estimated parameter.
- Justification of prior specification and its automation
- Three checks: prior predictive, posterior predictive, simulation-based calibration
For 2, sections "Multiplicative error and the lognormal distribution, Weakly informative priors, Priors for system parameters and noise scale" from this case study on population dynamics is a good place to start for setting distribution and parameter for prior. This corresponds to "Specify_implicit" (H5.abc) from this Human-Machine collaboration table (HMC table).
We are consider including:
Option
- Demand prior elicitation to policy function and its optimization for policy prescription
- Comparing different
posterior approximatormodules (MCMC, variational inference, optimization)
For 4, translating Vensim's .vpd to Stan model block is the key as then we can use its optimization engine like this restaurant revenue optimization example.
For 5, the aim is to find the cheaper(-est) computation that reaches conditioned precision (step 9 from HMC table).
Tool
stan_builderI am developing with @Dashadower on @JamesPHoughton's Pysd (currently on stan-backend branch pull-requested here)- One example on prey-predator here which includes 1,2,3 for above.
- More complicated example is on stock management here with at least ten assumed parameters (will be updated by 9/5)
Ref
- Workflow sequence conditional on data and model by Mike Betancourt here
@jandraor and I am trying this with three example models in https://github.com/Data4DM/BayesSD/discussions/76. @tomfid's help, especially regarding inferencedata is helpful as vensim supporting this format would be crucial in connecting Vensim subscript with hierarchical Bayesian.
Also, @OriolAbril and @ahartikainen are helping connecting this to arviz. Thanks!
@martinmodrak @Dashadower, could current SBC R library's output be easily transformed to inferencedata by any chance? Or would there be any reference codes we can refer to e.g. previous attempts of our community to connect posterior and arviz? @Jandraor and I are using different language (R, Python) and wondered whether we can pool our efforts in plots by having a modularized data structure.
cc @mike-lawrence
this issue https://github.com/stan-dev/posterior/issues/85 sounds relevant to interoperability
Below is rough plan which I felt needed for large model workflow. Enjoyable milestone is Bayesian workflow dynamic model casestudy on prey-predator, SEIR, inventory management by around March, 2023. Thank you very much, all!
Goal: bridging Vensim ecosystem with Stan ecosystem

- provide efficient (gradient-based) and effective (diagnostics) HMC-estimator to dynamic model (generator)
- consistency among data(
.nc), model (.stan), plots (.png) - template for simulation-based calibration checks e.g. this python file
For this, I am trying to
-
connect stanify with Dynamic simulation scenario 1,2 (with @tomfid, @enekomartinmartinez, @tseyanglim, @JamesPHoughton's support)
-
1's result by putting many
.ncfiles into onesbc.nc(with @Dashadower, @OriolAbril, @ahartikainen's support) -
connect
.ncoutput with SBC package viarvarconcept (with @paul-buerkner, @martinmodrak, @jandraor's support)
Dynamic simulation scenario (Vensim)
outputs netcdf format (.nc). Scenarios to reach .nc.
scenario 1) Vensim/Stella user on Python
- pysd translates
.mdl,.xmlto python objects (support Stella, which stanify lacks now) - pysd can output synthetic data in
generator.nc - @enekomartinmartinez, @JamesPHoughton maintains pysd
scenario 2) Pure Vensim user
- Vensim
.mdlis considering supporting.ncformat, if this happens, it can outputgenerator.ncandestimator.nc - Vensim has internal MCMC, but wish to connect to HMC
- @tomfid is Vensim CTO and @tseyanglim has expertise in connecting Vensim hierarchical model with python
stanify(scenario 1 or 2)
- stanify translates
.mdlto.stanand outputs onegenerator.ncandestimator.ncfor baseline case (no hierarchy, no prior_draw's') - stanify outputs one
generator.ncand three (n_prior_draws) number ofestimator.ncfor SBC - stanify outputs one
generator.ncand two (n_subgroups) number ofestimator.ncfor hierarchical model
- @Dashadower is SBC developer, @OriolAbril, @ahartikainen are arviz developer
- arviz's wrapper or multi-index for posterior group can streamline 2,3 which can be added on @OriolAbril's sbc-cmdstanpy
Computational statistician (Stan)
transform .nc to rvars which SBC package supports. Three verifications needed:
- aria code translates
.nctorvarshere (@mike-lawrence, could you confirm this?) - SBC supports
rvarshere (@martinmodrak, could you confirm this?) rvar-based Bayes visualization + empirical coverage plot is explainable enough for policy specification on dynamic model (@jandraor, could you confirm this?). Matthew's rvars explains howrvarcan make visualization easy by grouping random variable which may be relevant to inferencedata's data variable concept.