bayes-kit
API class design
Design considerations
Goal: organize classes for an inference and posterior analysis API
- Continuous only, or also discrete parameters? (allow discrete data and generated quantities)
- Two ways to operate (easily interconvertible; see the sketch after this list)
  - batch: take a batch of draws of a given size
  - online: take a single draw at a time
- How to deal with transforms and generated quantities as in Stan? They generate new random variables as functions of others, which get their own expectations, control variates, etc.
- Densities are just functions over vectors rather than over arbitrary sequences of data types
- Could introduce a higher-level notion of a distribution, as in Boost, that bundles densities and RNGs; could apply to prior specification or as a way to describe the posterior class (sampling plus density plus derivatives)
- How to store all the metadata like configuration, dates, etc.? Probably just a dictionary that is ideally usable as input.
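To illustrate the batch/online point above, here is a minimal sketch of how the two modes might interconvert. The `OnlineSampler` stub and helper names are assumptions for illustration, not settled API; draws are assumed to be NumPy vectors.

```python
import numpy as np


class OnlineSampler:
    """Hypothetical online sampler: produces one draw at a time."""

    def sample(self) -> np.ndarray:
        raise NotImplementedError


def sample_batch(sampler: OnlineSampler, num_draws: int) -> np.ndarray:
    """Batch mode from online mode: stack num_draws single draws."""
    return np.stack([sampler.sample() for _ in range(num_draws)])


def iter_draws(batch: np.ndarray):
    """Online mode from batch mode: yield one draw at a time."""
    yield from batch
```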
Models
The log prior plus the log likelihood equals the log density plus a constant if all are specified.
- float log_density(vector), vector grad_log_density(vector), and matrix hessian_log_density(vector)
- float log_prior(vector), vector grad_log_prior(vector), and matrix hessian_log_prior(vector)
- float log_likelihood(vector), vector grad_log_likelihood(vector), and matrix hessian_log_likelihood(vector)
- vector prior_sample()
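A minimal sketch of how the model interface above might look as a Python Protocol. The method names mirror the list; the `Model` class name, the `Vector`/`Matrix` aliases, and the use of NumPy arrays are assumptions here.

```python
from typing import Protocol

import numpy as np
from numpy.typing import NDArray

Vector = NDArray[np.float64]
Matrix = NDArray[np.float64]


class Model(Protocol):
    # joint log density = log prior + log likelihood + constant
    def log_density(self, theta: Vector) -> float: ...
    def grad_log_density(self, theta: Vector) -> Vector: ...
    def hessian_log_density(self, theta: Vector) -> Matrix: ...

    # prior
    def log_prior(self, theta: Vector) -> float: ...
    def grad_log_prior(self, theta: Vector) -> Vector: ...
    def hessian_log_prior(self, theta: Vector) -> Matrix: ...

    # likelihood
    def log_likelihood(self, theta: Vector) -> float: ...
    def grad_log_likelihood(self, theta: Vector) -> Vector: ...
    def hessian_log_likelihood(self, theta: Vector) -> Matrix: ...

    # draw from the prior
    def prior_sample(self) -> Vector: ...
```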
Monte Carlo samplers
- vector sample()
- array(vector) ensemble_sample()
- (vector, weight) importance_sample()
- sampler sampling_importance_resampler(importance_sampler)
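A sketch of how these sampler signatures might translate into Python, including a sampling-importance-resampling wrapper that turns an importance sampler back into a plain sampler. All class names and the pool-based resampling scheme are assumptions; weights are assumed to be nonnegative (not on the log scale).

```python
from typing import Protocol

import numpy as np


class Sampler(Protocol):
    def sample(self) -> np.ndarray: ...


class ImportanceSampler(Protocol):
    def importance_sample(self) -> tuple[np.ndarray, float]:
        """Return a (draw, weight) pair."""
        ...


class SamplingImportanceResampler:
    """Resample from a pool of weighted draws with probability proportional to weight."""

    def __init__(self, imp: ImportanceSampler, pool_size: int = 1000, seed=None):
        self._rng = np.random.default_rng(seed)
        draws, weights = zip(*(imp.importance_sample() for _ in range(pool_size)))
        self._draws = np.stack(draws)
        w = np.asarray(weights, dtype=float)
        self._probs = w / w.sum()

    def sample(self) -> np.ndarray:
        idx = self._rng.choice(len(self._draws), p=self._probs)
        return self._draws[idx]
```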
Variational approximation
Approximate samplers add an approximate density, effectively giving an importance sampler.
- importance_sampler approximate_sampler()
- dictionary variational_fit()
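A sketch of the point above that an approximate sampler with its own log density yields an importance sampler: draw from the approximation q and weight by exp(log p - log q). The wrapper name is an assumption, and the approximation is assumed to expose `sample()` and `log_density()` as in the model interface.

```python
import numpy as np


class ApproximateImportanceSampler:
    """Wrap an approximation q (sampler + log density) around a target log density p."""

    def __init__(self, approx, target_log_density):
        self._approx = approx             # has sample() and log_density(theta)
        self._log_p = target_log_density  # callable: theta -> float

    def importance_sample(self) -> tuple[np.ndarray, float]:
        theta = self._approx.sample()
        log_w = self._log_p(theta) - self._approx.log_density(theta)
        return theta, float(np.exp(log_w))
```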
Laplace approximation
Like variational approximations, Laplace approximations add an approximate density, effectively giving an importance sampler.
- importance_sampler laplace_fit()
- sampler approximate_sampler()
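A sketch of what a Laplace fit might do, assuming a model exposing log_density and hessian_log_density as above: find the posterior mode by optimization, take the covariance as the inverse negative Hessian at the mode, and return a multivariate normal approximation usable as the q in the importance sampler sketched earlier. The function name and use of SciPy here are assumptions.

```python
import numpy as np
from scipy import optimize, stats


def laplace_fit(model, init: np.ndarray):
    """Fit a multivariate normal approximation centered at the posterior mode."""
    # mode: maximize the log density by minimizing its negation
    result = optimize.minimize(lambda theta: -model.log_density(theta), init)
    mode = result.x
    # covariance: inverse of the negative Hessian at the mode
    cov = np.linalg.inv(-model.hessian_log_density(mode))
    return stats.multivariate_normal(mean=mode, cov=cov)
```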
Control variates
Tricky because they add an expectation-specific shadow value that is averaged along with the draws to compute the estimate of the expectation.
- vector control_variate(vector draws, array(vector) gradients)
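One concrete instance, sketched under assumptions: a zero-variance-style control variate that uses the gradient of the log density as the shadow value, since its expectation under the target is zero. Regressing the function values on the gradients and subtracting the fitted part reduces variance without changing the expectation. The function name and shapes are illustrative only.

```python
import numpy as np


def control_variate_estimate(fvals: np.ndarray, grads: np.ndarray) -> float:
    """Estimate E[f] from draws using gradient-based control variates.

    fvals: shape (num_draws,), f evaluated at each draw
    grads: shape (num_draws, dim), grad log density at each draw
           (E[grad log p] = 0 under the target, so gradients are valid control variates)
    """
    centered = grads - grads.mean(axis=0)
    # least-squares coefficients minimizing the variance of f - beta @ grad
    beta, *_ = np.linalg.lstsq(centered, fvals - fvals.mean(), rcond=None)
    return float(np.mean(fvals - grads @ beta))
```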
Top-level controllers
Allow the user to specify:
- total time to spend on inference
- target ESS (e.g., implemented by iterative deepening)
- number of iterations
How to handle adaptation for adaptive samplers? They are just stateful, perhaps in a discrete way like Stan's Phase II warmup.
How to monitor convergence of adaptation in order to know when to stop adapting and start sampling?
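A sketch of a top-level controller supporting the three stopping criteria listed above, assuming a sampler with `sample()` and an `ess()` function over the collected draws (both placeholders, not settled API). The ESS target uses iterative deepening: the number of draws between ESS checks doubles each time, so diagnostic cost stays proportional to sampling cost.

```python
import time

import numpy as np


def run(sampler, ess, *, target_ess=None, max_iterations=None, max_seconds=None) -> np.ndarray:
    """Run the sampler until any specified budget is met; return the draws."""
    if target_ess is None and max_iterations is None and max_seconds is None:
        raise ValueError("specify at least one stopping criterion")
    start = time.monotonic()
    draws: list[np.ndarray] = []
    next_check = 64  # first ESS check after this many draws
    while True:
        draws.append(sampler.sample())
        n = len(draws)
        if max_iterations is not None and n >= max_iterations:
            break
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            break
        if target_ess is not None and n >= next_check:
            if ess(np.stack(draws)) >= target_ess:
                break
            next_check *= 2  # iterative deepening
    return np.stack(draws)
```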
Posterior analysis
All algorithms work with ragged data structures (chains of potentially different lengths).
- R-hat (basic, split, rank-normalized, nested nR-hat over groups of chains)
- ESS (basic, multi-chain R-hat based, different estimators?)
- head vs. tail
- sample mean, standard deviation
- standard error (if standard deviation and ESS are available)
- quantiles (median, central interval, arbitrary)
- log density output
- other sampler-specific diagnostics (e.g., tree depth, divergences)
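As one example of the diagnostics listed above, a minimal sketch of split R-hat for a single scalar parameter. For simplicity it assumes equal-length chains; the ragged case would need per-chain lengths in the between/within variance formulas.

```python
import numpy as np


def split_rhat(chains: np.ndarray) -> float:
    """Split R-hat for one parameter.

    chains: shape (num_chains, num_draws); each chain is split in half,
    then the usual between-/within-chain variance comparison is applied.
    """
    n = chains.shape[1]
    half = n // 2
    split = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    len_split = split.shape[1]
    chain_means = split.mean(axis=1)
    chain_vars = split.var(axis=1, ddof=1)
    between = len_split * chain_means.var(ddof=1)  # B
    within = chain_vars.mean()                     # W
    var_plus = (len_split - 1) / len_split * within + between / len_split
    return float(np.sqrt(var_plus / within))
```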
Another issue I'd like to bring up for API design is how we run multiple samplers. For example, a sampler like random-walk Metropolis generates a single Markov chain of draws (that is, an iterator of vectors). We need to be able to group a bunch of these chains together and run them, which converts a single-chain interface into a multiple-chain interface like one of the ensemble methods. I'm thinking this is mainly going to be an issue when a user wants to gather up all the draws and (a) analyze posterior convergence/ESS, and (b) do inference. For (a), we need to keep the draws separated into chains, whereas for (b) we want to throw them all together into one big collection.
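A sketch of one way to resolve this, with all names assumed rather than settled: a runner that holds several independent single-chain samplers, keeps draws grouped by chain for convergence analysis, and exposes a pooled view for inference.

```python
import numpy as np


class MultiChainRunner:
    """Run several single-chain samplers, keeping their draws separated by chain."""

    def __init__(self, samplers):
        self._samplers = list(samplers)
        self._chains = [[] for _ in self._samplers]

    def step(self) -> None:
        """Advance every chain by one draw."""
        for chain, sampler in zip(self._chains, self._samplers):
            chain.append(sampler.sample())

    def chains(self) -> list[np.ndarray]:
        """Per-chain draws, for R-hat / ESS and other convergence diagnostics."""
        return [np.stack(chain) for chain in self._chains]

    def pooled(self) -> np.ndarray:
        """All draws pooled into one collection, for inference."""
        return np.concatenate(self.chains(), axis=0)
```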