
API class design


Design considerations

Goal: organize classes for inference and posterior analysis API

  • Continuous parameters only, or also discrete? (Discrete data and generated quantities should be allowed either way.)

  • Two ways to operate (easily interconvertible; see the sketch after this list):

    • batch: take a batch of draws of a given size
    • online: take a single draw
  • How to deal with transforms and generated quantities as in Stan? They generate new random variables as functions of others, which get their own expectations, control variates, etc.

  • Densities are just functions over vectors rather than over arbitrary sequences of data types

  • Could introduce a higher-level notion of a distribution, as in Boost, which bundles densities and RNGs; this could apply to prior specification or serve as a way to describe a posterior class (sample plus density plus derivatives)

  • How to store all the meta-data like configuration, dates, etc.? Probably just a dictionary that is ideally usable as input.
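
To make the batch/online point above concrete, here is a minimal sketch, assuming a hypothetical `Sampler` protocol whose `sample()` method returns a NumPy vector; going the other direction (online from batch) is just iterating over a stored batch.

```python
import numpy as np
from typing import Protocol

class Sampler(Protocol):
    def sample(self) -> np.ndarray:
        """Return a single draw (online mode)."""
        ...

def sample_batch(sampler: Sampler, num_draws: int) -> np.ndarray:
    """Batch mode built from online mode: stack num_draws single draws."""
    return np.stack([sampler.sample() for _ in range(num_draws)])
```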

Models

If the prior and likelihood are both specified, the log prior plus the log likelihood equals the log density plus a constant.
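
In symbols, with parameters $\theta$ and data $y$:

$$\log p(\theta) + \log p(y \mid \theta) = \log p(\theta \mid y) + \text{const.}$$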

  • float log_density(vector), vector grad_log_density(vector), matrix hessian_log_density(vector)

  • float log_prior(vector), vector grad_log_prior(vector), matrix hessian_log_prior(vector)

  • float log_likelihood(vector), vector grad_log_likelihood(vector), matrix hessian_log_likelihood(vector)

  • vector prior_sample()
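
A minimal sketch of this model interface as a Python protocol (the class name, exact typed signatures, and use of NumPy arrays are illustrative assumptions, not settled API):

```python
import numpy as np
from typing import Protocol

class Model(Protocol):
    # log_density = log_prior + log_likelihood + const. when both are specified
    def log_density(self, theta: np.ndarray) -> float: ...
    def grad_log_density(self, theta: np.ndarray) -> np.ndarray: ...
    def hessian_log_density(self, theta: np.ndarray) -> np.ndarray: ...

    def log_prior(self, theta: np.ndarray) -> float: ...
    def grad_log_prior(self, theta: np.ndarray) -> np.ndarray: ...
    def hessian_log_prior(self, theta: np.ndarray) -> np.ndarray: ...

    def log_likelihood(self, theta: np.ndarray) -> float: ...
    def grad_log_likelihood(self, theta: np.ndarray) -> np.ndarray: ...
    def hessian_log_likelihood(self, theta: np.ndarray) -> np.ndarray: ...

    def prior_sample(self) -> np.ndarray: ...
```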

Monte Carlo samplers

  • vector sample()

  • array(vector) ensemble_sample()

  • (vector, weight) importance_sample()

  • sampler sampling_importance_resampler(importance_sampler)
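
A hedged sketch of how those signatures might look as Python protocols, with a sampling-importance-resampling wrapper (resampling one draw per call from a fresh batch of weighted particles is just one possible reading):

```python
import numpy as np
from typing import Protocol

class Sampler(Protocol):
    def sample(self) -> np.ndarray: ...

class EnsembleSampler(Protocol):
    def ensemble_sample(self) -> list[np.ndarray]: ...

class ImportanceSampler(Protocol):
    def importance_sample(self) -> tuple[np.ndarray, float]: ...

def sampling_importance_resampler(imp: ImportanceSampler, num_particles: int = 100) -> Sampler:
    """Wrap an importance sampler as a plain sampler via resampling."""
    class _Resampler:
        def sample(self) -> np.ndarray:
            draws, weights = zip(*(imp.importance_sample() for _ in range(num_particles)))
            probs = np.asarray(weights) / np.sum(weights)
            return draws[np.random.default_rng().choice(num_particles, p=probs)]
    return _Resampler()
```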

Variational approximation

Approximate samplers add an approximate density, effectively giving an importance sampler.

  • importance_sampler approximate_sampler()
  • dictionary variational_fit()
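
One way to read "approximate density plus approximate sampler gives an importance sampler" in code; the wrapper class and its constructor arguments are hypothetical:

```python
import numpy as np

class ApproximationAsImportanceSampler:
    """Wrap an approximation q (sampler + log density) as an importance sampler for the target p."""

    def __init__(self, model, approx_sample, approx_log_density):
        self._model = model                  # target: exposes log_density(theta)
        self._approx_sample = approx_sample  # () -> draw from q
        self._approx_log_density = approx_log_density  # theta -> log q(theta)

    def importance_sample(self):
        theta = self._approx_sample()
        # weight p(theta) / q(theta), up to the target's unknown normalizing constant
        weight = np.exp(self._model.log_density(theta) - self._approx_log_density(theta))
        return theta, weight
```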

Laplace approximation

Approximate samplers add an approximate density, effectively giving an importance sampler.

  • importance_sampler laplace_fit()
  • sampler approximate_sampler()
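
A rough sketch of laplace_fit against the model interface above, using SciPy to find the mode; the optimizer choice and returning a (sampler, log density) pair are assumptions:

```python
import numpy as np
from scipy import optimize, stats

def laplace_fit(model, init: np.ndarray):
    """Multivariate normal approximation at the mode with covariance -H(mode)^{-1}."""
    result = optimize.minimize(
        lambda theta: -model.log_density(theta),
        init,
        jac=lambda theta: -model.grad_log_density(theta),
    )
    mode = result.x
    cov = np.linalg.inv(-model.hessian_log_density(mode))
    approx = stats.multivariate_normal(mean=mode, cov=cov)
    # the sampler/density pair can be fed to the importance-sampling wrapper above
    return approx.rvs, approx.logpdf
```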

Control variates

Tricky because they add an expectation-specific shadow value that is averaged along with the draws to compute the estimate of the expectation.

  • vector control_variate(vector draws, array(vector) gradients)
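
To make the shadow-value point concrete, here is a minimal sketch of a linear (zero-variance-style) control variate based on the score function; the specific regression form is my assumption, and the signature above is more general:

```python
import numpy as np

def control_variate_estimate(values: np.ndarray, gradients: np.ndarray) -> float:
    """Estimate E[f] from per-draw values f(theta_n) and scores grad log p(theta_n).

    The score has expectation zero under the target, so values - gradients @ c
    keeps the same expectation for any c; c is chosen to minimize variance.
    """
    centered_grads = gradients - gradients.mean(axis=0)
    centered_vals = values - values.mean()
    coef, *_ = np.linalg.lstsq(centered_grads, centered_vals, rcond=None)
    adjusted = values - gradients @ coef
    return float(adjusted.mean())
```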

Top-level controllers

Allow user to specify:

  • total time to spend on inference
  • target ESS (e.g., implemented by iterative deepening; see the sketch after this list)
  • number of iterations
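
A sketch of the iterative-deepening idea for the ESS target; the ess() argument and the doubling schedule are assumptions:

```python
import numpy as np

def run_to_target_ess(sampler, ess, target_ess: float,
                      initial_draws: int = 100, max_draws: int = 100_000) -> np.ndarray:
    """Double the number of draws until the estimated ESS reaches the target."""
    draws = [sampler.sample() for _ in range(initial_draws)]
    while ess(np.stack(draws)) < target_ess and len(draws) < max_draws:
        # iterative deepening: double the total sample size each round
        draws.extend(sampler.sample() for _ in range(len(draws)))
    return np.stack(draws)
```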

How to handle adaptation for adaptive samplers? They are just stateful, perhaps in a discrete way like Stan's Phase II warmup.

How to monitor convergence of adaptation to stop adaptation and start sampling?

Posterior analysis

All algorithms should work with ragged data structures (e.g., chains of unequal length).

  • R-hat (basic, split, rank-normalized, mini-group nested R-hat)
  • ESS (basic, multi-chain R-hat based, different estimators?)
    • head vs. tail
  • sample mean, standard deviation
  • standard error (if standard deviation and ESS are available)
  • quantiles (median, central interval, arbitrary)
  • log density output
  • other diagnostics (e.g., tree depth, divergence, etc.)
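
As one concrete example from the list above, a basic split R-hat sketch for a single scalar quantity, assuming equal-length chains in a rectangular array rather than the ragged case:

```python
import numpy as np

def split_rhat(chains: np.ndarray) -> float:
    """Basic split R-hat for a (num_chains, num_draws) array of one scalar quantity."""
    half = chains.shape[1] // 2
    # split each chain in half and treat the halves as separate chains
    split = np.vstack([chains[:, :half], chains[:, half:2 * half]])
    chain_means = split.mean(axis=1)
    between = half * chain_means.var(ddof=1)
    within = split.var(axis=1, ddof=1).mean()
    var_plus = (half - 1) / half * within + between / half
    return float(np.sqrt(var_plus / within))
```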

bob-carpenter, Dec 16 '22

Another issue I'd like to bring up for API design is how we run multiple samplers. For example, a sampler like random walk Metropolis generates a single Markov chain of draws (that is, an iterator of vectors). We need to be able to group a bunch of these chains together and run them. That converts a single chain interface into a multiple-chain interface like one of the ensemble methods. I'm thinking this is mainly going to be an issue when a user wants to gather up all the draws and (a) analyze posterior convergence/ESS, and (b) do inference. For (a), we need to keep the draws separated into chains, whereas for (b) we want to just throw them together into one big collection.
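
A sketch of that grouping with hypothetical names: keep the per-chain structure for (a) and pool for (b):

```python
import numpy as np

def run_chains(make_sampler, num_chains: int, num_draws: int) -> list[np.ndarray]:
    """Run independent chains; keeping them separate supports R-hat/ESS (case a)."""
    chains = []
    for _ in range(num_chains):
        sampler = make_sampler()
        chains.append(np.stack([sampler.sample() for _ in range(num_draws)]))
    return chains

def pooled_draws(chains: list[np.ndarray]) -> np.ndarray:
    """Throw all chains together into one big collection for inference (case b)."""
    return np.concatenate(chains, axis=0)
```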

bob-carpenter, Jan 26 '23