Introduce families of distributions and a `DistributionRestraint`
I've lost count of how many implementations of log-normal restraints we have. I propose a module with a series of classes that represent probability distributions. I started something like this a while back by generalizing `IMP::isd::FNormal` and the like to an `IMP::isd::Distributions` base class, but we could make this more useful with the following features:
- Computation of log density and CDF
- The above with gradients
- Functionality to fit distributions
- Functionality to draw exact samples from distributions
- Some check to ensure implied dependence assumptions are sensible (i.e. a parameter drawn from one distribution cannot also be drawn from another; rather, it can be drawn from their joint, which would be its own distribution. This distinction is important for PPCs; see below.)
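To make the feature list above concrete, here is a minimal sketch of what one such distribution class could look like, written in plain Python rather than against the IMP API. The class name `LogNormal` and the methods `log_density`, `log_density_grad`, `cdf`, `sample`, and `fit` are hypothetical names chosen for illustration; none of them are existing IMP identifiers. The dependence check from the last bullet is not covered here.

```python
import math
import random


class LogNormal:
    """Hypothetical log-normal distribution; an illustration, not an IMP class."""

    def __init__(self, mu, sigma):
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.mu = mu
        self.sigma = sigma

    def _z(self, x):
        # Standardized log-value; x must be positive.
        return (math.log(x) - self.mu) / self.sigma

    def log_density(self, x):
        z = self._z(x)
        return -math.log(x * self.sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

    def log_density_grad(self, x):
        """Derivatives of log_density with respect to (x, mu, sigma)."""
        z = self._z(x)
        d_x = -(1.0 + z / self.sigma) / x
        d_mu = z / self.sigma
        d_sigma = (z * z - 1.0) / self.sigma
        return d_x, d_mu, d_sigma

    def cdf(self, x):
        return 0.5 * (1.0 + math.erf(self._z(x) / math.sqrt(2.0)))

    def sample(self, rng):
        # Exact sampling: exponentiate a normal draw.
        return math.exp(rng.gauss(self.mu, self.sigma))

    @classmethod
    def fit(cls, data):
        # Maximum-likelihood fit: mean and (biased) std of the log-data.
        logs = [math.log(x) for x in data]
        mu = sum(logs) / len(logs)
        var = sum((v - mu) ** 2 for v in logs) / len(logs)
        return cls(mu, math.sqrt(var))
```

A caller would use it as `LogNormal(0.0, 1.0).sample(random.Random(0))`. Other families would expose the same five methods, which is what would let a single restraint class work with any of them.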
A single `DistributionRestraint` would then wrap a distribution, along with some interface for mixing and matching `FloatIndex`es with constants. Restraining the output of some function with a `DistributionRestraint` would require two things of that function: adding the quantity to the `Model` attributes with a `ScoreState` upon model update, and pulling back the adjoints (derivatives of the scoring function with respect to the quantities) to the function inputs, which could themselves be other model attributes.
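A rough sketch of that update-then-pullback flow, in plain Python with dictionaries standing in for model attributes. Everything here is a stand-in: `Normal`, `DistanceForwardModel` (playing the role of the `ScoreState`), this toy `DistributionRestraint`, and `score_and_gradient` are hypothetical and do not reflect the actual IMP interfaces.

```python
import math


class Normal:
    """Hypothetical normal distribution with constant parameters."""

    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

    def log_density(self, x):
        z = (x - self.mu) / self.sigma
        return -math.log(self.sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

    def dlog_density_dx(self, x):
        return -(x - self.mu) / self.sigma ** 2


class DistanceForwardModel:
    """Stand-in for a ScoreState: writes a derived attribute, pulls adjoints back."""

    def __init__(self, key_a, key_b, key_out):
        self.key_a, self.key_b, self.key_out = key_a, key_b, key_out

    def update(self, attrs):
        # Forward pass: store the derived quantity as an attribute.
        attrs[self.key_out] = abs(attrs[self.key_a] - attrs[self.key_b])

    def pullback(self, attrs, adjoints):
        # Chain rule: d|a - b|/da = sign(a - b), d|a - b|/db = -sign(a - b).
        sign = 1.0 if attrs[self.key_a] >= attrs[self.key_b] else -1.0
        g = adjoints.get(self.key_out, 0.0)
        adjoints[self.key_a] = adjoints.get(self.key_a, 0.0) + sign * g
        adjoints[self.key_b] = adjoints.get(self.key_b, 0.0) - sign * g


class DistributionRestraint:
    """Scores one attribute as the negative log density under a distribution."""

    def __init__(self, distribution, key):
        self.distribution, self.key = distribution, key

    def evaluate(self, attrs, adjoints):
        x = attrs[self.key]
        # Accumulate d(score)/dx on the restrained attribute.
        adjoints[self.key] = adjoints.get(self.key, 0.0) - self.distribution.dlog_density_dx(x)
        return -self.distribution.log_density(x)


def score_and_gradient(attrs, forward, restraint):
    forward.update(attrs)
    adjoints = {}
    score = restraint.evaluate(attrs, adjoints)
    forward.pullback(attrs, adjoints)
    return score, adjoints
```

Swapping `Normal` for any other distribution exposing the same two methods would leave the forward model untouched, which is the point of the proposal.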
This would prevent unnecessary code duplication, which is nice, but it would also enable rapid iteration on the statistical model, including unlocking multi-level models. Once a user has a forward model with pullback implemented, they can test a variety of different probability distributions with no additional effort. Developer focus then shifts away from generic code to the particulars of their data and representation.
Additionally, this is an essential first step toward prior- and posterior-predictive checks. It is known how to draw exact samples from most common distributions. Such a `DistributionRestraint` could then be inverted, enabling us to draw model parameters and data from the distributions. This enables us to sanity-check the implicit assumptions in our priors (prior-predictive) and to visualize the posterior in data space (posterior-predictive).
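As an illustration of what "inverting" buys us, here is a minimal sketch of both checks using only the Python standard library. The model (a normal prior on `mu`, a log-normal prior on `sigma`, and a normal likelihood) and the function names `prior_predictive` and `posterior_predictive` are assumptions made up for this example, not part of IMP.

```python
import random


def prior_predictive(n_draws, rng):
    """Draw (mu, sigma, y) by sampling parameters from their priors, then data."""
    draws = []
    for _ in range(n_draws):
        mu = rng.gauss(0.0, 1.0)              # assumed prior on the mean
        sigma = rng.lognormvariate(0.0, 0.5)  # assumed prior on the noise scale
        y = rng.gauss(mu, sigma)              # simulated datum from the likelihood
        draws.append((mu, sigma, y))
    return draws


def posterior_predictive(posterior_samples, rng):
    """Simulate one datum per posterior sample of (mu, sigma)."""
    return [rng.gauss(mu, sigma) for mu, sigma in posterior_samples]


if __name__ == "__main__":
    rng = random.Random(0)
    simulated = [y for _, _, y in prior_predictive(1000, rng)]
    # If the simulated range is implausible for real data, the priors need work.
    print(min(simulated), max(simulated))
```

For the posterior-predictive case, `posterior_samples` would come from whatever sampler produced the posterior; the simulated data are then compared against the observed data.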