imp Introduce families of distributions and a `DistributionRestraint`

Open sethaxen opened this issue 5 years ago • 0 comments

I've lost count of how many implementations we have of log-normal restraints. I propose a module with a series of classes that represent probability distributions. I started something like this a ways back by generalizing IMP::isd::FNormal and the like to an IMP::isd::Distributions base class, but we could make this more useful with the following features:

- Computation of the log density and CDF
- Gradients of the above
- Functionality to fit distributions
- Functionality to draw exact samples from distributions
- A check that the implied dependence assumptions are sensible: a parameter drawn from one distribution cannot also be drawn from another; rather, it can be drawn from their joint, which would be its own distribution. This distinction matters for prior- and posterior-predictive checks (PPCs); see below.
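
To make the feature list concrete, here is a rough sketch in plain Python (standard library only). The class and method names (`Normal`, `LogNormal`, `log_density_grad`, `fit`, `sample`) are hypothetical and are not the existing `IMP::isd` interface; the fit is a simple maximum-likelihood estimate, chosen only for illustration.

```python
import math
import random


class Normal:
    """Hypothetical normal distribution; not part of IMP's API."""

    def __init__(self, mu=0.0, sigma=1.0):
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.mu = mu
        self.sigma = sigma

    def log_density(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2.0 * math.pi)

    def log_density_grad(self, x):
        """Gradient of the log density with respect to (x, mu, sigma)."""
        z = (x - self.mu) / self.sigma
        return (-z / self.sigma, z / self.sigma, (z * z - 1.0) / self.sigma)

    def cdf(self, x):
        z = (x - self.mu) / self.sigma
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def sample(self, rng):
        """Draw one exact sample using a random.Random instance."""
        return rng.gauss(self.mu, self.sigma)

    @classmethod
    def fit(cls, data):
        """Maximum-likelihood fit: sample mean and (biased) standard deviation."""
        n = len(data)
        mu = sum(data) / n
        var = sum((x - mu) ** 2 for x in data) / n
        return cls(mu, math.sqrt(var))


class LogNormal:
    """Hypothetical log-normal, defined as exp of a Normal(mu, sigma) variable."""

    def __init__(self, mu=0.0, sigma=1.0):
        self._normal = Normal(mu, sigma)

    def log_density(self, x):
        # Change of variables y = log(x) contributes the -log(x) Jacobian term.
        return self._normal.log_density(math.log(x)) - math.log(x)

    def log_density_grad(self, x):
        """Gradient of the log density with respect to (x, mu, sigma)."""
        dy, dmu, dsigma = self._normal.log_density_grad(math.log(x))
        return ((dy - 1.0) / x, dmu, dsigma)

    def cdf(self, x):
        return self._normal.cdf(math.log(x))

    def sample(self, rng):
        return math.exp(self._normal.sample(rng))

    @classmethod
    def fit(cls, data):
        fitted = Normal.fit([math.log(x) for x in data])
        return cls(fitted.mu, fitted.sigma)


# Usage: draw samples from a log-normal, then recover its parameters by fitting.
rng = random.Random(0)
data = [LogNormal(1.0, 0.25).sample(rng) for _ in range(1000)]
fitted = LogNormal.fit(data)
```

Building `LogNormal` on top of `Normal` is one possible design; it keeps the change-of-variables logic in a single place, which is the kind of duplication this proposal aims to remove.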

A single DistributionRestraint would then wrap a distribution, along with some interface for mixing and matching FloatIndexes with constants. Restraining the output of some function with a DistributionRestraint would require the function to add the quantity to the Model attributes via a ScoreState upon model update, and to pull the adjoints (derivatives of the scoring function with respect to the quantities) back to the function inputs, which could themselves be other model attributes.
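
The update-then-pullback flow described above could look roughly like the following toy in plain Python. None of this is IMP's actual API: the model is stood in for by a dict of values and a dict of adjoints, `DistanceState` plays the role of a ScoreState, and every class and method name here is an assumption made for illustration.

```python
import math


class Normal:
    """Minimal stand-in distribution (hypothetical, not IMP's API)."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def log_density(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2.0 * math.pi)

    def dlog_density_dx(self, x):
        return -(x - self.mu) / self.sigma ** 2


class DistanceState:
    """Plays the role of a ScoreState: writes a derived quantity into the
    model on update and pulls its adjoint back to the function inputs."""

    def __init__(self, a, b, out):
        self.a, self.b, self.out = a, b, out

    def update(self, values):
        # Forward model: the restrained quantity is |a - b|.
        values[self.out] = abs(values[self.a] - values[self.b])

    def pull_back(self, values, adjoints):
        # Chain rule: d|a - b|/da = sign(a - b), d|a - b|/db = -sign(a - b).
        sign = 1.0 if values[self.a] >= values[self.b] else -1.0
        g = adjoints.get(self.out, 0.0)
        adjoints[self.a] = adjoints.get(self.a, 0.0) + sign * g
        adjoints[self.b] = adjoints.get(self.b, 0.0) - sign * g


class DistributionRestraint:
    """Scores one model attribute by the negative log density of a distribution."""

    def __init__(self, distribution, key):
        self.distribution = distribution
        self.key = key

    def evaluate(self, values, adjoints):
        x = values[self.key]
        # The score is -log p(x), so its derivative is -d(log p)/dx.
        adjoints[self.key] = adjoints.get(self.key, 0.0) - self.distribution.dlog_density_dx(x)
        return -self.distribution.log_density(x)


# Usage: restrain the distance between two coordinates with a normal distribution.
values = {"x1": 0.0, "x2": 3.0}
adjoints = {}
state = DistanceState("x1", "x2", "d")
restraint = DistributionRestraint(Normal(mu=2.0, sigma=0.5), "d")

state.update(values)                         # forward model writes "d"
score = restraint.evaluate(values, adjoints)  # restraint writes adjoint of "d"
state.pull_back(values, adjoints)            # adjoint flows back to "x1", "x2"
```

In this toy, swapping `Normal` for any other object with `log_density` and `dlog_density_dx` changes the statistical model without touching the forward model or its pullback.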

This would prevent unnecessary code duplication, which is nice, but it would also enable rapid iteration on the statistical model, including unlocking multi-level models. Once a user has a forward model with pullback implemented, they can test a variety of different probability distributions with no additional effort. Developer focus then shifts away from generic code and toward the particulars of their data and representation.

Additionally, this is an essential first step toward prior- and posterior-predictive checks. It is known how to draw exact samples from most generic distributions. Such a DistributionRestraint could then be inverted, enabling us to draw model parameters and data from the distributions. This enables us to sanity check the implicit assumptions in our priors (prior-predictive) and to visualize the posterior in data-space (posterior-predictive).
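
As a sketch of the prior-predictive idea, the snippet below draws parameters from priors and then draws data given those parameters, using only Python's standard `random` module. The specific priors (a wide normal on the location, a log-normal on the scale), the normal likelihood, and the function name `prior_predictive` are all assumptions picked for illustration, not anything proposed for IMP.

```python
import random


def prior_predictive(n_draws, n_obs, rng):
    """Draw (mu, sigma, data) triples by sampling the priors, then the likelihood."""
    draws = []
    for _ in range(n_draws):
        mu = rng.gauss(0.0, 10.0)             # assumed prior on the location
        sigma = rng.lognormvariate(0.0, 0.5)  # assumed prior on the scale (positive)
        data = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        draws.append((mu, sigma, data))
    return draws


# Usage: simulate datasets implied by the priors alone, before seeing real data.
rng = random.Random(0)
draws = prior_predictive(n_draws=200, n_obs=5, rng=rng)

# Summaries of the simulated data show what the priors imply in data space.
simulated = [x for _, _, data in draws for x in data]
lowest, highest = min(simulated), max(simulated)
```

A posterior-predictive check would follow the same pattern, with the parameter draws taken from posterior samples instead of the priors.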

Sep 07 '19 21:09 sethaxen
