
Introduce families of distributions and a `DistributionRestraint`

Open sethaxen opened this issue 5 years ago • 0 comments

I've lost count of how many implementations we have of log-normal restraints. I propose a module with a series of classes that represent probability distributions. I started something like this a ways back by generalizing IMP::isd::FNormal and the like to an IMP::isd::Distributions base class, but we could make this more useful with the following features:

  • Computation of log density and CDF
  • The above with gradients
  • Functionality to fit distributions
  • Functionality to draw exact samples from distributions
  • Some check to ensure implied dependence assumptions are sensible (i.e. a parameter drawn from one distribution cannot also be drawn from another; rather, it can be drawn from their joint, which would be its own distribution. This distinction is important for PPCs; see below.)
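As a rough illustration of the first four features, here is a minimal Python sketch of what one such distribution class might look like. The class name `LogNormal` and its method names are hypothetical, chosen only for this example; they are not an existing IMP interface, and the dependence check from the last bullet is not covered.

```python
import math
import random


class LogNormal:
    """Hypothetical log-normal distribution; not an existing IMP class."""

    def __init__(self, mu, sigma):
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.mu = mu          # mean of log(x)
        self.sigma = sigma    # standard deviation of log(x)

    def _z(self, x):
        return (math.log(x) - self.mu) / self.sigma

    def log_density(self, x):
        z = self._z(x)
        return (-math.log(x) - math.log(self.sigma)
                - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z)

    def log_density_gradient(self, x):
        """Derivatives of the log density with respect to (x, mu, sigma)."""
        z = self._z(x)
        d_x = -(1.0 + z / self.sigma) / x
        d_mu = z / self.sigma
        d_sigma = (z * z - 1.0) / self.sigma
        return d_x, d_mu, d_sigma

    def cdf(self, x):
        return 0.5 * (1.0 + math.erf(self._z(x) / math.sqrt(2.0)))

    def sample(self, rng):
        """Draw one exact sample using a random.Random instance."""
        return rng.lognormvariate(self.mu, self.sigma)

    @classmethod
    def fit(cls, data):
        """Maximum-likelihood fit from strictly positive observations."""
        logs = [math.log(x) for x in data]
        mu = sum(logs) / len(logs)
        var = sum((v - mu) ** 2 for v in logs) / len(logs)
        return cls(mu, math.sqrt(var))
```

Other families (normal, gamma, and so on) would presumably expose the same methods, so that code consuming a distribution does not need to know which family it holds.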

A single DistributionRestraint would then wrap a distribution, along with some interface for mixing and matching FloatIndexes with constants. Restraining the output of some function with a DistributionRestraint would require the function to add the quantity to the Model attributes with a ScoreState upon model update, and then to pull back the adjoints (derivatives of the scoring function with respect to the quantities) to the function inputs, which could be other model attributes.
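To make the forward/pullback flow concrete, here is a toy Python sketch. It assumes a plain dict stands in for the Model attributes, and every class and method name (`Normal`, `DistanceState`, `DistributionRestraint.evaluate`, `evaluate_model`) is hypothetical rather than the real IMP API. The forward function is a 1D distance `|a - b|`, and the score is the negative log density of that distance.

```python
import math


class Normal:
    """Hypothetical normal distribution with log density and its x-derivative."""

    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def log_density(self, x):
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2.0 * math.pi)

    def dlog_density_dx(self, x):
        return -(x - self.mu) / self.sigma ** 2


class DistanceState:
    """Stand-in for a ScoreState: writes |a - b| and pulls adjoints back."""

    def before_evaluate(self, attributes):
        attributes["distance"] = abs(attributes["a"] - attributes["b"])

    def after_evaluate(self, attributes, adjoints):
        # Chain rule: d(distance)/da = sign(a - b), d(distance)/db = -sign(a - b).
        sign = 1.0 if attributes["a"] >= attributes["b"] else -1.0
        upstream = adjoints.get("distance", 0.0)
        adjoints["a"] = adjoints.get("a", 0.0) + sign * upstream
        adjoints["b"] = adjoints.get("b", 0.0) - sign * upstream


class DistributionRestraint:
    """Scores one attribute by the negative log density of a distribution."""

    def __init__(self, distribution, key):
        self.distribution = distribution
        self.key = key

    def evaluate(self, attributes, adjoints):
        x = attributes[self.key]
        # Adjoint of the score with respect to the restrained quantity.
        adjoints[self.key] = (adjoints.get(self.key, 0.0)
                              - self.distribution.dlog_density_dx(x))
        return -self.distribution.log_density(x)


def evaluate_model(attributes, state, restraint):
    """Forward update, score, then pullback to the function inputs."""
    adjoints = {}
    state.before_evaluate(attributes)
    score = restraint.evaluate(attributes, adjoints)
    state.after_evaluate(attributes, adjoints)
    return score, adjoints
```

In this sketch the restraint never sees `a` or `b`; it only knows about the `distance` attribute, so swapping `Normal` for another distribution with the same two methods would leave the forward model untouched.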

This would prevent unnecessary code duplication, which is nice, but it would also enable rapid iteration on the statistical model, including unlocking multi-level models. Once a user has a forward model with pullback implemented, they can test a variety of different probability distributions with no additional effort. Developer focus then shifts away from generic code to the particulars of their data/representation.

Additionally, this is an essential first step toward prior- and posterior-predictive checks. Exact sampling methods are known for most standard distributions. Such a DistributionRestraint could then be inverted, enabling us to draw model parameters and data from the distributions. This enables us to sanity-check the implicit assumptions in our priors (prior-predictive) and to visualize the posterior in data-space (posterior-predictive).
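For example, a prior-predictive check might look roughly like the following Python sketch. The specific priors (a standard normal on the log-scale location and a log-normal on the scale) and the function names are assumptions made up for illustration, not something taken from an existing IMP model.

```python
import random
import statistics


def prior_predictive(n_draws, n_obs, rng):
    """Simulate data sets by sampling parameters from priors, then data given them."""
    datasets = []
    for _ in range(n_draws):
        # Hypothetical priors, chosen only for this sketch.
        mu = rng.gauss(0.0, 1.0)               # prior on log-scale location
        sigma = rng.lognormvariate(-1.0, 0.5)  # prior on scale, always positive
        # Log-normal likelihood: draw simulated observations given (mu, sigma).
        datasets.append([rng.lognormvariate(mu, sigma) for _ in range(n_obs)])
    return datasets


def summarize(datasets):
    """Median of each simulated data set, to compare against plausible data."""
    return [statistics.median(d) for d in datasets]
```

If the simulated medians fall far outside the range of physically plausible measurements, the priors encode assumptions that probably need revisiting. A posterior-predictive check would follow the same pattern, with parameters taken from posterior samples instead of prior draws.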

sethaxen, Sep 07 '19 21:09