
Reimplement Nuisance to sample in an unconstrained space

Open · sethaxen opened this issue 6 years ago · 3 comments

Drawing from some of the tricks used in probabilistic programming packages, I'm proposing a reimplementation of IMP.isd.Nuisance that should enable better sampling. The beginnings of an implementation can be found here. How this changes things is detailed in the bullet points in the exchanges with @benmwebb below:

@sdaxen:

I've been reading about modern practices with Hamiltonian Monte Carlo (HMC), and one thing I've learned is that HMC typically has a hard time exploring the parameters of constrained variables near the bound of the constraint (Gibbs is even worse at this), which can result in oscillations between two highly biased regimes in the resulting samples, preventing an accurate sample from being drawn. The trick that's usually employed here is to transform the variables so that their support is the real line. The usual transformations are log for a variable with only an upper or lower bound and logit for a variable with both (with the necessary shifting factors, etc.). These transformations usually make exploration of the space much more efficient. Modifying all restraints to operate on log-scale parameters, etc., is really cumbersome, though.
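For concreteness, here is a minimal sketch of these transforms in plain numpy (illustrative only, not IMP code): log for a one-sided bound and a shifted logit for a two-sided bound.

```python
import numpy as np

def to_unconstrained(x, lower=None, upper=None):
    """Map a bounded value x to the real line (illustrative helper, not IMP API)."""
    if lower is not None and upper is not None:
        # shifted/scaled logit for a two-sided bound: y = log((x - lower) / (upper - x))
        return np.log(x - lower) - np.log(upper - x)
    if lower is not None:
        # log for a lower bound only: y = log(x - lower)
        return np.log(x - lower)
    if upper is not None:
        # reflected log for an upper bound only: y = log(upper - x)
        return np.log(upper - x)
    return x  # unbounded: identity

def to_constrained(y, lower=None, upper=None):
    """Inverse map from the real line back to the bounded space."""
    if lower is not None and upper is not None:
        return lower + (upper - lower) / (1.0 + np.exp(-y))  # inverse logit
    if lower is not None:
        return lower + np.exp(y)
    if upper is not None:
        return upper - np.exp(y)
    return y
```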

My idea is to modify the implementation of Nuisance so that instead of storing the nuisance value and enforcing a constraint, it stores only the transformed nuisance (log, logit, etc.) and computes the untransformed nuisance upon request. Adding to the nuisance derivative would simply add the increment, multiplied by the Jacobian, to the derivative of the transformed nuisance. Somehow the necessary Jacobian correction to the posterior will always be added to the score and derivative (once per transformed nuisance per evaluation). The bounds would be used to determine which transformation should be applied and to warn if the user manually sets the nuisance to a value outside of the constraints. An additional benefit is that all nuisances, regardless of scale/bounds, are on the same rough scale when transformed, so a nuisance mover can operate with some default parameters on the transformed nuisance value. The only change to how Nuisances behave is that the bounds will be exclusive instead of inclusive, but for parameters like Scale, we generally don't want them to ever be 0 anyway, and the fact that they can be 0 often requires us to catch a resulting inf/nan in our restraints. What are your thoughts on this?
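For the simplest case, a Scale-like nuisance with lower bound 0 and transform y = log(x), the bookkeeping described above amounts to the following (a sketch of the math only; score_fn and dscore_dx_fn stand in for whatever restraints act on x and are not IMP API):

```python
import math

def score_and_derivative_in_y(y, score_fn, dscore_dx_fn):
    """Evaluate the score and its derivative w.r.t. the stored transformed value y."""
    x = math.exp(y)      # untransformed nuisance, computed on request
    dx_dy = math.exp(y)  # Jacobian of the inverse transform x = exp(y)

    # Chain rule: a derivative added w.r.t. x becomes dx/dy times larger w.r.t. y.
    dscore_dy = dscore_dx_fn(x) * dx_dy

    # Change-of-variables correction: add -log|dx/dy| (= -y here) to the score so
    # that a uniform prior on x stays uniform, and add its y-derivative (-1).
    score = score_fn(x) - math.log(abs(dx_dy))
    dscore_dy += -1.0

    return score, dscore_dy
```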

@benmwebb:

Seems reasonable, assuming it doesn't break existing code. Your suggestion to make a branch to play with this sounds ideal.

@sdaxen:

Cool. I've implemented most of it, and I suspect it will break some tests that check the result of a scoring function evaluation or the derivative of a nuisance, but in a well-defined way that we can check. Any sample drawn from the posterior should be improved. There are a few other behaviors it changes (question below):

  • Bounds are now exclusive. If a nuisance is set to a value outside a bound, the transformed nuisance is set so that the nuisance is as close to the bound as possible without being equal to it. This means some code that sets tighter bounds than intended just to avoid overflow is probably unnecessary. Related: hard bounds should only be used where there is a physical/mathematical constraint (e.g. Scale, Switching). To avoid unrealistically high values, a properly tuned prior should be used instead. Hard bounds that the scoring function can reach during sampling can really break inference, and if they're unreachable, there's no need for a hard constraint (this is now documented). The crosslinking restraints in PMI are an example of where this is done and really should be updated accordingly.
  • Upper and lower bounds now can't be equal. This should go without saying, since the proper way to constrain a nuisance to a single value is simply not to optimize it. There are a couple of tests where the upper and lower bounds are carelessly set equal while the particle is set to be optimized.
  • NormalMover moves the transformed nuisance, not the nuisance. This is necessary to solve the biased-sampling issue that is the whole point of the change. It might be necessary to tweak a couple of the hardcoded nuisance moves in PMI so they behave similarly to a mover on the Nuisance.
  • Moving a Nuisance that is used as the upper or lower bound of another nuisance changes the value of the bounded nuisance, though its transformed value is unchanged. I can't see any way to avoid this without adding a NuisanceBound decorator. I haven't found anywhere in the IMP code where we bound a Nuisance with a Nuisance, so I'm not certain how big a deal this is.
  • The NuisanceScoreState is now used to update the derivative of the transformed nuisance with the derivative of the Jacobian-based modification of the prior due to change of variables. This happens after evaluation. I chose to do it here instead of adding it in the prior restraint because we want to maintain the default uniform prior on the untransformed nuisance if the user doesn't specify a prior.
  • Upon evaluation, the score should have the Jacobian-based modifications of all Nuisances added to it (a toy sketch of this bookkeeping follows this list). This last point is the one thing that's missing (besides a couple of Jacobian tests) before I go hunting for failed tests. My idea is that it should be handled by the ScoringFunction base class upon evaluation, so an arbitrary ScoringFunction would need access to a list of all particles in the model that have been decorated as nuisances. Is that information accessible from the attribute table, or would we need a new entry? Or is there a better way to do this? I'm trying to make it user-proof, so that the user never needs to worry about setting up the transformation and its corrections properly and can just trust that samples drawn using the scoring function are equivalent to draws from the untransformed scoring function they defined.
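Here is the toy sketch referred to in the last bullet, condensing the bookkeeping above for the simplest case of lower-bounded, log-transformed nuisances (self-contained Python; none of these names are existing IMP API):

```python
import math

class JacobianBookkeeper:
    """Illustrative stand-in (not IMP API) for the proposed behavior: store only
    transformed values and apply the change-of-variables corrections on evaluation."""

    def __init__(self):
        self.transformed = {}  # name -> stored unconstrained value y

    def set_nuisance(self, name, x, lower=0.0):
        # store only the transformed value; x = lower + exp(y)
        self.transformed[name] = math.log(x - lower)

    def get_nuisance(self, name, lower=0.0):
        # untransformed value, computed on request
        return lower + math.exp(self.transformed[name])

    def correct(self, score, dscore_dx):
        """Apply the corrections once per transformed nuisance per evaluation."""
        dscore_dy = {}
        for name, y in self.transformed.items():
            dx_dy = math.exp(y)                  # Jacobian of the inverse transform
            score += -math.log(dx_dy)            # -log|dx/dy| added to the total score
            dscore_dy[name] = dscore_dx[name] * dx_dy - 1.0  # chain rule + d(-y)/dy
        return score, dscore_dy
```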

@benmwebb:

I don't think it's a good idea to have the scoring function examine all particles in the model looking for Nuisances (if I understand you correctly). If nothing else, this will end up considering Nuisances that aren't even used in the scoring function. The simplest way would be to add a suitable restraint to the scoring function that gets given a list of all the Nuisances and then calculates the necessary correction. A more involved but "automatic" way might be to write a custom ScoringFunction subclass that goes through all of its restraints, calls get_particles() on each one, tries to cast to Nuisance and then calculates the correction.

sethaxen · Aug 09 '18 01:08

@benmwebb:

I don't think it's a good idea to have the scoring function examine all particles in the model looking for Nuisances (if I understand you correctly). ... The simplest way would be to add a suitable restraint to the scoring function that gets given a list of all the Nuisances and then calculates the necessary correction. A more involved but "automatic" way might be to write a custom ScoringFunction subclass that goes through all of its restraints, calls get_particles() on each one, tries to cast to Nuisance and then calculates the correction.

I was thinking more that the attribute table would contain a special list, nuisances_, of ParticleIndexes that have been decorated with Nuisance. It could be populated by IMP.isd.Nuisance.setup_particle. The ScoringFunction would then just iterate through this list and, for each Nuisance that is optimized, add the necessary correction to the score.

I don't think it's sufficient to have a subclass of ScoringFunction do this, because then it's not user-proof. If the user didn't use the subclass, their samples from the posterior are guaranteed to be wrong, because the posterior is using the wrong measure. Likewise, if the user has to add a special restraint to the ScoringFunction to get this behavior, there will likely be cases where that doesn't happen. Or perhaps I'm misunderstanding how ScoringFunction works; my understanding is that it produces the score (-log posterior) by computing a weighted sum of the scores of the individual restraints.

I suppose a compromise here is a special user-hidden Restraint that ScoringFunction automatically adds to the list of provided restraints. The caveat is that (I think) this term should be unaffected by temperature in MC/MD. Infinite temperature reduces all priors in the untransformed space to uniform, which means the scoring function would then consist of just the -log Jacobian term, unmodified by the temperature. So maybe that means this should be something entirely separate from the scoring function that these optimizers make special use of.
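To make the temperature point concrete, a toy acceptance test would look roughly like this (illustrative only, not how IMP.core.MonteCarlo is written): the restraint score is tempered, the -log Jacobian term is not, so at infinite temperature only the Jacobian term survives.

```python
import math
import random

def metropolis_accept(delta_restraint_score, delta_neg_log_jacobian, temperature):
    """Toy acceptance test: temper the restraint score but not the Jacobian term."""
    log_accept = -(delta_restraint_score / temperature + delta_neg_log_jacobian)
    return math.log(random.random()) < log_accept
```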

@benmwebb:

If nothing else, this will end up considering Nuisances that aren't even used in the scoring function.

That's actually the necessary behavior. If a Nuisance is being optimized using a scoring function in which it never appears, that is equivalent to placing a uniform restraint on the untransformed parameter. But if the transformed parameter is being optimized, the corresponding restraint would not be uniform, due to the Jacobian correction. That is, to maintain the default behavior that an unrestrained untransformed parameter is uniform, the Jacobian correction to the score is needed even if the parameter has no prior restraint.
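A quick self-contained check of this claim (plain Python, no IMP): run a Metropolis sampler on y = logit(x) for an x in (0, 1) that has no prior restraint at all. The log|dx/dy| term is the entire target, and it is exactly what keeps the back-transformed x uniform.

```python
import math
import random

def log_target(y):
    # uniform density on x in (0, 1), expressed in y = logit(x):
    # log|dx/dy| = log(sigmoid(y)) + log(1 - sigmoid(y)) = -2*log(1 + exp(-y)) - y
    return -2.0 * math.log(1.0 + math.exp(-y)) - y

def sample(n_steps=100000, step=1.0):
    y, xs = 0.0, []
    for _ in range(n_steps):
        y_new = y + random.gauss(0.0, step)
        if math.log(random.random()) < log_target(y_new) - log_target(y):
            y = y_new
        xs.append(1.0 / (1.0 + math.exp(-y)))  # back-transform to x
    return xs

xs = sorted(sample())
# quartiles of x should be roughly 0.25, 0.5, 0.75, i.e. x is uniform on (0, 1)
print([round(xs[int(q * len(xs))], 2) for q in (0.25, 0.5, 0.75)])
```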

sethaxen · Aug 09 '18 02:08

Here's a better alternative: implement a child of Nuisance called TransformedNuisance. It should be usable everywhere Nuisance is used, but the value of the nuisance is constrained to be the inverse transform of the stored transformed value and is therefore only guaranteed to be correct after Model.evaluate(). A single score state will handle this constraint and the Jacobian adjustment of derivatives for all TransformedNuisances. Switching to the transformed nuisance should be tear-free.

The only complicated thing here is what I mentioned in the last post. The log Jacobian score adjustment should be invariant to re-weighting. Right now optimizers like IMP.core.MonteCarlo don't allow for any changes to the score after scoring function evaluation but before acceptance/rejection. A new class IMP.core.JacobianAdjuster will be responsible for returning the adjustment to the score needed for transformed variables (nuisances and any others in the future), and optimizers will need to explicitly handle this.
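Schematically, the optimizer-side change would then look like this; the JacobianAdjuster name and get_score_adjustment() call follow the proposal above, and everything else is a stand-in rather than existing IMP API. The key point is that the adjustment is queried after the scoring function is evaluated and is never scaled by the temperature.

```python
import math
import random

def mc_step(evaluate_score, propose, accept, reject, jacobian_adjuster,
            temperature, last_total):
    """One schematic Monte Carlo step with the proposed adjustment."""
    propose()
    score = evaluate_score()  # ordinary scoring function evaluation
    # adjustment added after evaluation and before accept/reject, untempered
    total = score / temperature + jacobian_adjuster.get_score_adjustment()
    if math.log(random.random()) < last_total - total:
        accept()
        return total
    reject()
    return last_total
```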

sethaxen · Aug 12 '18 02:08

All the pieces are there for uni- and multi-variate adjustment of score and gradients. Without modifying IMP.ScoringFunction, the last piece is for IMP.core.JacobianAdjuster.get_score_adjustment() to be called everywhere in an IMP.Optimizer where an IMP.ScoringFunction is evaluated.

sethaxen · Aug 23 '18 08:08