
The boundary condition problem for gradients

Open axch opened this issue 9 years ago • 0 comments

Consider doing Hamiltonian Monte Carlo on a variable whose prior follows the gamma distribution. Gamma's output is always positive, but (for some parameter values) its density grows as the value approaches zero from above. So, both gradient ascent and HMC are happy to suggest a negative value for the gamma output. Currently, this causes Venture to crash with "Number is not positive".

The actual crash is due to the gamma SP asserting positivity of its output. Maybe it just shouldn't do that, and report a -inf log density for negative numbers instead, but that wouldn't be ideal either: gradient ascent would get stuck (because the gradient at any negative point is mathematically zero, though we could try to kludge it) and HMC would always reject any trajectory that went there.
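To make the failure mode concrete, here is a minimal sketch in plain Python (not Venture's actual gamma SP). It assumes the hypothetical "report -inf instead of asserting" behavior discussed above; the function names `gamma_logpdf`, `gamma_logpdf_grad`, and `ascend` are made up for illustration. With shape 0.5 the log density increases toward zero, so a fixed-step ascent steps across the boundary and then stalls, because the gradient is zero everywhere on the negative side.

```python
import math

def gamma_logpdf(x, shape, rate=1.0):
    """Log density of Gamma(shape, rate); -inf outside the support (assumed behavior)."""
    if x <= 0:
        return -math.inf
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)

def gamma_logpdf_grad(x, shape, rate=1.0):
    """Derivative of the log density in x; taken as zero where the density is zero."""
    if x <= 0:
        return 0.0
    return (shape - 1) / x - rate

def ascend(x, shape, step, n_steps):
    """Naive fixed-step gradient ascent; returns every visited point."""
    trace = [x]
    for _ in range(n_steps):
        x = x + step * gamma_logpdf_grad(x, shape)
        trace.append(x)
    return trace

# shape < 1: the density grows as x approaches zero from above.
trace = ascend(x=0.5, shape=0.5, step=0.2, n_steps=5)
crossed = trace[2] < 0            # the second step lands on a negative value
stuck = trace[-1] == trace[2]     # and the iterate never moves again
```

An HMC proposal that ends at such a point would have log density -inf and so would always be rejected, which matches the concern above.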

What should happen in these circumstances?

  1. We could try some kludge: getting gradient ascent to walk back somehow; rejecting HMC transitions that walk off into impossible places.
  2. We could implement boundary-sensitive methods: gradient ascent could just stop at the exact boundary value, and HMC trajectories could bounce off (Radford Neal actually describes a boundary-constrained HMC variant, with proof of soundness).
  3. We could (in principle) do what Stan does: automatically do a change of variables to put the boundary infinitely far away. I don't know whether doing that is sound for gradient ascent (though Stan's "optimize" function does, as far as I know, do it).
  4. Options 2 and 3 both require exposing the locations of the boundaries to the inference method. That is, gradient of regen would have to have an interface like "you are here; the gradient points this way; and here are the boundaries of the region you are allowed to walk around in." (Note: HMC can walk off a boundary even when the gradient points away from it, because it uses the gradient as an acceleration rather than a velocity.)
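As a rough illustration of option 3, the sketch below reparameterizes a gamma-distributed value x as y = log(x), so the boundary at zero is pushed to y = -infinity. This is plain Python, not Venture or Stan code, and the function names are hypothetical. The density of y picks up the Jacobian term log|dx/dy| = y, which makes its gradient in y simply shape - rate * exp(y).

```python
import math

def log_density_unconstrained(y, shape, rate=1.0):
    """Log density of y = log(x) when x ~ Gamma(shape, rate)."""
    x = math.exp(y)
    log_p_x = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1) * y - rate * x)   # uses log(x) == y
    return log_p_x + y                         # Jacobian term: log |dx/dy| = y

def grad_unconstrained(y, shape, rate=1.0):
    """Derivative of log_density_unconstrained with respect to y."""
    return shape - rate * math.exp(y)

def ascend_unconstrained(y, shape, step, n_steps):
    """Fixed-step gradient ascent in the unconstrained coordinate."""
    for _ in range(n_steps):
        y += step * grad_unconstrained(y, shape)
    return y

y = ascend_unconstrained(math.log(2.0), shape=0.5, step=0.2, n_steps=200)
x = math.exp(y)   # always positive, whatever y is
```

Every iterate maps back to a positive x, so the positivity assertion can never fire. Note, though, that the ascent settles near x = shape / rate, whereas the original density with shape 0.5 has no finite maximum. Maximizing in the transformed coordinate with the Jacobian term therefore answers a different question than maximizing in x, which is one way to see why soundness for gradient ascent is not obvious.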

Related but less problematic: both HMC and gradient ascent are currently perfectly happy to follow a gradient over a benign discontinuity in the local posterior, such as one introduced by if expressions, regardless of how big a jump in the density that causes and in what direction.

  • From a correctness lens, this is actually fine for HMC, and at least excusable for gradient ascent.
  • From a performance lens, we might want to treat such jumps as boundaries, to prevent these methods from falling off of good plateaus (but, by so doing, we would also be preventing them from jumping off of bad plateaus).
  • This is trickier to implement, because locating the exact place where the condition of an if changes value requires solving equations; and the locus of such points may be nonlinear.
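For a single straight step, one possible way to locate such a jump is to bisect along the segment between the start and end points. The sketch below is plain Python with hypothetical names (`locate_flip`, `inside_unit_circle`), and it assumes the condition changes value exactly once along the segment; it says nothing about the shape of the full locus, only where this one step crosses it.

```python
def locate_flip(predicate, a, b, tol=1e-9):
    """Find t in [0, 1] where predicate flips along the segment from a to b.

    Returns None if the predicate has the same value at both endpoints.
    Assumes a single crossing; with several crossings it finds just one of them.
    """
    value_at_a = predicate(a)
    if predicate(b) == value_at_a:
        return None
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        point = [ai + mid * (bi - ai) for ai, bi in zip(a, b)]
        if predicate(point) == value_at_a:
            lo = mid   # still on a's side of the jump
        else:
            hi = mid   # already on b's side
    return 0.5 * (lo + hi)

# Example condition with a nonlinear locus: "is the point inside the unit circle?"
def inside_unit_circle(p):
    return p[0] ** 2 + p[1] ** 2 < 1.0

t = locate_flip(inside_unit_circle, [0.0, 0.0], [2.0, 0.0])
```

HMC trajectories are curved rather than straight, so in practice this would have to be applied per leapfrog step, and each bisection costs extra evaluations of the condition.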

David Wadden has been bitten by this problem.

[edited by @riastradh-probcomp to number the options]

axch avatar Feb 02 '16 07:02 axch