RFC: how to handle implicit quantities associated with coordinates
Motivation
Suppose that for a set of parameters $x$, the equation $F(x, y) = 0$ defines $y(x)$ implicitly. E.g. $x$ could be the parameters of a problem that we can only solve approximately, and $y$ the parameters of an approximation we obtain numerically (rootfinding etc). Given data $d$, the likelihood is $\ell(d \mid x, y)$.
In principle, one could of course solve from scratch for the $y$ that belongs to each $x$. But this may be expensive and brittle, and if
$$ x_2 = x_1 + \Delta $$
then
$$ \hat{y}_2 = y_1 + \frac{\partial y}{\partial x} \Delta $$
would be a good initial guess for $y_2 = y(x_2)$, where $y_1 = y(x_1)$ is already known and the derivative is evaluated at $x_1$.
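For reference, the derivative in this guess follows from the implicit function theorem. Assuming (this is not stated above) that $F$ is differentiable and $\partial F / \partial y$ is invertible at the solution, differentiating $F(x, y(x)) = 0$ with respect to $x$ gives
$$ \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial x} = 0 \quad\Longrightarrow\quad \frac{\partial y}{\partial x} = -\left(\frac{\partial F}{\partial y}\right)^{-1} \frac{\partial F}{\partial x} $$
so the guess only needs partial derivatives of $F$ at the already solved point $(x_1, y_1)$.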
Ideally, "users" like Turing.jl and DynamicHMC.jl should be able to ignore the details of these things and just carry on doing HMC/NUTS/etc with minimal changes.
Proposal: allow coordinates to be opaque
I propose adding 3 functions to the API, with the fallbacks
    lift(ℓ, x::AbstractVector) = x
    unlift(ℓ, x::AbstractVector) = x
    translate(ℓ, x::AbstractVector, Δ::AbstractVector) = x .+ Δ
Specifically,
- "users" would call `lift` when generating random points for starting MCs, and in similar situations. Otherwise they would use `translate`,
- similarly, `unlift` would be called when coordinates are needed (eg turn statistics),
- leapfrog and RWMH steps would use `translate`.
- otherwise the result of `lift` and the `x` arguments of `logdensity`, `logdensity_and_gradient`, `translate`, `unlift` are allowed to be opaque objects, not an `::AbstractVector` of real numbers. Nevertheless, `logdensity_and_gradient` should provide a valid gradient of `x -> logdensity(ℓ, lift(ℓ, x))`, but how that is done is up to the implementation of `ℓ`.
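To make the idea concrete, here is a minimal sketch of how a problem could implement the three functions with an opaque coordinate that carries $y$ along with $x$. Everything except the names `lift`, `unlift` and `translate` is hypothetical and made up for illustration: the toy equation $F(x, y) = y^3 + y - x$ (applied elementwise), the types `ToyImplicit` and `Lifted`, and the helper `solve_y`. It is not meant as a definitive implementation, and it leaves out `logdensity` and the gradient entirely.

```julia
# Hypothetical toy problem: F(x, y) = y^3 + y - x = 0, applied elementwise.
struct ToyImplicit end              # stands in for the log density ℓ

# Opaque coordinate: x together with the y that solves F(x, y) = 0.
struct Lifted
    x::Vector{Float64}
    y::Vector{Float64}
end

F(x, y) = y^3 + y - x
dFdy(y) = 3y^2 + 1                  # ∂F/∂y; note that ∂F/∂x = -1

# Newton iteration for a scalar y, starting from the guess y0.
function solve_y(x, y0; tol = 1e-12, maxiter = 50)
    y = y0
    for _ in 1:maxiter
        step = F(x, y) / dFdy(y)
        y -= step
        abs(step) < tol && return y
    end
    error("Newton iteration did not converge")
end

# Cold start: solve for every y from the fixed guess 0.
lift(::ToyImplicit, x::AbstractVector) = Lifted(collect(float.(x)), solve_y.(x, 0.0))

# Recover plain coordinates.
unlift(::ToyImplicit, z::Lifted) = z.x

# Warm start: first-order guess y + (∂y/∂x) Δ, with ∂y/∂x = 1 / dFdy(y) here.
function translate(::ToyImplicit, z::Lifted, Δ::AbstractVector)
    x2 = z.x .+ Δ
    yguess = z.y .+ Δ ./ dFdy.(z.y)
    Lifted(x2, solve_y.(x2, yguess))
end

ℓ = ToyImplicit()
z = lift(ℓ, [2.0, 10.0])            # z.y ≈ [1.0, 2.0], since 1^3 + 1 = 2 and 2^3 + 2 = 10
z2 = translate(ℓ, z, [8.0, -8.0])   # moves x to [10.0, 2.0]
unlift(ℓ, z2)
```

In this sketch a sampler never sees `z.y`: it would update positions with something like `z = translate(ℓ, z, ϵ .* p)` (with `ϵ` and `p` the usual step size and momentum) where it currently adds to a vector, and only call `unlift` when it needs actual numbers.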
Bikeshedding of names is appreciated :wink:, as are alternative API suggestions.
How this meshes with AD
This is a bit tricky and I don't yet have a good API in mind. Related work is in
- https://github.com/gdalle/ImplicitDifferentiation.jl
- https://github.com/JuliaNonconvex/NonconvexUtils.jl
- https://github.com/tpapp/ImplicitDifferentiables.jl