
Using the score function estimator as gradient for VI

Open arnauqb opened this issue 1 year ago • 9 comments

Hello,

Imagine we have a model that looks like this

@model function m(data)
    θ ~ Normal(0, 1)       # prior over the simulator parameter
    x ~ f(θ)               # f(θ): simulator wrapped as a custom distribution
    data ~ Normal(x, 1.0)  # Gaussian observation noise
end

where f is, say, a physics simulator that may not be differentiable, but that I have implemented as a custom distribution with logpdf defined.
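For concreteness, the custom distribution might look like this (a sketch; `simulate` and `surrogate_logpdf` are placeholders for the simulator and whatever density I can evaluate for it):

```julia
using Distributions, Random

# Black-box simulator exposed as a distribution: sampling runs the
# (possibly non-differentiable) simulator, while logpdf is a density
# I can evaluate for its output.
struct Simulator <: ContinuousUnivariateDistribution
    θ::Float64
end

Base.rand(rng::Random.AbstractRNG, d::Simulator) = simulate(rng, d.θ)
Distributions.logpdf(d::Simulator, x::Real) = surrogate_logpdf(d.θ, x)
```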

In order to fit the model using VI, I want to use the score-function estimator to estimate the gradient throughout the simulator (see e.g., https://arxiv.org/pdf/1906.10652).

The functionality I need is to sample $\theta_i \sim q_\phi(\theta)$, where $q$ is the variational distribution with parameters $\phi$, and compute

$$ \nabla_\phi\log q_\phi (\theta) f(\theta) $$

Looking at the internals of Turing.jl, it seems that the AD backends are defined in https://github.com/tpapp/LogDensityProblemsAD.jl; however, I am not sure how to access $\nabla_\phi \log q_\phi$ when I am implementing a custom derivative of $f$.
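To make the quantity concrete, here is a minimal single-sample score-function (REINFORCE) estimator, assuming a Gaussian $q_\phi = \mathcal{N}(\mu, e^{\log\sigma})$ with $\phi = (\mu, \log\sigma)$ and a black-box $f$:

```julia
using Distributions, ForwardDiff, Random

# Single-sample score-function estimate of ∇_φ E_{θ~q_φ}[f(θ)]
# for q_φ = Normal(μ, exp(logσ)) with φ = [μ, logσ].
function score_grad(φ::Vector{Float64}, f; rng = Random.default_rng())
    μ, logσ = φ
    θ = rand(rng, Normal(μ, exp(logσ)))  # sample; no gradient flows through this
    # ∇_φ log q_φ(θ), holding the sampled θ fixed
    ∇logq = ForwardDiff.gradient(ϕ -> logpdf(Normal(ϕ[1], exp(ϕ[2])), θ), φ)
    return f(θ) .* ∇logq                 # f may be completely non-differentiable
end
```

Averaging many such samples gives an unbiased, if high-variance, estimate of the gradient.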

Edit:

I think Pyro deals with this automatically for non-parameterizable variables under the hood: https://pyro.ai/examples/svi_part_iii.html

arnauqb avatar Jul 13 '24 14:07 arnauqb

Unfortunately, we don't have BBVI with score-function gradient estimation implemented for Turing yet (@Red-Portal @torfjelde correct me if I am wrong).

But I think AdvancedVI.jl has a good interface to get this implemented.

If we are working with the ELBO, then $f(\mathbf{z}) = \log p(\mathbf{z}) - \log q_\phi(\mathbf{z})$ in $\nabla_\phi\log q_\phi (\mathbf{z})\, f(\mathbf{z})$, where $\mathbf{z}$ denotes the latent variables (in this case, θ and x in the Turing model). I switch the letter for the latent variables from θ to z to reduce ambiguity (ref).
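Spelled out as an $N$-sample Monte Carlo estimate (using $\mathbb{E}_{q_\phi}[\nabla_\phi \log q_\phi] = 0$ to drop the remaining term), the score-function ELBO gradient is:

$$\nabla_\phi\,\mathrm{ELBO}(\phi) \approx \frac{1}{N}\sum_{i=1}^{N} \big(\log p(\mathbf{z}_i) - \log q_\phi(\mathbf{z}_i)\big)\,\nabla_\phi \log q_\phi(\mathbf{z}_i), \qquad \mathbf{z}_i \sim q_\phi.$$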

A Turing model wrapped in LogDensityFunction will give you access to $\log p(\mathbf{z})$, as it follows the LogDensityProblems.jl interface. $q_\phi$ is usually a transformed distribution built on the Bijectors.jl interface; for instance, the mean-field family is defined at https://github.com/TuringLang/Turing.jl/blob/29a134245b2499d59fa992420eba37ab2b9f5945/src/variational/advi.jl#L71-L98. It should be straightforward to compute $\log q_\phi(\mathbf{z})$, and this function should be differentiable with respect to the variational parameters.
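As a sketch of the two ingredients (assuming DynamicPPL's LogDensityFunction; the exact constructor may differ between versions, and the mean-field $q_\phi$ here is written directly with Distributions.jl rather than through Bijectors.jl):

```julia
using Turing, DynamicPPL, LogDensityProblems, Distributions

@model function demo(data)
    θ ~ Normal(0, 1)
    data ~ Normal(θ, 1.0)
end

# log p(z) through the LogDensityProblems.jl interface
prob = DynamicPPL.LogDensityFunction(demo(1.0))
logp(z) = LogDensityProblems.logdensity(prob, z)

# log q_φ(z) for a mean-field Gaussian with φ = (μ, logσ);
# differentiable with respect to μ and logσ
logq(μ, logσ, z) = sum(logpdf.(Normal.(μ, exp.(logσ)), z))
```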

So to compute the gradient estimate, you can create a custom AdvancedVI.AbstractVariationalObjective and implement estimate_gradient!. An example is the reparameterization-gradient ELBO.
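The shape of such an objective might look roughly like this. This is a hypothetical skeleton: the estimate_gradient! signature differs between AdvancedVI versions, so treat it as a sketch of the structure, not a drop-in implementation.

```julia
using AdvancedVI

# Hypothetical score-function ELBO objective
struct ScoreELBO <: AdvancedVI.AbstractVariationalObjective
    n_samples::Int
end

function AdvancedVI.estimate_gradient!(
    rng, obj::ScoreELBO, adtype, out, prob, λ, restructure, state
)
    # 1. q = restructure(λ); draw zᵢ ~ q for i = 1..obj.n_samples
    # 2. fᵢ = logp(zᵢ) - logq(zᵢ)   (ELBO integrand; logp comes from `prob`)
    # 3. gradient estimate: mean of fᵢ * ∇_λ log q_λ(zᵢ), negated so the
    #    optimizer minimizes -ELBO; write it into `out`
    return out, state, (elbo = NaN,)  # statistics tuple (placeholder)
end
```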

I know this seems daunting; do reach out if you are interested, @arnauqb.

c.f. @Red-Portal, @torfjelde, @yebai

sunxd3 avatar Jul 18 '24 14:07 sunxd3

Yes, we currently don't support the score gradient because it is generally inferior to the reparameterization gradient, and discrete variables can often be marginalized out in closed form. However, having genuinely non-differentiable components, as in SBI, was not something I had thought of before.

As mentioned by @sunxd3, we currently don't implement the score gradient, so you'll have to implement it yourself. You can use the v0.3.0 infrastructure, but it's not officially out yet, so it's subject to change at any moment. v0.2.0 is more primitive but could be easier to work with since it's simpler; it is also the version currently shipped with Turing.

However, this actually raises a good point: given how big Julia's SciML community is, maybe we should think about this use case seriously. I'll give this some thought for v0.3.0.

Red-Portal avatar Jul 18 '24 17:07 Red-Portal

Thanks @sunxd3 and @Red-Portal for the replies.

I have been able to implement a working score-based estimator with the current master of AdvancedVI.jl. I can submit a PR if it's of interest to have it as an example or as part of the main code.

arnauqb avatar Jul 18 '24 20:07 arnauqb

@arnauqb By the current master, do you mean v0.3.0 version? Either way, if you submit a PR, I would be happy to take a look. Ideally, it would be best to mix score and reparameterization gradients, but I am currently unsure how to do it. I might take a look at how Pyro does it in more detail.

Red-Portal avatar Jul 18 '24 21:07 Red-Portal

Leaving this here for reference:

https://github.com/TuringLang/AdvancedVI.jl/pull/72

As I mention in the PR, right now I define a new variational objective and the whole model is optimized using the score estimator. Ideally, however, we would use the pathwise estimator where possible.

@sunxd3 I'm not very familiar with the DynamicPPL syntax, but I noticed that Turing supports mixed autodiff. Would it be possible to do something similar for the score estimator? That is, the user could specify in the model declaration that a certain logpdf gradient needs to be estimated with the score estimator. I am not sure how to access the logpdf of the samples being passed from inside a ChainRulesCore rule.

arnauqb avatar Jul 19 '24 13:07 arnauqb

also this package may be relevant to achieve this: https://github.com/JuliaDecisionFocusedLearning/DifferentiableExpectations.jl @gdalle

arnauqb avatar Jul 19 '24 13:07 arnauqb

Thanks @arnauqb. Great efforts! Let me take a look.

sunxd3 avatar Jul 19 '24 13:07 sunxd3

Yeah, that package of mine is still in the early stages (definitely use the version on main rather than the last one we registered), but the goal is to implement score-function / REINFORCE gradients as well as reparameterization. Any feedback and suggestions are appreciated! @BatyLeo

gdalle avatar Jul 19 '24 15:07 gdalle

Also, I'd be thrilled to figure out if we can make this more Turing-friendly.

gdalle avatar Jul 19 '24 15:07 gdalle

This is now part of the planned release of v0.3 of AdvancedVI.

Red-Portal avatar Nov 17 '24 06:11 Red-Portal