
Approximate Inference

Open willtebbutt opened this issue 4 years ago • 8 comments

Currently, we have

approx_posterior(approximation, fx, y, u)

As pointed out by @st-- and @rossviljoen in SparseGPs.jl, we should consider reducing this to a 3-arg function in which approximation contains u, since u is really a component of the approximation. This kind of thing would generalise more elegantly to what @rossviljoen is doing in SparseGPs.jl, as it will be natural in that case to put the variational parameters associated with q(u) inside approximation as well. More generally, there are approximate inference algorithms which don't involve pseudo-points, and it would be nice to generalise to them also.

So the new approx_posterior function for the saturated VFE approximation would be something like

approx_posterior(VFE(u), fx, y)

and for the unsaturated would be something like

approx_posterior(VFE(u, qu), fx, y)

(or something a bit like that).

willtebbutt avatar Jul 30 '21 21:07 willtebbutt

I wonder whether we actually just want to keep the same function names and always pass an approximation argument?

For example,

posterior(approximation, fx, y)
logpdf(approximation, fx, y)
rand(rng, approximation, fx)

etc?

This would mean that users wouldn't have to learn a new API -- they'd just add an extra argument to the existing one. It would also create a nice way to think about the kinds of operations that we might like to implement on approximations, and provide a framework for characterising what kinds of operations different approximations are able to provide. For example, the vanilla version of the variational pseudo-point approximation doesn't provide a performant implementation of rand, but the pathwise-sampling version would. You could imagine a table of ticks and crosses characterising each of the approximations on offer.
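The idea above maps naturally onto multiple dispatch. A minimal toy sketch, with all type and function names illustrative rather than actual AbstractGPs.jl API, might look like:

```julia
# Hypothetical sketch of the proposed unified API: the approximation is just
# another argument, and dispatch on its type selects the inference algorithm.
abstract type AbstractApproximation end

struct ExactInference <: AbstractApproximation end

struct PseudoPointApprox{T} <: AbstractApproximation
    z::Vector{T}  # pseudo-input locations live inside the approximation
end

# Each approximation declares which operations it supports efficiently,
# mirroring the "table of ticks and crosses" idea.
supports_rand(::AbstractApproximation) = false
supports_rand(::ExactInference) = true

# Stand-in for `posterior(approximation, fx, y)`: dispatch on the first
# argument picks the implementation.
posterior(::ExactInference, fx, y) = (:exact_posterior,)
posterior(a::PseudoPointApprox, fx, y) = (:vfe_posterior, length(a.z))
```

Adding a new approximation then means defining a new subtype and the methods it can support, without users having to learn new function names.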

willtebbutt avatar Aug 01 '21 21:08 willtebbutt

@rossviljoen @willtebbutt is this sufficiently resolved by #194 ?

st-- avatar Aug 23 '21 10:08 st--

My inclination is to say no: we're not completely satisfied with the result of #194, although it's an improvement on what we had before. I'll refer future readers to this comment and the ones following it: https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/pull/194#issuecomment-899860620

willtebbutt avatar Aug 25 '21 19:08 willtebbutt

@willtebbutt could you summarise the remaining issues in here ?

st-- avatar Aug 26 '21 07:08 st--

Certainly. For both approximations that we've encountered so far (Titsias (2009) and Hensman (2013)), once the ApproxPosteriorGP has been produced, it's clear that we want to implement the regular AbstractGPs API on it.

The first question is what to do about the API for generating it. For Titsias (2009), something like

posterior(VFE(f(z)), f(x), y)

(or similar) makes sense, whereas for Hensman (2013) you could get away with something like

posterior(VFE(f(z), q))

because the approximate posterior is mediated by q(u).

Of course, the Titsias (2009) approximation is just the Hensman (2013) approximation with the optimal choice of q(u), so we could write the Titsias (2009) implementation as

posterior(VFE(f(z), f(x), y))

or something, and it would make sense. Maybe we should have done that...

The second question is what to do about the elbo. In both cases, you can make sense of something like

elbo(VFE(...), f(x), y)

potentially with some extra arguments.

The solution we went with was

posterior(VFE(f(z)), f(x), y)
elbo(VFE(f(z)), f(x), y)

for Titsias (2009), and presumably a similar thing will happen in SparseGPs for Hensman (2013).

This was a slightly hurried design choice, made for the sake of getting something that improved on what we had before.

I'm confident that we can find a better solution, just no one has found it yet.
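The two VFE variants discussed in this comment can be sketched with a toy type. `ToyVFE` here is an illustrative stand-in, not the AbstractGPs.jl type: the saturated Titsias (2009) form carries only the pseudo-points, while the unsaturated Hensman (2013) form also carries an explicit variational distribution q(u).

```julia
# Toy stand-in for the VFE approximation type discussed above.
struct ToyVFE{Tfz,Tq}
    fz::Tfz  # prior at the pseudo-inputs, f(z)
    q::Tq    # explicit variational distribution q(u), or `nothing`
end

# Saturated form (Titsias 2009): q(u) is left implicit, i.e. chosen optimally.
ToyVFE(fz) = ToyVFE(fz, nothing)

is_saturated(v::ToyVFE) = v.q === nothing
```

With this shape, `posterior` and `elbo` can dispatch on whether `q` is present, which is one way to express that Titsias (2009) is Hensman (2013) with the optimal q(u).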

willtebbutt avatar Aug 26 '21 10:08 willtebbutt

I wonder whether something like

approx = VFE(f(z), f(x), y)
elbo(approx) # returns a scalar
approx_posterior = posterior(approx) # returns an ApproxPosteriorGP

and

approx = VFE(f(z), f(x), y, q; config...) # config contains things about batch sizes etc.
elbo(rng, approx) # returns an estimator of the ELBO
approx_posterior_gp = posterior(approx) # returns an ApproxPosteriorGP

would make more sense? Still doesn't feel quite right though...
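The alternative floated in this comment, where the approximation object captures f(z), f(x) and y so that `elbo` and `posterior` become one-argument functions, can be sketched as follows. All names are hypothetical, and the bodies are placeholders rather than real GP computations.

```julia
# Hypothetical "approximation object holds everything" API.
struct VFEProblem{Tfz,Tfx,Ty}
    fz::Tfz  # prior at pseudo-inputs, f(z)
    fx::Tfx  # prior at training inputs, f(x)
    y::Ty    # observations
end

# `elbo(approx)` would return a scalar; this stand-in just computes a
# number from the stored observations.
toy_elbo(p::VFEProblem) = -0.5 * sum(abs2, p.y)

# `posterior(approx)` would return an ApproxPosteriorGP; here we return a
# tagged tuple to show the one-argument call shape.
toy_posterior(p::VFEProblem) = (:approx_posterior_gp, p.fz)
```

One consequence of this design is that the data become part of the approximation's state, which fits the stochastic-objective case (minibatch configuration lives in the same object) but blurs the separation between model, data, and inference algorithm.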

willtebbutt avatar Aug 26 '21 10:08 willtebbutt

Reviving this discussion, another question is where should the computation/optimisation go? E.g. for SparseVariationalApproximation we need to optimise to find the optimal q(u), and currently this is left as an exercise for the user, and posterior(sva, lfx, y) is then super fast (and ignores the last two arguments entirely). In contrast, posterior(LaplaceApproximation(), lfx, y) actually computes the mode of the posterior. From a user perspective, it'd be nice if it was consistent (and easy...).

st-- avatar Mar 21 '22 13:03 st--

Yeah -- I think we ought at least to be able to do this in the deterministic-objective case (no minibatching; quadrature or exact reconstruction-term computation). I've done something like this in ConjugateComputationVI.jl (which I should move over to here and align with what you've done with the Laplace approximation).

edit: if you've got particular ideas for the stochastic-objective case, I'd be interested to know what they are though.

willtebbutt avatar Mar 23 '22 20:03 willtebbutt

I believe that this is now stale, so am closing.

willtebbutt avatar Sep 15 '23 19:09 willtebbutt