Approximate Inference
Currently, we have
approx_posterior(approximation, fx, y, u)
As pointed out by @st-- and @rossviljoen in SparseGPs.jl, we should consider reducing this to a 3-arg function in which approximation contains u, since u is really a component of the approximation. This kind of thing would generalise more elegantly to what @rossviljoen is doing in SparseGPs.jl, as it will be natural in that case to put the variational parameters associated with q(u) inside approximation as well. More generally, there are approximate inference algorithms which don't involve pseudo-points, and it would be nice to generalise to them also.
So the new approx_posterior function for the saturated VFE approximation would be something like
approx_posterior(VFE(u), fx, y)
and for the unsaturated version something like
approx_posterior(VFE(u, qu), fx, y)
(or something a bit like that).
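To make this concrete, here's a minimal sketch (the type layout is invented for illustration; the constructors just mirror the calls above) of an approximation object that owns the pseudo-points:

```julia
# Hypothetical sketch: the approximation object owns the pseudo-points and,
# in the unsaturated case, the variational distribution q(u).
struct VFE{Tu,Tq}
    u::Tu  # pseudo-points / inducing inputs
    q::Tq  # variational distribution q(u); `nothing` in the saturated case
end
VFE(u) = VFE(u, nothing)  # saturated: the optimal q(u) is computed internally
```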
I wonder whether we actually just want to keep the same function names and always pass an approximation argument?
For example,
posterior(approximation, fx, y)
logpdf(approximation, fx, y)
rand(rng, approximation, fx)
etc?
This would mean that users wouldn't have to learn a new API -- they'd just add an extra argument to the existing one. It would also create a nice way to think about the kinds of operations that we might like to implement on approximations, and provide a framework for characterising what kinds of operations different approximations are able to provide. For example, the vanilla version of the variational pseudo-point approximation doesn't provide a performant implementation of rand, but the pathwise-sampling version would. You could imagine a table of ticks and crosses characterising each of the approximations on offer.
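As a rough illustration of how little new surface area this would add, here's a sketch assuming a hypothetical `Exact` marker type (invented here; it just delegates to exact inference, and assumes Distributions is available, since AbstractGPs extends `Distributions.logpdf`). A real approximation would add methods to the same verbs:

```julia
using AbstractGPs, Distributions, Random

struct Exact end  # hypothetical "no approximation" marker, for illustration only

# Same verbs as the exact API, with one extra leading argument:
AbstractGPs.posterior(::Exact, fx::AbstractGPs.FiniteGP, y::AbstractVector{<:Real}) =
    posterior(fx, y)
Distributions.logpdf(::Exact, fx::AbstractGPs.FiniteGP, y::AbstractVector{<:Real}) =
    logpdf(fx, y)
Base.rand(rng::Random.AbstractRNG, ::Exact, fx::AbstractGPs.FiniteGP) = rand(rng, fx)

# Usage: existing code changes only by the extra first argument.
f = GP(SEKernel())
x, y = rand(10), rand(10)
post = posterior(Exact(), f(x, 0.1), y)
```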
@rossviljoen @willtebbutt is this sufficiently resolved by #194?
My inclination is to say no, because we're not completely satisfied with the result of #194, although it's an improvement on what we had before. I'll refer future readers to this and the subsequent comments: https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/pull/194#issuecomment-899860620
@willtebbutt could you summarise the remaining issues here?
Certainly. For both approximations that we've encountered so far (Titsias (2009) and Hensman (2013)), once the ApproxPosteriorGP has been produced, it's clear that we want to implement the regular AbstractGPs API on it.
The first question is what to do about the API for generating it. For Titsias (2009), something like
posterior(VFE(f(z)), f(x), y)
(or similar) makes sense, whereas for Hensman (2013) you could get away with something like
posterior(VFE(f(z), q))
because the approximate posterior is mediated by q(u).
Of course, the Titsias (2009) approximation is just the Hensman (2013) approximation with the optimal choice of q(u), so we could write the Titsias (2009) implementation as
posterior(VFE(f(z), f(x), y))
or something, and it would make sense. Maybe we should have done that...
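For reference, the sense in which Titsias (2009) is the optimal-q(u) special case: with a Gaussian likelihood of noise variance $\sigma^2$, the ELBO is maximised in closed form by

$$
q^{*}(u) = \mathcal{N}\big(u \mid \sigma^{-2} K_{uu} \Sigma K_{uf} y,\; K_{uu} \Sigma K_{uu}\big), \qquad \Sigma = \big(K_{uu} + \sigma^{-2} K_{uf} K_{fu}\big)^{-1},
$$

so plugging this q(u) into the Hensman (2013) parameterisation recovers the Titsias (2009) posterior.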
The second question is what to do about the elbo. In both cases, you can make sense of something like
elbo(VFE(...), f(x), y)
potentially with some extra arguments.
The solution we went with was
posterior(VFE(f(z)), f(x), y)
elbo(VFE(f(z)), f(x), y)
for Titsias (2009), and presumably a similar thing will happen in SparseGPs for Hensman (2013).
This was a slightly hurried design choice, made for the sake of getting something that improved on what we had at the time.
I'm confident that we can find a better solution, just no one has found it yet.
I wonder whether something like
approx = VFE(f(z), f(x), y)
elbo(approx) # returns a scalar
approx_posterior = posterior(approx) # returns an ApproxPosteriorGP
and
approx = VFE(f(z), f(x), y, q; config...) # config contains things about batch sizes etc.
elbo(rng, approx) # returns an estimator of the ELBO
approx_posterior_gp = posterior(approx) # returns an ApproxPosteriorGP
would make more sense? Still doesn't feel quite right though...
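For concreteness, here's a minimal sketch of the saturated variant (the wrapper type and its fields are invented; it just delegates to the three-argument API from #194 described above):

```julia
using AbstractGPs

# Hypothetical wrapper bundling the approximation with the data, so that
# `elbo` and `posterior` become one-argument calls:
struct SaturatedVFE{Tfz,Tfx,Ty}
    fz::Tfz  # f(z): FiniteGP at the pseudo-inputs
    fx::Tfx  # f(x): FiniteGP at the observed inputs
    y::Ty    # observations
end

AbstractGPs.elbo(a::SaturatedVFE) = elbo(VFE(a.fz), a.fx, a.y)
AbstractGPs.posterior(a::SaturatedVFE) = posterior(VFE(a.fz), a.fx, a.y)
```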
Reviving this discussion: another question is where the computation/optimisation should go. E.g. for SparseVariationalApproximation we need to optimise to find the optimal q(u); currently this is left as an exercise for the user, and posterior(sva, lfx, y) is then super fast (and ignores the last two arguments entirely). In contrast, posterior(LaplaceApproximation(), lfx, y) actually computes the mode of the posterior. From a user's perspective, it'd be nice if this were consistent (and easy...).
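One possible shape for making this consistent (purely a sketch; `fit` and `fitted_posterior` are invented names, not anything in AbstractGPs): give every approximation a fitting step with a no-op default, and always run it before constructing the posterior:

```julia
using AbstractGPs

# Hypothetical: approximations whose parameters are already determined
# (e.g. the saturated VFE case) fall back to this no-op.
fit(approx, lfx, y) = approx

# A Laplace-style approximation would make `fit` find the posterior mode;
# an SVGP-style one would optimise the variational parameters of q(u).
fitted_posterior(approx, lfx, y) = posterior(fit(approx, lfx, y), lfx, y)
```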
Yeah -- I think we ought to be able to do this in the deterministic-objective case (no minibatching; quadrature or exact computation of the reconstruction term) at the very least. I've done something like this in ConjugateComputationVI.jl (which I should move over to here and align with what you've done with the Laplace approximation).
edit: if you've got particular ideas for the stochastic-objective case, I'd be interested to hear them, though.
I believe that this is now stale, so am closing.