
Rethinking AdvancedVI

theogf opened this issue on Feb 12, 2021 · 19 comments

Alright! It's time to seriously take care of AdvancedVI :D

Here are some of the things we talked about in the meeting back in October:

  • There should be two distinct optimization methods depending on whether the variational distribution is given as a function (like update_q) or as a distribution whose parameters are updated.
  • Hyperparameter optimization should be properly supported; one proposal was:
    makelogπ(logπ, ::Nothing) = logπ
    makelogπ(logπ, hyperparams) = logπ(hyperparams)
    function vi(..., logπ; hyperparams = nothing)
        ...
        while not_converged
            # Rebuild the log joint whenever the hyperparameters change
            logjoint = makelogπ(logπ, hyperparams)
            for i in 1:n_inner
                ...
            end
        end
    end
  • We should condense the updates of the variational parameters into a more "atomic" step! function.

And here are some more personal points (disclaimer: I would be happy to take care of these myself):

  • I don't think the current ELBO approach is good: the ELBO can always be split into an entropy term (depending only on the distribution) and an expectation term over the log joint. Most VI methods take advantage of this by computing the entropy gradient analytically (and smartly!), see "Doubly Stochastic Variational Inference" by Titsias for instance. My proposition would be to split the gradient into two parts (grad_entropy + grad_expeclog), where each can be specialized given the problem; see the sketch after this list.
  • I would personally argue that update_q only makes sense with the current obsolete implementation using distributions with immutable fields like TuringMvNormal. See again Titsias using the reparametrization trick.
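
For illustration, here is a minimal sketch of that split for a mean-field Gaussian q = N(μ, diag(σ²)). The names grad_entropy and grad_expeclog come from the proposal above; everything else (the signatures, the grad_logπ callback) is hypothetical:

    # Entropy of a diagonal Gaussian is known in closed form,
    # so its gradient is exact: ∇μ H = 0, ∇σ H = 1 ./ σ.
    grad_entropy(μ, σ) = (zero(μ), 1 ./ σ)

    # The expectation term is estimated with the reparametrization trick
    # z = μ .+ σ .* ε, ε ~ N(0, I), as in Titsias' doubly stochastic scheme.
    function grad_expeclog(grad_logπ, μ, σ; n_samples = 10)
        gμ, gσ = zero(μ), zero(σ)
        for _ in 1:n_samples
            ε = randn(length(μ))
            g = grad_logπ(μ .+ σ .* ε)
            gμ .+= g ./ n_samples
            gσ .+= (g .* ε) ./ n_samples
        end
        return gμ, gσ
    end

    # The full ELBO gradient is the sum of the two specialized parts.
    grad_elbo(grad_logπ, μ, σ) = grad_expeclog(grad_logπ, μ, σ) .+ grad_entropy(μ, σ)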

theogf avatar Feb 12 '21 10:02 theogf

Hi, is there any update on a complete rewrite of AdvancedVI? Or even an expected time frame for release?

Red-Portal avatar Jun 06 '22 15:06 Red-Portal

Hey, there is no update and I would say that this has gone stale. I don't have the bandwidth for it anymore and neither does @torfjelde (I guess), so unless someone takes over...

theogf avatar Jun 06 '22 16:06 theogf

Hi @theogf, that's sad news. So for the moment, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be quite coupled to anything done to AdvancedVI.jl. Is there any timeline on that?

Red-Portal avatar Jun 06 '22 17:06 Red-Portal

I really hope @torfjelde has the time for it (we haven't talked in a while). If the package becomes easier to work with, I would definitely be happy to add a couple of algorithms like SVGD and others. But I generally think that a revamp is very necessary. The ML ecosystem has evolved a lot, and there are now new solutions like ParameterHandling.jl for problems we had here.

theogf avatar Jun 06 '22 17:06 theogf

Is there a straightforward way to deal with the covariance of a full-rank multivariate normal variational family though? I have been using AdvancedVI.jl as the basis of one of my recent research projects, but couldn't come up with a way to elegantly unpack/repack the parameters of the covariance. I think taking gradients independently for each symbolic variable a la Flux.jl could be a solution. Any thoughts on this?

Red-Portal avatar Jun 06 '22 17:06 Red-Portal

You should have a look at ParameterHandling.jl and the positive_definite function; a quick sketch is below. However, it has no VI-specific optimizations, but that's a topic of its own!
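
For illustration, a minimal sketch of the unpack/repack with ParameterHandling.jl (the variational-parameter layout here is made up):

    using LinearAlgebra
    using ParameterHandling

    # Full-rank Gaussian variational parameters; positive_definite keeps Σ in
    # the PD cone by parametrizing its Cholesky factor under the hood.
    params = (
        μ = randn(3),
        Σ = positive_definite(Matrix{Float64}(I, 3, 3)),
    )

    # flatten returns a plain vector for the optimizer plus a closure to rebuild.
    θ, unflatten = ParameterHandling.flatten(params)

    # Inside the objective: rebuild the constrained parameters from the flat vector.
    p = ParameterHandling.value(unflatten(θ))
    p.Σ  # a positive-definite matrix again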

theogf avatar Jun 06 '22 17:06 theogf

@theogf That looks great. I would really like to know about the future/current state of Turing.jl's model API before doing anything though.

Red-Portal avatar Jun 07 '22 23:06 Red-Portal

I will start pursuing a PhD this Fall, and this might give me some bandwidth to work full-time on AdvancedVI.jl. I personally think there is a lot of potential for it to become a research platform for cutting-edge VI research. There are some things that we are missing that need some major work.

  • [ ] Support a diverse set of variational families like the convex update and structured normalizing flows for example. These need to inspect the probabilistic program.
  • [ ] User-defined structured variational families. I think it would be useful to use Turing to describe a probabilistic program of variational families. This would need additional functionalities like inferring the variational parameters that do not contain a prior. Not sure if this is easy to do with Turing at the moment.
  • [ ] Recently proposed diagnostics for VI.
  • [ ] A way to express factorizable likelihoods for minibatching and amortized inference.

@theogf, could you list the changes that you planned to introduce into AdvancedVI? I might be able to pick them up at some point.

Red-Portal avatar Jun 09 '22 17:06 Red-Portal

Hey! I'm back now; been away for the past 4 months, so sorry for not being responsive here.

So for the moment, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be quite coupled to anything done to AdvancedVI.jl. Is there any timeline on that?

So it depends on what we're talking about here.

The work I'm doing on the model side of Turing.jl will be very useful for any interaction AdvancedVI.jl wants to have with Turing.jl models, e.g. performing VI on a Turing.jl model, using a Turing.jl model to define a variational approximation, etc. But for AdvancedVI.jl alone, i.e. ignoring any relation to the rest of the Turing.jl ecosystem, we're still not happy with what we have set up thus far; the general API needs to improve, as partially outlined by @theogf above. There are also some significant improvements in the ecosystem that we might want to take advantage of here in AdvancedVI.jl:

  • [ ] https://github.com/JuliaDiff/AbstractDifferentiation.jl to make us agnostic to the AD backend in use (see the sketch below).
  • [ ] Optimizers. Back when @theogf and I were looking at what to do with this, we were still awaiting some modularization of the Flux ecosystem, since many of the optimizers from that ecosystem are what we'll end up using in VI too. Things seem to have improved significantly: https://github.com/FluxML/Optimisers.jl looks much better now, and we might want to have a look at how https://github.com/FluxML/FluxTraining.jl does things.
  • [ ] https://github.com/FluxML/Functors.jl or https://github.com/invenia/ParameterHandling.jl should be useful for immutable models, etc.
  • [ ] https://github.com/JuliaMath/ChangesOfVariables.jl and https://github.com/JuliaMath/InverseFunctions.jl might be useful to avoid depending directly on Bijectors.jl.

And so on.
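
To make the AD and optimizer points concrete, here is a rough sketch of how the two could fit together (neg_elbo and optimize are hypothetical placeholders, not a proposed API):

    import AbstractDifferentiation as AD
    using ForwardDiff   # loading an AD package makes its backend available
    using Optimisers

    # Stand-in for a negative-ELBO estimate as a function of flat parameters θ.
    neg_elbo(θ) = sum(abs2, θ) / 2

    function optimize(θ; n_steps = 100)
        backend = AD.ForwardDiffBackend()
        state = Optimisers.setup(Optimisers.Adam(1e-2), θ)
        for _ in 1:n_steps
            # Swapping the AD backend is a one-line change here.
            (g,) = AD.gradient(backend, neg_elbo, θ)
            state, θ = Optimisers.update(state, θ, g)
        end
        return θ
    end

    optimize(randn(5))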

This requires a bit more thought and outlining of what we want, but I'm keen on getting something rolling now! :)

torfjelde avatar Jun 10 '22 13:06 torfjelde

Hi @torfjelde, nice to have you back. In case you haven't noticed, I'm one of the guys who was on the Turing.jl sales pitch at the University of Liverpool.

Some additional thoughts: People have been talking about SVGD in this repo for quite some time, but I don't think it will make a good fit here. Its algorithmic structure is so different from BBVI/MCVI that I don't see good abstraction opportunities. And given that we won't see a shortage of variational particle methods any time soon, I think it would be good to have a separate package, like AdvancedParticles.jl or something.

Red-Portal avatar Jun 10 '22 14:06 Red-Portal

Some additional thoughts: People have been talking about SVGD in this repo for quite some time, but I don't think it will make a good fit here.

I don't agree; the representation is different, but it's just as relevant.

Even if we move it to a different package, we would still need a common API. So it's probably preferable to think of this in one package before starting to split things up.

theogf avatar Jun 10 '22 14:06 theogf

@theogf Given that you already have #25 open, do you plan on coming back to #25 or how should we attack rewriting AdvancedVI?

Red-Portal avatar Jun 10 '22 15:06 Red-Portal

No, I think it's probably better to start from scratch; you can always take ideas from there if you want.

theogf avatar Jun 10 '22 15:06 theogf

Okay. Thanks, @theogf @torfjelde the discussions were really helpful.

Red-Portal avatar Jun 10 '22 15:06 Red-Portal

Hi @torfjelde , I'm thinking about how to restructure the overall project.

I'm thinking of restructuring the project as:

  • estimators/
  • diagnostics/
  • algorithms/

Currently, AdvancedVI.jl has a separate notion of a variational objective (implemented in objectives.jl) and an algorithm (implemented in advi.jl; I'm proposing to change this terminology to estimator) for estimating the objective's gradient, but I don't think this distinction is necessary. After all, most of the gradient estimators proposed in the literature target specific objectives, so I think an objective should be an attribute of an estimator rather than its own object; see the sketch below.
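
To illustrate (all names here are hypothetical, not the actual AdvancedVI.jl API), the structure I have in mind looks roughly like:

    abstract type AbstractEstimator end

    # The targeted objective (here the ELBO, via the reparametrization trick)
    # is implicit in the estimator type rather than being a separate object.
    struct ReparamELBOEstimator <: AbstractEstimator
        n_samples::Int  # Monte Carlo samples per gradient estimate
    end

    # Another estimator targeting the same objective via the score function.
    struct ScoreELBOEstimator <: AbstractEstimator
        n_samples::Int
    end

    # Higher-level algorithms interact only with this single entry point.
    function estimate_gradient end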

Under algorithms, I'm planning to put higher-level algorithms that utilize the output of the estimators. For example, stochastic variance-reduced gradient descent could be one, or methods combining the output of multiple estimators like [1,2] could also be considered.

For diagnostics, I'm thinking of the various VI-specific diagnostics that have been proposed over the years, like the ones in [3], and the R-hat diagnostics [4]. Though [4] would need an online version of R-hat; I think I saw some hearsay about this, but I'm not sure what happened on that front.

[1] "A Rule for Gradient Estimator Selection, with an Application to Variational Inference," https://arxiv.org/abs/1911.01894 [2] "Using Large Ensembles of Control Variates for Variational Inference," https://arxiv.org/abs/1810.12482 [3] "Validated Variational Inference via Practical Posterior Error Bounds," http://proceedings.mlr.press/v108/huggins20a.html [4] "Robust, Accurate Stochastic Optimization for Variational Inference," https://arxiv.org/abs/2009.00666

Red-Portal avatar Feb 25 '23 20:02 Red-Portal

Hi @Red-Portal, it looks like a sensible plan. I suggest we keep things simple until there is a genuine need for generalisation. For example, estimators and algorithms can be kept the same if they are always coupled in practice.

Some diagnostics are definitely helpful, but this is likely a challenging area, since we don't have good ways of checking convergence of the VI approximation to the true target. One way is to run expensive MCMC simulations and compute a divergence between the VI approximation and the MCMC samples, but we have no guarantee that MCMC converges either.

For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.

yebai avatar Feb 27 '23 17:02 yebai

Hi @yebai ,

For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.

Absolutely! With the talk around diagnostics and algorithms, I wanted to illustrate the potential uses of the new structure. The actual content would be a long-term goal, if feasible.

I'll start with refactoring the existing functionalities.

Red-Portal avatar Feb 27 '23 18:02 Red-Portal

Hi @yebai @torfjelde ,

What is the current policy on LogDensityProblems.jl? It seems AdvancedHMC.jl chose to go with it. Should AdvancedVI.jl also follow suit?
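
For reference, the interface a target has to implement is small; here is a toy sketch (StdNormalTarget is made up for illustration):

    using LogDensityProblems

    # A toy target implementing the LogDensityProblems.jl interface.
    struct StdNormalTarget
        dim::Int
    end

    LogDensityProblems.logdensity(t::StdNormalTarget, x) = -sum(abs2, x) / 2
    LogDensityProblems.dimension(t::StdNormalTarget) = t.dim
    LogDensityProblems.capabilities(::Type{StdNormalTarget}) =
        LogDensityProblems.LogDensityOrder{0}()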

Red-Portal avatar Mar 07 '23 20:03 Red-Portal

That sounds good.

yebai avatar Mar 07 '23 20:03 yebai