Natural Gradients + Monte Carlo VI
There's been quite a bit of interesting work recently looking at natural gradients for variational inference with exponential family q-distributions, with non-conjugate / non-exponential family likelihoods / priors. See [1] (applied to GPs, but important bits aren't really GP-specific) and [2]. These turn out to be really quite straightforward to implement, so would be a great target for us. As a starting point, you could imagine extending our current mean field implementation to employ natural gradient descent in the parameters of the diagonal Gaussian q-distribution.
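To make that starting point concrete, here is a rough NumPy sketch of what natural gradient descent on a diagonal Gaussian q could look like. It is not tied to our current mean field code: the function name `ngvi_diag_gaussian`, its arguments, and the toy Gaussian target are all made up for illustration. The update is my reading of the natural-parameter step used in the style of [2], written in terms of the mean and precision, with the diagonal Hessian replaced by a reparameterization-based Monte Carlo estimate so that only gradients of the log joint are needed.

```python
import numpy as np


def ngvi_diag_gaussian(grad_log_joint, dim, n_iters=500, n_samples=256,
                       step=0.1, seed=0):
    """Natural-gradient VI for q(z) = N(mu, diag(1 / prec)).

    grad_log_joint(z) takes an array of shape (n_samples, dim) and returns
    the gradient of log p(x, z) w.r.t. z for each sample (same shape).
    All names here are hypothetical; this is a sketch, not a library API.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    prec = np.ones(dim)  # diagonal precision, 1 / sigma^2

    for _ in range(n_iters):
        sigma = 1.0 / np.sqrt(prec)
        eps = rng.standard_normal((n_samples, dim))
        z = mu + sigma * eps            # reparameterized samples from q
        g = grad_log_joint(z)

        g_mean = g.mean(axis=0)         # Monte Carlo estimate of E_q[grad]
        # Estimate of E_q[diag Hessian] via E[eps * grad] / sigma,
        # which avoids second derivatives.
        h_diag = (eps * g).mean(axis=0) / sigma

        # Natural-gradient step written in (mu, prec) coordinates.
        prec = (1.0 - step) * prec - step * h_diag
        prec = np.maximum(prec, 1e-8)   # crude guard against MC noise
        mu = mu + step * g_mean / prec

    return mu, 1.0 / prec


# Toy target: an unnormalized diagonal Gaussian, so the optimum is known.
target_mean = np.array([1.0, -2.0])
target_prec = np.array([4.0, 0.25])


def grad_log_joint(z):
    return -(z - target_mean) * target_prec


mu, var = ngvi_diag_gaussian(grad_log_joint, dim=2)
```

On this toy target, `mu` and `var` should end up close to `target_mean` and `1 / target_prec`. For a real non-conjugate model, `grad_log_joint` would come from automatic differentiation, and the precision clamp would probably need replacing with something more principled.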
There's even work moving slightly beyond exponential family distributions now [3], but this is quite early work. Might be nice to have though.
[1] - Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. "Natural gradients in practice: Non-conjugate variational inference in Gaussian process models." arXiv preprint arXiv:1803.09151 (2018).
[2] - Khan, Mohammad Emtiyaz, and Didrik Nielsen. "Fast yet simple natural-gradient descent for variational inference in complex models." 2018 International Symposium on Information Theory and Its Applications (ISITA). IEEE, 2018.
[3] - Lin, Wu, Mohammad Emtiyaz Khan, and Mark Schmidt. "Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations." arXiv preprint arXiv:1906.02914 (2019).
Hi, thanks for your interest in our papers. We have a new ICML 2021 paper on natural-gradient descent (including natural-gradient VI and Newton-like methods) for structured Gaussians and their mixtures.
Lin, Wu, et al. "Tractable structured natural gradient descent using local parameterizations." arXiv preprint arXiv:2102.07405 (2021).
Wu
This is a little late for the discussion, but is there any evidence/personal experience that natural gradient descent is more robust/faster than regular ADVI?
Natural gradient descent is just a way of doing, well, gradient descent on the parameters of distributions :) ADVI is a very specific VI method. I'm not sure these are even comparable?
I think many consider NGVI a competitor to good ol' ADVI. At least the literature certainly makes it look that way.