
implementing inference methods for sparse GP

Open yorkerlin opened this issue 10 years ago • 10 comments

Most of the methods in the following list will be implemented, in order:

  • inference for sparse Gaussian process regression (based on JMLR 2005 "A Unifying View of Sparse Approximate Gaussian Process Regression")
  • inference for sparse Gaussian process binary classification (based on NIPS 2007 "The Generalized FITC Approximation")
  • variational inference for sparse Gaussian process binary classification and regression (based on AISTATS 2009 "Variational Learning of Inducing Variables in Sparse Gaussian Processes")
  • stochastic variational inference for sparse Gaussian process binary classification and regression (based on UAI 2013 "Gaussian Processes for Big Data")
  • doubly stochastic variational inference for sparse Gaussian process binary classification (based on ICML 2014 "Doubly Stochastic Variational Bayes for Non-Conjugate Inference")
  • scalable variational inference for Gaussian process binary classification (based on AISTATS 2015 "Scalable Variational Gaussian Process Classification")
  • GPLVM (based on NIPS 2004 "Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data")
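
All of these build on the same inducing-point construction from the JMLR 2005 paper: the full kernel matrix K_nn is replaced by the low-rank Q_nn = K_nm K_mm^{-1} K_mn built from m inducing inputs. A minimal numpy sketch of the resulting O(n m^2) regression computation (illustrative only, not the Shogun implementation; kernel hyperparameters are hypothetical defaults):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def sparse_gp_regression_factors(X, y, Z, noise_var=0.1, jitter=1e-6):
    """Core quantities shared by the SoR/DTC family of sparse approximations
    (Quinonero-Candela & Rasmussen, 2005): K_nn is never formed; only
    K_mm and K_mn are needed, giving O(n m^2) cost instead of O(n^3)."""
    Kmm = rbf_kernel(Z, Z) + jitter * np.eye(len(Z))
    Kmn = rbf_kernel(Z, X)
    L = np.linalg.cholesky(Kmm)
    A = np.linalg.solve(L, Kmn)                    # A = L^{-1} K_mn
    B = np.eye(len(Z)) + (A @ A.T) / noise_var     # m x m system
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / noise_var
    return L, LB, c  # sufficient to form the predictive mean/variance
```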

Techniques that may be useful for natural/standard stochastic gradient optimization:

  • Variance reduction for stochastic gradient optimization (NIPS 2013)
  • Variational Bayesian Inference with Stochastic Search (ICML 2012)
  • Accelerating Stochastic Gradient Descent using Predictive Variance Reduction (NIPS 2013)
  • An adaptive learning rate for stochastic variational inference (ICML 2013)
  • Deterministic Annealing for Stochastic Variational Inference (arXiv pre-print 2014)
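
To illustrate the variance-reduction idea in the Johnson & Zhang paper above, here is a minimal numpy sketch of SVRG (illustrative only; `grad_i` is a user-supplied per-example gradient, and all names are hypothetical):

```python
import numpy as np

def svrg(grad_i, w0, n, step=0.01, epochs=10, m=None, rng=None):
    """Stochastic variance-reduced gradient (Johnson & Zhang, NIPS 2013).
    grad_i(w, i) returns the gradient of the i-th summand at w."""
    rng = rng or np.random.default_rng(0)
    m = m or 2 * n                 # inner-loop length per snapshot
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot, reused as a control variate
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # Unbiased estimate whose variance shrinks as w -> w_snap
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= step * g
    return w
```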

yorkerlin avatar Mar 23 '15 20:03 yorkerlin

@karlnapf take a look at this.

yorkerlin avatar Mar 23 '15 20:03 yorkerlin

Good stuff @yorkerlin

A few comments: for now I would remove the arXiv-only papers from the list. We do not know yet how well they really work, so we should wait for them to be peer-reviewed.

All the additional tools are also very useful. I would also like to suggest another paper that is very similar, though not specifically about GPs and variational inference but about general kernel methods. Very good stuff, as it scales kernel methods up to NN performance in both time and accuracy: Scalable Kernel Methods via Doubly Stochastic Gradients, http://arxiv.org/abs/1407.5599. For this we would need to clean up the random kitchen sink implementation a bit. I don't know how interested you are in this, but it is certainly very general and useful for all of Shogun's methods.
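
For context, random kitchen sinks approximate a shift-invariant kernel with random Fourier features (Rahimi & Recht); the doubly stochastic method of the paper above additionally samples the random features on the fly instead of fixing them up front. A minimal numpy sketch, not Shogun's implementation:

```python
import numpy as np

def random_fourier_features(X, D=500, lengthscale=1.0, rng=None):
    """Random kitchen sinks: z(x)^T z(x') approximates the Gaussian
    kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    # Sample from the kernel's spectral density N(0, l^{-2} I)
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```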

karlnapf avatar Mar 25 '15 10:03 karlnapf

@karlnapf Indeed, the following methods use the same variational bound proposed by M. K. Titsias:

  • variational inference for sparse Gaussian process binary classification and regression (based on AISTATS 2009 "Variational Learning of Inducing Variables in Sparse Gaussian Processes")
  • stochastic variational inference for sparse Gaussian process binary classification and regression (based on UAI 2013 "Gaussian Processes for Big Data")
  • scalable variational inference for Gaussian process binary classification (based on AISTATS 2015 "Scalable Variational Gaussian Process Classification")
  • GPLVM (based on NIPS 2004 "Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data")
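
For reference, Titsias' collapsed bound for the regression case can be written as follows, with Q_nn = K_nm K_mm^{-1} K_mn for m inducing points and noise variance sigma^2:

```latex
\log p(\mathbf{y})
  \;\ge\;
  \log \mathcal{N}\!\left(\mathbf{y} \,\middle|\, \mathbf{0},\; Q_{nn} + \sigma^2 I\right)
  \;-\; \frac{1}{2\sigma^2}\,\operatorname{tr}\!\left(K_{nn} - Q_{nn}\right)
```

The UAI 2013 and AISTATS 2015 papers work with the uncollapsed form, keeping an explicit q(u); that turns the bound into a sum over data points and makes it amenable to stochastic optimization.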

yorkerlin avatar Mar 27 '15 19:03 yorkerlin

@karlnapf The GPLVM for dimensionality reduction will be the last method implemented, since we may need to wrap or adapt the kernel objects: we have to compute variational expectations of a general kernel matrix with respect to the (latent) features. For ARD kernels I can get the closed-form result; for other kernels I am currently not sure whether numerical integration against a multivariate Gaussian distribution is feasible.
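
For concreteness, the closed-form expectation available for ARD kernels is the Psi-statistic from the Bayesian GPLVM (Titsias & Lawrence, 2010); assuming the standard ARD squared-exponential parameterisation k(x, z) = sigma_f^2 exp(-0.5 sum_q w_q (x_q - z_q)^2) and q(x_n) = N(mu_n, diag(s_n)):

```latex
(\Psi_1)_{nm}
  = \big\langle k(\mathbf{x}_n, \mathbf{z}_m) \big\rangle_{q(\mathbf{x}_n)}
  = \sigma_f^2 \prod_{q=1}^{Q}
    \frac{\exp\!\left( -\dfrac{w_q \left(\mu_{nq} - z_{mq}\right)^2}{2\left(w_q s_{nq} + 1\right)} \right)}
         {\sqrt{w_q s_{nq} + 1}}
```

No such closed form exists for a general kernel, which is why the other kernels would need numerical integration.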

yorkerlin avatar Mar 27 '15 19:03 yorkerlin

@karlnapf As mentioned in "Scalable Variational Gaussian Process Classification", the variational distribution can be non-Gaussian. Currently, all KL methods use a Gaussian distribution as the variational distribution.

See "Automated Variational Inference for Gaussian Process Models" (NIPS 2014). We can use a mixture of K Gaussian distributions as the variational distribution to obtain a more accurate approximation, where K is user-defined.

If time permits, I may implement a simplified version of the mixture-of-Gaussians variational distribution in which the K covariance matrices are diagonal.
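
For concreteness, the simplified family described here would be a mixture with diagonal-covariance components (a sketch; the NIPS 2014 paper also treats the mixture weights pi_k, and bounds the mixture entropy, which has no closed form):

```latex
q(\mathbf{f}) \;=\; \sum_{k=1}^{K} \pi_k\,
  \mathcal{N}\!\left(\mathbf{f} \,\middle|\, \mathbf{m}_k,\ \operatorname{diag}(\mathbf{v}_k)\right),
  \qquad \sum_{k=1}^{K} \pi_k = 1
```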

yorkerlin avatar Mar 27 '15 19:03 yorkerlin

Ok cool!

karlnapf avatar Mar 29 '15 12:03 karlnapf

Status: done

  • (Batch) inference based on Titsias' bound for sparse GP regression (ref: [1]): https://github.com/shogun-toolbox/shogun/pull/2865

Work in progress (this June to early July):

  • (Batch) inference based on Titsias' bound for sparse GP classification (ref: [1], [2])
  • (Batch) stochastic inference for sparse GP regression and classification (ref: [3], [7]; a sketch of the stochastic update follows the references)
  • Online/streaming inference for sparse GP regression and classification (ref: [3], [4], [7])

References:

[1] AISTATS 2009, "Variational Learning of Inducing Variables in Sparse Gaussian Processes."
[2] AISTATS 2015, "Scalable Variational Gaussian Process Classification."
[3] UAI 2013, "Gaussian Processes for Big Data."
[4] NIPS 2013, "Streaming Variational Bayes."
[5] NIPS 2014, "Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models."
[6] NIPS 2011, "Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities."
[7] ICML 2015, "A Unifying Framework of Anytime Sparse Gaussian Process Regression Models with Stochastic Variational Inference for Big Data."
[8] ICML 2015, "Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data."
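
As noted in the work-in-progress list above, here is a minimal numpy sketch of the natural-gradient step for the regression case of [3] (all names are hypothetical and `kernel` is a user-supplied covariance function; this is not the Shogun API):

```python
import numpy as np

def svi_sparse_gp_step(theta1, theta2, Xb, yb, Z, n, noise_var, lr,
                       kernel, jitter=1e-6):
    """One natural-gradient step on q(u) = N(m, S) for sparse GP regression,
    following the update form in Hensman et al. (UAI 2013) [3].
    theta1 = S^{-1} m and theta2 = -0.5 S^{-1} are the natural parameters;
    (Xb, yb) is a minibatch of size b out of n points; Z holds the m
    inducing inputs."""
    b = len(yb)
    Kmm = kernel(Z, Z) + jitter * np.eye(len(Z))
    Kmb = kernel(Z, Xb)
    Kmm_inv = np.linalg.inv(Kmm)
    scale = n / (b * noise_var)    # rescale the minibatch to the full data set
    # Batch-optimal natural parameters estimated from the minibatch
    theta1_hat = scale * Kmm_inv @ Kmb @ yb
    theta2_hat = -0.5 * (Kmm_inv + scale * Kmm_inv @ Kmb @ Kmb.T @ Kmm_inv)
    # Natural-gradient step: move a fraction lr toward the estimate
    theta1 = (1 - lr) * theta1 + lr * theta1_hat
    theta2 = (1 - lr) * theta2 + lr * theta2_hat
    return theta1, theta2
```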

yorkerlin avatar Jun 01 '15 15:06 yorkerlin

Great! I am back to double-check and merge code now.

Still asking about benchmarks!

karlnapf avatar Jun 11 '15 08:06 karlnapf

All methods are implemented in Matlab. I will add the method from [3].

yorkerlin avatar Feb 18 '17 13:02 yorkerlin

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 02 '20 14:03 stale[bot]