
Why is the covariance a 2D matrix?

Open · LeonhardStorm opened this issue 1 year ago · 1 comment

Hi there, thanks for making and maintaining this excellent project!

I was wondering why the covariance output of the `predict` function is a matrix with the same shape as the kernel matrix. As far as I can tell, all of the examples use only the diagonal values, so what do the other values represent? Are they relevant or useful at all?

Thanks!

LeonhardStorm · Aug 15 '22 09:08

Indeed, in the examples and visualizations we have only used the diagonal entries (the marginal variances), but in general the outputs are described by a Gaussian process (GP) with non-zero off-diagonal covariance entries (see the covariance expressions in eqs. 13, 15, and 16 in https://arxiv.org/pdf/1902.06720.pdf), hence we return the full covariance matrix of this GP. As for any GP, the off-diagonal entries represent the covariance between the outputs of your GP at two different input points.
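
For concreteness, here is a minimal sketch of pulling the full covariance out of `predict` (the architecture, data shapes, and training-set sizes below are made up for illustration):

```python
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

k1, k2, k3 = random.split(random.PRNGKey(0), 3)
x_train = random.normal(k1, (20, 10))
y_train = random.normal(k2, (20, 1))
x_test = random.normal(k3, (5, 10))

# Any stax architecture works here; this two-layer ReLU net is just an example.
_, _, kernel_fn = stax.serial(stax.Dense(512), stax.Relu(), stax.Dense(1))

predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
mean, cov = predict_fn(x_test=x_test, get='ntk', compute_cov=True)

# mean has shape (5, 1) and cov has shape (5, 5): cov[i, j] is the covariance
# between the GP outputs at x_test[i] and x_test[j]. The marginal variances
# used in most examples are just the diagonal, jnp.diag(cov).
```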

If you want to sample outputs on x_test from your (posterior, post-training) GP, you need the full covariance on x_test. You may also need the full covariance if, for example, you want to estimate or tune the predictive probability p(y_test | x_train, y_train, x_test) (the pdf of the GP with the posterior mean and covariance on x_test that you obtain from `predict`). It can also be used in ensembling: with inverse-variance weighting (https://en.wikipedia.org/wiki/Inverse-variance_weighting#Multivariate_Case) you can combine predictions on x_test from several GPs (see section E in https://arxiv.org/pdf/2007.15801.pdf).
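
Here is a hedged sketch of those three uses, reusing `mean` and `cov` from the snippet above; `y_test` and the second GP `(mean2, cov2)` are purely hypothetical placeholders:

```python
import jax.numpy as jnp
from jax import random
from jax.scipy.stats import multivariate_normal

mu = mean[:, 0]                                 # flatten (5, 1) -> (5,)
cov_j = cov + 1e-6 * jnp.eye(cov.shape[0])      # jitter for numerical stability

# (1) Sample function values on x_test from the posterior GP; the full
# covariance is needed to capture correlations between test points.
samples = random.multivariate_normal(
    random.PRNGKey(1), mu, cov_j, shape=(100,))  # (100, 5) draws

# (2) Predictive log-density log p(y_test | x_train, y_train, x_test) for
# some candidate labels y_test of shape (5,).
y_test = jnp.zeros(5)
log_p = multivariate_normal.logpdf(y_test, mu, cov_j)

# (3) Inverse-variance weighting of this GP with a second (hypothetical) GP
# (mean2, cov2) to form ensemble predictions on x_test.
mean2, cov2 = mu + 0.1, cov_j + 0.01 * jnp.eye(cov.shape[0])
cov_ens = jnp.linalg.inv(jnp.linalg.inv(cov_j) + jnp.linalg.inv(cov2))
mean_ens = cov_ens @ (jnp.linalg.solve(cov_j, mu) + jnp.linalg.solve(cov2, mean2))
```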

romanngg · Aug 15 '22 17:08