
Issue with large dataset

Opened by PuncocharM • 0 comments

I want to run BO on a large dataset, so I am using SVGP with minibatching; however, it still fails with the EI acquisition. A minimal example:

import gpflow
from gpflowopt.acquisition import ExpectedImprovement
import numpy as np
n = 200000
d = 50
X_train = np.random.randn(n,d)
y_train = np.random.randn(n,1)
sgp = gpflow.svgp.SVGP(
    X_train, y_train,
    gpflow.kernels.Matern52(1),
    gpflow.likelihoods.Gaussian(),
    X_train[:500, :].copy(),  # inducing points
    minibatch_size=500,
)
ei = ExpectedImprovement(sgp)

This raises an error on the last line:

InvalidArgumentError: Input to reshape is a tensor with 40000000000 values, but the requested shape has 1345294336 [[node Reshape_7 (defined at /home/puncochar/miniconda3/envs/mgr-work/lib/python3.6/site-packages/gpflowopt/transforms.py:138) ]]

This is essentially an int32 overflow: the requested reshape has N * N = 200000^2 = 40000000000 elements, and 40000000000 mod 2^32 = 1345294336, which is exactly the shape in the error message.
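The wrap-around can be reproduced outside GPflow. This sketch (plain Python and NumPy, independent of the library) shows that 200000^2 reduced modulo 2^32 gives exactly the number TensorFlow reported:

```python
import numpy as np

n = 200000
full = n * n            # 40000000000: the number of values in the full covariance
wrapped = full % 2**32  # what a 32-bit shape computation produces: 1345294336

# The same wrap-around happens with NumPy int32 arithmetic
# (may emit a RuntimeWarning about overflow)
a = np.array(n, dtype=np.int32)
print(full, wrapped, int(a * a))
```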

The error points to line 138 of transforms.py, which contains Yvar = tf.reshape(Yvar, [N * N, D]).

I figured this is unnecessary: by passing full_cov=False to build_predict(...) in scaling.py, the variance linear transformation can be simplified for diagonal covariances without ever constructing the full N x N matrix:

# inverse variance transform on the marginal variances only
L = tf.cholesky(tf.square(tf.transpose(self.A)))
XT = tf.cholesky_solve(L, tf.transpose(Yvar))
return tf.transpose(XT)
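As a sanity check of the simplification: when A is a per-output (diagonal) scaling, solving against square(A^T) amounts to dividing each column of the marginal variances by the corresponding squared scale, so the full N x N covariance is never needed. A NumPy sketch mirroring the TensorFlow fix (the shapes and the diagonal A here are illustrative assumptions, not GPflowOpt internals):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3
A = np.diag(rng.uniform(0.5, 2.0, size=D))  # hypothetical diagonal output scaling
Yvar = rng.uniform(0.1, 1.0, size=(N, D))   # marginal variances, as from full_cov=False

# Mirror of the TF fix: solve square(A^T) @ X^T = Yvar^T
X = np.linalg.solve(np.square(A.T), Yvar.T).T

# For diagonal A this equals elementwise division by the squared scales
expected = Yvar / np.square(np.diag(A))
assert np.allclose(X, expected)
```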

After this change, the code above works and I am subsequently able to run the optimization without any errors.

If you like, I can open a PR with this fix, since other people may run into it.

PuncocharM • Jan 28 '20 22:01