GPflowOpt
Issue with large dataset
I want to run Bayesian optimization (BO) on a large dataset. For this reason I am using SVGP with minibatching, but constructing the EI acquisition still fails. A minimal example:
import gpflow
from gpflowopt.acquisition import ExpectedImprovement
import numpy as np
n = 200000
d = 50
X_train = np.random.randn(n,d)
y_train = np.random.randn(n,1)
sgp = gpflow.svgp.SVGP(X_train, y_train, gpflow.kernels.Matern52(1), gpflow.likelihoods.Gaussian(), X_train[:500,:].copy(), minibatch_size=500)
ei = ExpectedImprovement(sgp)
This raises the following error on the last line:
InvalidArgumentError: Input to reshape is a tensor with 40000000000 values, but the requested shape has 1345294336 [[node Reshape_7 (defined at /home/puncochar/miniconda3/envs/mgr-work/lib/python3.6/site-packages/gpflowopt/transforms.py:138) ]]
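This is basically an integer overflow: the reshape needs N * N = 200000^2 = 40000000000 values, which does not fit in a 32-bit integer, and 40000000000 modulo 2^32 is exactly the 1345294336 reported as the requested shape.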
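The two numbers in the error message can be checked with plain integer arithmetic. This is only a sketch, assuming the shape product is held in a 32-bit integer (the variable names are illustrative); it also shows how much memory the full N x N tensor would need in float64:

```python
# Number of training points from the minimal example.
n = 200000

# Values the reshape actually has to hold: one per (i, j) pair.
requested = n * n
print(requested)  # 40000000000

# The same product wrapped to 32 bits (assumption: int32 shape arithmetic).
wrapped = requested % 2**32
print(wrapped)  # 1345294336

# Memory for the full tensor with 8-byte floats, in gigabytes.
full_gb = requested * 8 / 1e9
print(full_gb)  # 320.0
```

Even without the overflow, a tensor of this size would not fit in memory, so avoiding its construction matters either way.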
The error points to line 138 in transforms.py, where there is Yvar = tf.reshape(Yvar, [N * N, D]).
I figured out that this is unnecessary: by passing full_cov=False from build_predict(...) in scaling.py, we can simplify the linear transformation of the variance for diagonal matrices and apply it without constructing the full matrix:
L = tf.cholesky(tf.square(tf.transpose(self.A)))
XT = tf.cholesky_solve(L, tf.transpose(Yvar))
return tf.transpose(XT)
After doing this, the code above works and I am subsequently able to run optimization without any errors.
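As a sanity check on the idea, the following NumPy sketch compares the two paths on a small problem. It is not the GPflowOpt code: it assumes a diagonal A with hypothetical per-output scales, and it mimics the full path by embedding the marginal variances in an [N, N, D] tensor, reshaping to [N * N, D], solving, and summing back. Under those assumptions both paths give the variances divided by the squared scales.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 2                       # small sizes so the full tensor is cheap
a = np.array([2.0, 0.5])          # hypothetical per-output scales
A = np.diag(a)                    # assumption: A is diagonal
Yvar = rng.random((N, D)) + 0.1   # marginal variances, shape [N, D]

# Same factorization as in the snippet above, in NumPy.
L = np.linalg.cholesky(np.square(A.T))
LLt = L @ L.T                     # equals diag(a**2) for diagonal A

# Full path: place the variances on the diagonal of an [N, N, D] tensor.
full = np.zeros((N, N, D))
idx = np.arange(N)
full[idx, idx, :] = Yvar
flat = full.reshape(N * N, D)     # this is the step that needs N * N rows
scaled = np.linalg.solve(LLt, flat.T).T.reshape(N, N, D)
var_full = scaled.sum(axis=1)     # back to shape [N, D]

# Diagonal path: solve directly on the [N, D] variances.
var_diag = np.linalg.solve(LLt, Yvar.T).T

print(np.allclose(var_full, var_diag))
print(np.allclose(var_diag, Yvar / a**2))
```

The diagonal path touches N * D values instead of N * N * D, which is what makes the n = 200000 case feasible.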
If you like, I can open a PR with this fix, since other people could run into the same problem.