
[bug] SVDKL demo error

Open dublinsky opened this issue 4 years ago • 8 comments

When running the demo code of SVDKL, I receive the following error at the last step:

```
(Epoch 1) Minibatch:   0%|          | 0/196 [00:00<?, ?it/s]

/home/j/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/utils/cholesky.py:44: NumericalWarning: A not p.d., added jitter of 1.0e-06 to the diagonal
  warnings.warn(f"A not p.d., added jitter of {jitter_new:.1e} to the diagonal", NumericalWarning)
/home/j/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/utils/cholesky.py:44: NumericalWarning: A not p.d., added jitter of 1.0e-05 to the diagonal
  warnings.warn(f"A not p.d., added jitter of {jitter_new:.1e} to the diagonal", NumericalWarning)
/home/j/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/utils/cholesky.py:44: NumericalWarning: A not p.d., added jitter of 1.0e-04 to the diagonal
  warnings.warn(f"A not p.d., added jitter of {jitter_new:.1e} to the diagonal", NumericalWarning)

NotPSDError                               Traceback (most recent call last)
in
      1 for epoch in range(1, n_epochs + 1):
      2     with gpytorch.settings.use_toeplitz(False):
----> 3         train(epoch)
      4         test()
      5         scheduler.step()

in train(epoch)
     21         data, target = data.cuda(), target.cuda()
     22         optimizer.zero_grad()
---> 23         output = model(data)
     24         loss = -mll(output, target)
     25         loss.backward()

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
     28
     29     def __call__(self, *inputs, **kwargs):
---> 30         outputs = self.forward(*inputs, **kwargs)
     31         if isinstance(outputs, list):
     32             return [_validate_module_outputs(output) for output in outputs]

in forward(self, x)
     15         # This next line makes it so that we learn a GP for each feature
     16         features = features.transpose(-1, -2).unsqueeze(-1)
---> 17         res = self.gp_layer(features)
     18         return res
     19

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/models/approximate_gp.py in __call__(self, inputs, prior, **kwargs)
     79         if inputs.dim() == 1:
     80             inputs = inputs.unsqueeze(-1)
---> 81         return self.variational_strategy(inputs, prior=prior, **kwargs)

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/variational/independent_multitask_variational_strategy.py in __call__(self, x, prior, **kwargs)
     45
     46     def __call__(self, x, prior=False, **kwargs):
---> 47         function_dist = self.base_variational_strategy(x, prior=prior, **kwargs)
     48         if (
     49             self.task_dim > 0

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/variational/_variational_strategy.py in __call__(self, x, prior, **kwargs)
    109         if not self.variational_params_initialized.item():
    110             prior_dist = self.prior_distribution
--> 111             self.variational_distribution.initialize_variational_distribution(prior_dist)
    112             self.variational_params_initialized.fill_(1)
    113

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/variational/cholesky_variational_distribution.py in initialize_variational_distribution(self, prior_dist)
     51         self.variational_mean.data.copy_(prior_dist.mean)
     52         self.variational_mean.data.add_(torch.randn_like(prior_dist.mean), alpha=self.mean_init_std)
---> 53         self.chol_variational_covar.data.copy_(prior_dist.lazy_covariance_matrix.cholesky().evaluate())

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in cholesky(self, upper)
    960             (LazyTensor) Cholesky factor (triangular, upper/lower depending on "upper" arg)
    961         """
--> 962         chol = self._cholesky(upper=False)
    963         if upper:
    964             chol = chol._transpose_nonbatch()

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
     57         kwargs_pkl = pickle.dumps(kwargs)
     58         if not _is_in_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl):
---> 59             return _add_to_cache(self, cache_name, method(self, *args, **kwargs), *args, kwargs_pkl=kwargs_pkl)
     60         return _get_from_cache(self, cache_name, *args, kwargs_pkl=kwargs_pkl)
     61

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in _cholesky(self, upper)
    423
    424         # contiguous call is necessary here
--> 425         cholesky = psd_safe_cholesky(evaluated_mat, upper=upper).contiguous()
    426         return TriangularLazyTensor(cholesky, upper=upper)
    427

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/utils/cholesky.py in psd_safe_cholesky(A, upper, out, jitter, max_tries)
    106         Number of attempts (with successively increasing jitter) to make before raising an error.
    107     """
--> 108     L = _psd_safe_cholesky(A, out=out, jitter=jitter, max_tries=max_tries)
    109     if upper:
    110         if out is not None:

~/v/anaconda3/envs/ggp/lib/python3.6/site-packages/gpytorch/utils/cholesky.py in _psd_safe_cholesky(A, out, jitter, max_tries)
     46         if not torch.any(info):
     47             return L
---> 48         raise NotPSDError(f"Matrix not positive definite after repeatedly adding jitter up to {jitter_new:.1e}.")
     49
     50

NotPSDError: Matrix not positive definite after repeatedly adding jitter up to 1.0e-04.
```

dublinsky avatar Nov 15 '21 08:11 dublinsky

Are you running the Jupyter notebook in the examples folder, or did you make any local changes?

In general, these errors aren't unexpected when optimizing GP models. Try lowering the learning rate or using a different initialization.
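
For concreteness, here is a minimal sketch of the first suggestion, assuming the SGD optimizer setup from the SVDKL tutorial (the exact parameter groups in your copy of the notebook may differ; `model.feature_extractor`, `model.gp_layer`, and `likelihood` follow the tutorial's naming):

```python
import torch

# Hypothetical values: drop the base learning rate by 10x relative to the tutorial.
lr = 0.01

optimizer = torch.optim.SGD([
    {'params': model.feature_extractor.parameters(), 'weight_decay': 1e-4},
    {'params': model.gp_layer.hyperparameters(), 'lr': lr * 0.01},
    {'params': model.gp_layer.variational_parameters()},
    {'params': likelihood.parameters()},
], lr=lr, momentum=0.9, nesterov=True, weight_decay=0)
```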

gpleiss avatar Nov 22 '21 14:11 gpleiss

> Are you running the Jupyter notebook in the examples folder, or did you make any local changes?
>
> In general, these errors aren't unexpected when optimizing GP models. Try lowering the learning rate or using a different initialization.

I made a copy of the whole gpytorch project via the git clone command and ran the Jupyter notebook in the examples folder exactly as it is, without any local changes; I am sure about that.

I have tried setting lr to 0.01, 0.001, and 0.0001, and I get the same error. Any further hints? Huge thanks.

dublinsky avatar Nov 29 '21 02:11 dublinsky

Which example notebook are you running?

gpleiss avatar Dec 21 '21 01:12 gpleiss

> Which example notebook are you running?

Hi,

I had the same error when running the example code: https://github.com/cornellius-gp/gpytorch/blob/master/examples/06_PyTorch_NN_Integration_DKL/Deep_Kernel_Learning_DenseNet_CIFAR_Tutorial.ipynb

LichuanRen avatar Jan 14 '22 04:01 LichuanRen

Hi @dublinsky,

have you been able to figure out what causes the warning?

Many thanks

AndreaBraschi avatar Jan 19 '22 19:01 AndreaBraschi

@LichuanRen I will take a look later today.

gpleiss avatar Jan 19 '22 19:01 gpleiss

Has this been fixed? I am still getting this error when trying to run this example.

agosztolai avatar Aug 17 '23 10:08 agosztolai

Hi @agosztolai,

I haven't worked through the example, so I can't really comment on that.

However, working with custom data and toy data, I've noticed that if you're using Adam and NGD as optimisers, the error is very sensitive to a high NGD learning rate; see the sketch below.
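
For illustration, a minimal sketch of that kind of two-optimiser setup (names like `model`, `likelihood`, and `num_data` are placeholders; `gpytorch.optim.NGD` also assumes a natural variational distribution such as `gpytorch.variational.NaturalVariationalDistribution`):

```python
import torch
import gpytorch

# Variational parameters via natural gradient descent; keep this learning rate
# small, since a large NGD step is what tends to trigger the NotPSDError.
variational_optimizer = gpytorch.optim.NGD(
    model.variational_parameters(), num_data=num_data, lr=0.01
)

# Kernel hyperparameters, likelihood (and any NN feature extractor) via Adam.
hyperparameter_optimizer = torch.optim.Adam([
    {'params': model.hyperparameters()},
    {'params': likelihood.parameters()},
], lr=0.01)
```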

Another thing that has helped me avoid the error is increasing the precision of the inducing inputs and inducing points to `torch.float64`, e.g. as in the sketch below.
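
Again only a sketch of the double-precision idea (variable names are placeholders for your own model and data):

```python
import torch

# Cast the inducing points, model, likelihood, and training data to float64 so
# that the Cholesky factorisations run in double precision.
inducing_points = inducing_points.double()
model = model.double()
likelihood = likelihood.double()
train_x, train_y = train_x.double(), train_y.double()
```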

Hope this helps.

Andrea

AndreaBraschi avatar Aug 18 '23 12:08 AndreaBraschi