[Bug] Different slicing behavior in the mvn calculation of the gpytorch posterior method when the model's training data size is 200 vs. 600
🐛 Bug
To reproduce
**Code snippet to reproduce**
Sorry, the code repository is too large to provide.
I am using a DKL-GP model, which just adds a few NN layers before the GP layer. The DKL-GP model inherits from BatchedMultiOutputGPyTorchModel and ExactGP.
The stack trace below occurs when I feed 600x19 training data into the model, but when I change the training data size to 200x19 it disappears. I don't know what's going on here.
The only difference between my two comparative runs is the training data size: one is 200x19 and the other is 600x19.
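For reference, a minimal sketch of this kind of setup is below (not my actual code; the feature-extractor sizes, kernel, and likelihood are placeholders):

import torch
import gpytorch
from botorch.models.gpytorch import BatchedMultiOutputGPyTorchModel


class DKLGP(BatchedMultiOutputGPyTorchModel, gpytorch.models.ExactGP):
    """A toy DKL-GP: a small feature extractor feeding an exact GP."""

    def __init__(self, train_X, train_Y):
        # BatchedMultiOutputGPyTorchModel bookkeeping: record the number of outputs
        # and reshape the training data into a batched single-output format.
        self._set_dimensions(train_X=train_X, train_Y=train_Y)
        train_X, train_Y, _ = self._transform_tensor_args(X=train_X, Y=train_Y)
        likelihood = gpytorch.likelihoods.GaussianLikelihood(
            batch_shape=self._aug_batch_shape
        )
        gpytorch.models.ExactGP.__init__(self, train_X, train_Y, likelihood)
        # A few NN layers before the GP layer (sizes are placeholders).
        self.feature_extractor = torch.nn.Sequential(
            torch.nn.Linear(train_X.shape[-1], 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 4),
        )
        self.mean_module = gpytorch.means.ConstantMean(batch_shape=self._aug_batch_shape)
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(ard_num_dims=4, batch_shape=self._aug_batch_shape),
            batch_shape=self._aug_batch_shape,
        )

    def forward(self, x):
        z = self.feature_extractor(x)
        return gpytorch.distributions.MultivariateNormal(self.mean_module(z), self.covar_module(z))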
**Stack trace/error message**
Traceback (most recent call last):
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/optim/optimize.py", line 613, in optimize_acqf_discrete
max_batch_size=max_batch_size,
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/optim/optimize.py", line 646, in _split_batch_eval_acqf
return torch.cat([acq_function(X_) for X_ in X.split(max_batch_size)])
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/optim/optimize.py", line 646, in <listcomp>
return torch.cat([acq_function(X_) for X_ in X.split(max_batch_size)])
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/utils/transforms.py", line 301, in decorated
return method(cls, X, **kwargs)
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/utils/transforms.py", line 258, in decorated
output = method(acqf, X, *args, **kwargs)
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/acquisition/multi_objective/monte_carlo.py", line 336, in forward
posterior = self.model.posterior(X)
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/models/gpytorch.py", line 365, in posterior
for t in output_indices
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/botorch/models/gpytorch.py", line 365, in <listcomp>
for t in output_indices
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/gpytorch/lazy/lazy_tensor.py", line 2201, in __getitem__
res = self._getitem(row_index, col_index, *batch_indices)
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/gpytorch/lazy/sum_lazy_tensor.py", line 36, in _getitem
results = [lazy_tensor._getitem(row_index, col_index, *batch_indices) for lazy_tensor in self.lazy_tensors]
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/gpytorch/lazy/sum_lazy_tensor.py", line 36, in <listcomp>
results = [lazy_tensor._getitem(row_index, col_index, *batch_indices) for lazy_tensor in self.lazy_tensors]
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py", line 140, in _getitem
new_kernel = self.kernel.__getitem__(batch_indices)
File "/export1/Workspace/liqiang/tool/anaconda3/envs/socgen-nextorch/lib/python3.7/site-packages/gpytorch/kernels/kernel.py", line 434, in __getitem__
new_kernel._parameters[param_name].data = param.__getitem__(index)
IndexError: too many indices for tensor of dimension 1
Expected Behavior
System information
Please complete the following information:
- BoTorch Version 0.6.4
- GPyTorch Version 1.6.0
- PyTorch Version 1.11.0+cpu
- Computer OS: RHEL 6.10
Additional context
In the 600x19 training-data-size case, when I use optimize_acqf_discrete to find the next optimal point, execution goes into the posterior calculation inside the forward method below:
https://github.com/pytorch/botorch/blob/a675968d1a64849938ec935e4a619bf984a33637/botorch/acquisition/multi_objective/monte_carlo.py#L335-L338
When execution reaches L360, I found that the lazy_covariance_matrix of the mvn is a NonLazyTensor in the 200x19 case but a SumLazyTensor in the 600x19 case. I don't know the difference between these two classes. Is that the key to why the error occurs, and why does the IndexError happen only in the 600x19 case?
https://github.com/pytorch/botorch/blob/a675968d1a64849938ec935e4a619bf984a33637/botorch/models/gpytorch.py#L356-L366
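For reference, I checked the covariance type with something like this (a hypothetical snippet; `model` and `X` are placeholders for my DKL-GP and the candidate points, and input transforms are ignored for simplicity):

import torch

model.eval()
with torch.no_grad():
    # Build the same joint mvn that posterior() then slices per output index.
    mvn = model(X)
    print(type(mvn.lazy_covariance_matrix).__name__)
# Per the observation above: "NonLazyTensor" with 200x19 training data,
# "SumLazyTensor" with 600x19.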
Thanks for flagging this. I believe this could be an indexing bug on the gpytorch side of things related to lazy kernel evaluation. To check this, can you try wrapping your code in the following context?
import gpytorch

with gpytorch.settings.max_eager_kernel_size(1000):
    ...  # your model fitting / acquisition optimization code
Here, 1000 specifies the number of data points up to which kernels are eagerly evaluated. If this works in the 600x19 case, then we know that is indeed the issue.
Also, what botorch and gpytorch versions are you using?
Thanks for such a quick response! I will try revising this setting and update with the results later. Can I just directly revise it here? https://github.com/cornellius-gp/gpytorch/blob/45e560c3417cb970c3a402f8a1a92f87b733e470/gpytorch/settings.py#L401-L409
BoTorch Version 0.6.4, GPyTorch Version 1.6.0
It works! I directly revised https://github.com/cornellius-gp/gpytorch/blob/45e560c3417cb970c3a402f8a1a92f87b733e470/gpytorch/settings.py#L409 from 512 to 1000, and the IndexError disappeared.
Yeah, so it's an indexing issue with lazy tensors then; we'll need to investigate this on the gpytorch end.
If you change https://github.com/cornellius-gp/gpytorch/blob/45e560c3417cb970c3a402f8a1a92f87b733e470/gpytorch/settings.py#L409 you're just modifying the default. Using the context manager, you can do this in a local context without touching the gpytorch code.
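For example (a sketch; `acqf` and `choices` are placeholders for your acquisition function and discrete candidate set):

import gpytorch
from botorch.optim.optimize import optimize_acqf_discrete

# Raise the eager-evaluation threshold only around the acquisition optimization,
# without editing gpytorch/settings.py.
with gpytorch.settings.max_eager_kernel_size(1000):
    candidates, acq_values = optimize_acqf_discrete(
        acq_function=acqf,
        q=1,
        choices=choices,
    )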
Got it! Thank you so much.
Closing this issue now.
Reopening this until the issue is fixed upstream.
This should be fixed. See https://github.com/pytorch/botorch/issues/1291
Not sure this is "fixed" rather than being avoided - I think the underlying issue is still the long-standing gpytorch issue https://github.com/cornellius-gp/gpytorch/issues/1853. But should be ok to close this BoTorch issue here.