[Bug] condition_on_observations with SaasFullyBayesianGP
🐛 Bug
At the risk of looking a bit silly: I don't think conditioning on additional observations works out of the box when using fully Bayesian GPs.
To reproduce
The setup from the SAASBO tutorial is enough to reproduce this.
**Code snippet to reproduce**
import os
import matplotlib.pyplot as plt
import numpy as np
import torch
from botorch import fit_fully_bayesian_model_nuts
from botorch.acquisition import qExpectedImprovement
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
from botorch.models.transforms import Standardize
from botorch.optim import optimize_acqf
from botorch.test_functions import Branin
from torch.quasirandom import SobolEngine
SMOKE_TEST = os.environ.get("SMOKE_TEST")
tkwargs = {"device": torch.device("cuda:1" if torch.cuda.is_available() else "cpu"), "dtype": torch.double}
WARMUP_STEPS = 512 if not SMOKE_TEST else 32
NUM_SAMPLES = 256 if not SMOKE_TEST else 16
THINNING = 16
train_X = torch.rand(10, 4, **tkwargs)
test_X = torch.rand(5, 4, **tkwargs)
train_Y = torch.sin(train_X[:, :1])
test_Y = torch.sin(test_X[:, :1])
gp = SaasFullyBayesianSingleTaskGP(train_X=train_X, train_Y=train_Y)
fit_fully_bayesian_model_nuts(
    gp, warmup_steps=WARMUP_STEPS, num_samples=NUM_SAMPLES, thinning=THINNING, disable_progbar=True
)
with torch.no_grad():
    posterior = gp.posterior(test_X)
# Shape check
print(gp.covar_module.base_kernel.lengthscale.shape) # 16 x 1 x 4
cond_X = torch.rand(5, 4, **tkwargs) # 5 x 4
cond_Y = torch.sin(cond_X[:, :1]) # 5 x 1
# RuntimeError
gp.condition_on_observations(cond_X, cond_Y)
# and accounting for the batch shape does not work, either
cond_X = torch.rand(5, 4, **tkwargs).repeat(16, 1, 1)  # 16 x 5 x 4
cond_Y = torch.sin(cond_X[..., :1])  # 16 x 5 x 1
gp.condition_on_observations(cond_X, cond_Y)
**Stack trace/error message**
For both cases, this occurs:
gpytorch/models/exact_prediction_strategies.py:131, in DefaultPredictionStrategy.get_fantasy_strategy(self, inputs, targets, full_inputs, full_targets, full_output, **kwargs)
127 full_mean, full_covar = full_output.mean, full_output.lazy_covariance_matrix
129 batch_shape = full_inputs[0].shape[:-2]
--> 131 full_mean = full_mean.view(*batch_shape, -1)
132 num_train = self.num_train
134 # Evaluate fant x train and fant x fant covariance matrices, leave train x train unevaluated.
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Expected Behavior
Conditioning on the new observations for every constituent GP in the FBGP.
My two cents on the issue
Notably, creating a BatchedMultiOutputModel with num_mcmc_samples batches of train inputs and targets handles this without issue (see the sketch below). The issue seems to appear here: https://github.com/cornellius-gp/gpytorch/blob/master/gpytorch/models/exact_gp.py#L220, where the output is of shape num_mcmc_samples x num_mcmc_samples x num_inputs (it should probably be num_mcmc_samples x num_inputs).
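For concreteness, here is a minimal standalone version of that comparison, using a plain batched SingleTaskGP with default hyperparameters (just a sketch of the idea, not the model from my actual runs):

```python
import torch
from botorch.models import SingleTaskGP

num_mcmc_samples = 16
batch_X = torch.rand(num_mcmc_samples, 10, 4, dtype=torch.double)
batch_Y = torch.sin(batch_X[..., :1])
model = SingleTaskGP(batch_X, batch_Y)

# Conditioning requires an existing prediction strategy, so evaluate the
# posterior once before fantasizing.
with torch.no_grad():
    model.posterior(torch.rand(5, 4, dtype=torch.double))

cond_X = torch.rand(num_mcmc_samples, 5, 4, dtype=torch.double)
cond_Y = torch.sin(cond_X[..., :1])
model.condition_on_observations(cond_X, cond_Y)  # no error for the batched model
```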
System information
- Ubuntu 18.04
Hmm, yeah, I'm not surprised; I don't think we've tried/tested this before. We'll take a look.
Thanks! I'm trying to sort something out on my end too, but it may be a bit patchy if one is to avoid GPyTorch.
Yeah we may well have to change something on the gpytorch end, but that's ok.
So, I have a suggested solution to this problem that seems to be relatively pain-free. However, it doesn't quite fit within the BoTorch ecosystem, as it involves duplicating the training data in the FBGP and moving the unsqueeze(MCMC_DIM) from the forward to the posterior call. All in all, it basically looks like this (inside SaasFullyBayesianSingleTaskGP):
CTRL+F "DIFF" to see the diffs =)
def __init__(self, ....):
    # Other code...
    # DIFF - NEW ADDITION - duplicating the training data
    train_X = train_X.unsqueeze(0).expand(num_mcmc_samples, train_X.shape[0], -1)
    train_Y = train_Y.unsqueeze(0).expand(num_mcmc_samples, train_Y.shape[0], -1)
    self._num_outputs = train_Y.shape[-1]
    X_tf, Y_tf, _ = self._transform_tensor_args(X=train_X, Y=train_Y)
    super().__init__(
        train_inputs=X_tf, train_targets=Y_tf, likelihood=GaussianLikelihood()
    )

def forward(self, X: Tensor) -> MultivariateNormal:
    """
    Unlike in other classes' `forward` methods, there is no `if self.training`
    block, because it ought to be unreachable: If `self.train()` has been called,
    then `self.covar_module` will be None, `check_if_fitted()` will fail, and the
    rest of this method will not run.
    """
    # X = X.unsqueeze(MCMC_DIM)  # This is the first DIFF - the unsqueeze is REMOVED here
    self._check_if_fitted()
    mean_x = self.mean_module(X)
    covar_x = self.covar_module(X)
    return MultivariateNormal(mean_x, covar_x)

# pyre-ignore[14]: Inconsistent override
def posterior(
    self,
    X: Tensor,
    output_indices: Optional[List[int]] = None,
    observation_noise: bool = False,
    posterior_transform: Optional[PosteriorTransform] = None,
    **kwargs: Any,
) -> FullyBayesianPosterior:
    r"""Computes the posterior over model outputs at the provided points.

    Args:
        X: A `(batch_shape) x q x d`-dim Tensor, where `d` is the dimension
            of the feature space and `q` is the number of points considered
            jointly.
        output_indices: A list of indices, corresponding to the outputs over
            which to compute the posterior (if the model is multi-output).
            Can be used to speed up computation if only a subset of the
            model's outputs are required for optimization. If omitted,
            computes the posterior over all model outputs.
        observation_noise: If True, add the observation noise from the
            likelihood to the posterior. If a Tensor, use it directly as the
            observation noise (must be of shape `(batch_shape) x q x m`).
        posterior_transform: An optional PosteriorTransform.

    Returns:
        A `FullyBayesianPosterior` object. Includes observation noise if specified.
    """
    self._check_if_fitted()
    posterior = super().posterior(
        X=X.unsqueeze(MCMC_DIM),  # ...and this is the second DIFF
        output_indices=output_indices,
        observation_noise=observation_noise,
        posterior_transform=posterior_transform,
        **kwargs,
    )
    posterior = FullyBayesianPosterior(distribution=posterior.distribution)
    return posterior
I guess this equates to a regular batched model. The only issue is that num_mcmc_samples is not available when the model is initialized (or re-initialized once the MCMC samples are obtained). In my own runs, I have simply hacked num_mcmc_samples in there.
Yeah, like you said, this is effectively just using a batched model with a FullyBayesianPosterior, which is similar to what we do in the legacy Ax model as well. You can probably achieve the same effect by just loading the covariance & likelihood modules of a trained SaasFullyBayesianSingleTaskGP onto a subclass of a SingleTaskGP that has the posterior modification from your comment and is initialized with duplicated train_X/train_Y. It is a bit messy in either case since we need to expect the added batch dimension after evaluating the acquisition functions and average over that to make sure it works with the rest of the BoTorch stack.
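Very roughly, that transplant could look something like this (an untested sketch, just to illustrate the idea; BatchedSaasGP is a made-up helper, not botorch API, and it reuses gp, train_X, train_Y from the reproduction snippet at the top of the issue):

```python
import torch
from botorch.models import SingleTaskGP
from botorch.models.fully_bayesian import MCMC_DIM
from botorch.posteriors.fully_bayesian import FullyBayesianPosterior


class BatchedSaasGP(SingleTaskGP):
    """A batched SingleTaskGP whose posterior adds back the MCMC batch dimension."""

    def posterior(self, X, **kwargs):
        posterior = super().posterior(X=X.unsqueeze(MCMC_DIM), **kwargs)
        return FullyBayesianPosterior(distribution=posterior.distribution)


# gp, train_X, train_Y as in the reproduction snippet, after fit_fully_bayesian_model_nuts.
num_mcmc_samples = gp.covar_module.base_kernel.lengthscale.shape[0]
batched_gp = BatchedSaasGP(
    train_X.expand(num_mcmc_samples, *train_X.shape),
    train_Y.expand(num_mcmc_samples, *train_Y.shape),
)
# Transplant the fitted (batched) modules from the fully Bayesian model.
batched_gp.mean_module = gp.mean_module
batched_gp.covar_module = gp.covar_module
batched_gp.likelihood = gp.likelihood
```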
I think what you have is a decent solution.
Ideally we could do this in a way that doesn't require changing the forward/posterior methods. I haven't dug into this in detail, but is the issue here fundamentally that the underlying gpytorch model currently doesn't actually have training data of the right size so that the conditioning logic there fails? I wonder if it would be possible to expand that functionality on the gpytorch end to support this.
That is the way that I have understood things. The FBGP doesn't have a batch dimension in the inputs, since there's only one set of training data. The length- and outputscales obviously do, however, which gives a batch dimension in the output.
So, batch_shape = torch.Size([]) due to the training data. In get_fantasy_strategy in gpytorch/models/exact_prediction_strategies.py, the shapes of the fantasized outputs are inferred from the batch_shape of the training data. The forward pass gives an output which has the batch_shape of the hyperparameters (e.g. torch.Size([16])), whereas the training data simply doesn't have one.
As such, inside get_fantasy_strategy:
# full_inputs[0].shape: (num_training_data + num_fantasy_data) x dim <-- Note, no batch dimension
# full_mean.shape: num_models x (num_training_data + num_fantasy_data) <-- But there is one here
batch_shape = full_inputs[0].shape[:-2] # Tries to get the batch dim, but there is none.
full_mean = full_mean.view(*batch_shape, -1) # full_mean has a batch dim but batch_shape is empty, so there aren't enough args to view(...)
RuntimeError: view size is not compatible with input tensor's size and stride...
So, I tried to address it by adding a batch_shape to the training data, but I definitely don't have the full picture...
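My guess at the mechanics of that particular error, in isolation (this is standalone torch, not the actual gpytorch code; the real full_mean is produced differently, but a broadcasted mean with a batch dim plus an empty batch_shape gives the same failure):

```python
import torch

batch_shape = torch.Size([])                # inferred from the unbatched training inputs
full_mean = torch.zeros(15).expand(16, 15)  # a broadcasted mean that does carry the MCMC batch dim
full_mean.view(*batch_shape, -1)            # RuntimeError: view size is not compatible ...
```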
> The FBGP doesn't have a batch dimension in the inputs, since there's only one set of training data. The length- and outputscales obviously do, however, which gives a batch dimension in the output.
So one possibility here would be to add a dedicated batch_shape (or similar) property to the GPyTorch model. Then the default implementation would just grab that from the batch shape of the training data, but more exotic models such as the fully Bayesian ones could just override that property, and ideally things would just work.
@dme65, @saitcakmak does this sound reasonable?
Yeah, that sounds good. We already have a batch_shape property on BoTorch models. It is just a matter of upstreaming this to GP / ExactGP and using it in the code.
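Roughly, the idea would be something like this (purely illustrative; this is not the actual GPyTorch/BoTorch class hierarchy):

```python
import torch


class BaseModel:
    """Stand-in for a model whose batch shape comes from its training inputs."""

    def __init__(self, train_X: torch.Tensor):
        self.train_inputs = (train_X,)

    @property
    def batch_shape(self) -> torch.Size:
        # Default: infer the batch shape from the training inputs.
        return self.train_inputs[0].shape[:-2]


class FullyBayesianModel(BaseModel):
    """Stand-in for a fully Bayesian model that overrides the property."""

    def __init__(self, train_X: torch.Tensor, num_mcmc_samples: int):
        super().__init__(train_X)
        self.num_mcmc_samples = num_mcmc_samples

    @property
    def batch_shape(self) -> torch.Size:
        # One batch entry per MCMC sample, even though the training data is unbatched.
        return torch.Size([self.num_mcmc_samples])


print(BaseModel(torch.rand(10, 4)).batch_shape)               # torch.Size([])
print(FullyBayesianModel(torch.rand(10, 4), 16).batch_shape)  # torch.Size([16])
```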
Started a PR for this in https://github.com/cornellius-gp/gpytorch/pull/2307
@hvarfner can you check if this works for this use case? I haven't looked super closely through the prediction strategy, but as the full_inputs now use the batch_shape as defined by the model, this could work out of the box.
Not quite - the same issue still persists!
It seems like the forward pass through the GP
https://github.com/cornellius-gp/gpytorch/blob/batch_shape/gpytorch/models/exact_gp.py#L229
outputs a batch_shape x batch_shape x num_training_data tensor, which causes the aforementioned view(...) to still fail.
The batch_shape attribute in get_fantasy_strategy is N (as it should be), but the full_output is N x N (it should be N).
Once again, only my understanding of things =)
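As a standalone illustration of how such a doubled batch can fall out of broadcasting (plain torch, not the actual code path in exact_gp.py, so take it with a grain of salt):

```python
import torch

model_batch = torch.Size([16])   # batch_shape coming from the MCMC hyperparameters
fant_X = torch.rand(16, 5, 4)    # fantasy inputs that already carry that batch dim
fant_X = fant_X.unsqueeze(-3)    # adding the batch dim once more -> 16 x 1 x 5 x 4
print(torch.broadcast_shapes(model_batch, fant_X.shape[:-2]))  # torch.Size([16, 16])
```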
@hvarfner Is this fixed?
Yes!
Awesome! Do you know what PR fixed it?
#2151