[Bug] Gradient of GPDraw does not behave as expected
🐛 Bug
I'm trying to maximize f(x) with respect to x in 2D, where f ~ GP, using the GPDraw class (i.e., one step of the Thompson sampling procedure). The gradient of y = f(x) with respect to x does not point in the steepest-descent direction, and it can sometimes change direction quite unexpectedly. This behavior is not observed when f is Ackley. I wonder whether this is a code problem or a (numerical) issue with the GP. Any guidance would be appreciated.
To reproduce
**Code snippet to reproduce**
```python
import random

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch.optim import Adam

from botorch import fit_gpytorch_model
from botorch.models import SingleTaskGP
from botorch.models.transforms.outcome import Standardize
from botorch.test_functions.synthetic import Ackley
from botorch.utils.gp_sampling import GPDraw
from gpytorch.mlls import ExactMarginalLogLikelihood

x_dim = 2

# Seed everything for reproducibility.
seed = 1442223
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

device = "cuda"
torch_dtype = torch.float32

initial_points = [
    [0.2, 0.7],
    [0.0, -0.4],
    [-0.2, 0.8],
    [-0.5, 0.5],
    [-0.3, 0.0],
    [0.8, -0.1],
    [-0.4, -0.5],
    [0.5, -0.5],
    [-0.7, -0.6],
    [0.45, 0.5],
]

env = Ackley()
data_x = torch.tensor(
    initial_points,
    device=device,
    dtype=torch_dtype,
)  # >>> n_initial_points x dim
data_y = env(data_x).reshape(-1, 1)  # >>> n_initial_points x 1

# Fit a GP to the Ackley observations.
GP = SingleTaskGP(
    data_x,
    data_y,
    outcome_transform=Standardize(1),
).to(device)
mll = ExactMarginalLogLikelihood(GP.likelihood, GP)
fit_gpytorch_model(mll)

# Initialize 100 particles in [-1, 1]^2 and ascend a GP sample path with Adam.
x = torch.rand(100, 2, device=device) * 2 - 1
x.requires_grad_(True)
optimizer = Adam([x], lr=0.01)
for i in range(1000):
    optimizer.zero_grad()
    f = GPDraw(GP, seed=seed)  # fresh GPDraw each iteration, same seed
    # f = Ackley()
    loss = -f(x).mean()
    loss.backward()
    optimizer.step()
grad = x.grad.clone()

# Plotting ###############################################################
n_space = 100
fig, ax = plt.subplots(1, 1)
bounds_plot_x = bounds_plot_y = -1.1, 1.1
ax.set(xlabel="$x_1$", ylabel="$x_2$", xlim=bounds_plot_x, ylim=bounds_plot_y)
title = "GPDraw Gradient Test"
ax.set_title(label=title)

# Plot function in 2D ####################################################
X_domain, Y_domain = (-1.1, 1.1), (-1.1, 1.1)
X, Y = np.linspace(*X_domain, n_space), np.linspace(*Y_domain, n_space)
X, Y = np.meshgrid(X, Y)
XY = torch.tensor(np.array([X, Y])).float().to(device)  # >>> 2 x 100 x 100
f = GPDraw(GP, seed=seed)
# f = Ackley()
Z = f(XY.reshape(2, -1).T).reshape(X.shape).cpu().detach().numpy()
cs = ax.contourf(X, Y, Z, levels=30, cmap="bwr", alpha=0.7)
ax.set_aspect(aspect="equal")
cbar = fig.colorbar(cs)
cbar.ax.set_ylabel("$f(x)$", rotation=270, labelpad=20)

# Mark one particle and draw its last gradient as an arrow.
ax.scatter(
    x[1, 0].cpu().detach().numpy(),
    x[1, 1].cpu().detach().numpy(),
    label="Data",
    color="red",
)
ax.arrow(
    x[1, 0].cpu().detach().numpy(),
    x[1, 1].cpu().detach().numpy(),
    grad[1, 0].cpu().detach().numpy(),
    grad[1, 1].cpu().detach().numpy(),
    head_width=0.05,
    head_length=0.1,
    fc="blue",
    ec="blue",
)
ax.legend()
plt.show()
```
**Stack trace/error message**
The gradient does not point downhill; its direction and magnitude change quite rapidly, and the particle does not appear to follow the gradient.
Expected Behavior
The gradient should point downhill. Below is the expected behavior when using the Ackley function: the particle moves uphill nicely with Adam, and the gradient behaves as expected.
System information
Please complete the following information:
- BoTorch Version: 0.9.4
- GPyTorch Version: 1.11
- PyTorch Version: 2.1.0+cu118
- Computer OS: Google Colab
Additional context
None
Thanks for flagging this. We'll have to look into this in a bit more detail. cc @SebastianAment for potential numerical issues with gradient computations.
That said, is there a specific reason you are using the GPDraw class? It is a bit of a poor man's approach to drawing sample paths from GPs; we have a much better setup based on pathwise sampling: https://github.com/pytorch/botorch/blob/main/botorch/sampling/pathwise/posterior_samplers.py#L86-L107
It should generally be preferable to use that - please let us know if you run into any similar issues there.
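For reference, here is a minimal sketch of what the pathwise replacement for the optimization loop above could look like. This is a sketch rather than canonical usage: it assumes BoTorch >= 0.9 (where `draw_matheron_paths` is exported from `botorch.sampling.pathwise`), reuses `GP` and `device` from the snippet above, and note that the drawn path may operate in the model's transformed (standardized) outcome space.

```python
# Sketch only: assumes BoTorch >= 0.9, where draw_matheron_paths is exported
# from botorch.sampling.pathwise. GP and device are from the snippet above.
import torch
from botorch.sampling.pathwise import draw_matheron_paths

# Draw ONE fixed posterior sample path (Matheron's rule / pathwise sampling).
path = draw_matheron_paths(GP, sample_shape=torch.Size([1]))

x = torch.rand(100, 2, device=device) * 2 - 1  # particles in [-1, 1]^2
x.requires_grad_(True)
optimizer = torch.optim.Adam([x], lr=0.01)
for _ in range(1000):
    optimizer.zero_grad()
    # The path is a fixed, differentiable function of x, so its gradient
    # field stays consistent across iterations.
    loss = -path(x).mean()  # path(x) has shape (1, 100)
    loss.backward()
    optimizer.step()
```

The `get_matheron_path_model` helper in the linked `posterior_samplers.py` wraps a drawn path as a deterministic BoTorch model, which may be more convenient if you want to plug the sample into existing acquisition/optimization utilities.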
Hi @sangttruong. We're deprecating the GPDraw class and it will be removed in a future release. I'd also recommend following @Balandat's recommendation to use pathwise sampling instead. Closing this since we do not intend to investigate the issue further due to deprecation.