Scalable SM Kernel [Feature Request]
🚀 Feature Request
This is somewhere between a bug report and a feature request. I am attempting to use spectral mixture kernels on reasonably sized data (100 x 100 grids), but the model runs out of memory at test time, possibly due to caching.
The feature request would be either a KeOps implementation of the SM kernel or a more faithful implementation of Kronecker-based inference for GPs (a sketch of what I mean follows).
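For concreteness, GPyTorch already ships KeOps-backed versions of a few kernels (e.g. gpytorch.kernels.keops.RBFKernel); a minimal sketch of the drop-in usage I have in mind, with the KeOps SM kernel itself still hypothetical:

import torch
import gpytorch

# Existing KeOps-backed kernels avoid materializing the full covariance
# matrix; the request here is an analogous SpectralMixtureKernel.
class KeOpsGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # gpytorch.kernels.keops.RBFKernel exists today; a hypothetical
        # keops.SpectralMixtureKernel would slot in here instead.
        self.covar_module = gpytorch.kernels.keops.RBFKernel(ard_num_dims=2)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)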
Motivation
An explicit matrix is formed in the kernel's forward pass here, which is where the memory issue occurs.
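For a sense of scale (my own back-of-envelope arithmetic, not profiler output): a 100 x 100 grid gives n = 10,000 inputs, so even a single dense covariance in float32 is about 400 MB, and prediction caching holds several such matrices at once:

n = 100 * 100             # 10,000 grid points
bytes_per_entry = 4       # float32
print(n * n * bytes_per_entry / 1e9)   # ~0.4 GB for one dense K(X, X)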
Pitch
I (or @g-benton) am willing to open a PR, but it may take a while - we just want a comparison to #872.
Minimal Working Example
import torch
import gpytorch
import math

if torch.cuda.is_available():
    torch.set_default_tensor_type(torch.cuda.FloatTensor)

# create the training grid
grid_bounds = [(0, 1), (0, 1)]
grid_size = 70
grid = torch.zeros(grid_size, len(grid_bounds))
for i in range(len(grid_bounds)):
    grid_diff = float(grid_bounds[i][1] - grid_bounds[i][0]) / (grid_size - 2)
    grid[:, i] = torch.linspace(grid_bounds[i][0] - grid_diff, grid_bounds[i][1] + grid_diff, grid_size)
train_x = gpytorch.utils.grid.create_data_from_grid(grid)
train_y = torch.sin((train_x[:, 0] + train_x[:, 1]) * (2 * math.pi)) + torch.randn_like(train_x[:, 0]).mul(0.01)

# set up the model
class GridSM(gpytorch.models.ExactGP):
    def __init__(self, grid, train_x, train_y, likelihood):
        super(GridSM, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.base = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=20, ard_num_dims=2)
        self.base.initialize_from_data(train_x, train_y)
        self.covar_module = gpytorch.kernels.GridKernel(self.base, grid=grid)
        # self.covar_module = self.base

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GridSM(grid, train_x, train_y, likelihood)
# forward pass on the training inputs (the prior, since the model is still in train mode)
train_dist = model(train_x)

# set up the testing grid
grid_bounds = [(1, 2), (0, 1)]
grid_size = 50
test_grid = torch.zeros(grid_size, len(grid_bounds))
for i in range(len(grid_bounds)):
    grid_diff = float(grid_bounds[i][1] - grid_bounds[i][0]) / (grid_size - 2)
    test_grid[:, i] = torch.linspace(grid_bounds[i][0] - grid_diff, grid_bounds[i][1] + grid_diff, grid_size)
test_x = gpytorch.utils.grid.create_data_from_grid(test_grid)

# now evaluate
model.eval()
likelihood.eval()
with gpytorch.settings.fast_pred_var(True), gpytorch.settings.skip_posterior_variances(True):
    predictive_dist = model(test_x)
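As an aside, here is a sketch of prediction settings that usually reduce peak memory (they do not fix the explicit matrix issue above, and I have not verified that they avoid this particular OOM): disabling gradients and setting max_cholesky_size to 0 to force iterative solves instead of Cholesky.

# memory-leaner prediction: no autograd graph, CG/Lanczos instead of Cholesky
with torch.no_grad(), \
        gpytorch.settings.max_cholesky_size(0), \
        gpytorch.settings.fast_pred_var(True), \
        gpytorch.settings.skip_posterior_variances(True):
    predictive_dist = model(test_x)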
Honestly, a KeOps implementation would probably be the way to go.
Is the spectral mixture kernel more memory-consuming than other kernels? I run out of memory with about 1,000 data points and 54 features, whereas an RBF kernel works just fine.
Can you explain in a bit more detail?
It doesn't look like the proximal issue of an explicit matrix being formed in the forward pass has been resolved (it is still here). However, the above code now runs on my GPU because of the improvements made to KroneckerProductLazyTensor over the past year or so (I probably should have closed this issue earlier).
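To illustrate why the Kronecker path makes this tractable (a toy sketch, assuming the gpytorch.lazy API of the 1.x releases; the identity factors are placeholders for per-dimension kernel matrices):

import torch
from gpytorch.lazy import KroneckerProductLazyTensor, NonLazyTensor

# two 100 x 100 per-dimension factors stand in for a 10,000 x 10,000
# grid covariance that is never materialized
k1 = NonLazyTensor(torch.eye(100))
k2 = NonLazyTensor(torch.eye(100))
K = KroneckerProductLazyTensor(k1, k2)
v = torch.randn(100 * 100, 1)
print(K.matmul(v).shape)   # torch.Size([10000, 1]); matvecs use only the factors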
Has a PR been opened for this already? I could help out if needed.