Nonstationary kernels
What's the best approach to creating a kernel that locally adapts to varying smoothness in the data? I'm specifically looking at this paper:
Plagemann 2008 Nonstationary Gaussian Process Regression using Point Estimates of Local Smoothness: http://ais.informatik.uni-freiburg.de/publications/papers/plagemann08ecml.pdf
The idea is essentially to run a K-means-type clustering algorithm on the training points to find 'inducing'-type locations, fit a local GP to each of those areas, and have a parent latent GP govern the local length-scales.
Is this something that would be best suited as a kernel wrapper class like the ScaleKernel or InducingPointKernel classes? All of the GPs need to be optimized together. Any guidance would be appreciated.
So if I understand that paper correctly, they basically derive a kernel (Equation 4) that allows you to define arbitrary functions over lengthscales and then when you compute k(x_i, x_j) you also compute \ell(x_i) and \ell(x_j).
How set are you on the "use a GP prior over \ell" plan / how little data do you have? Implementing \ell(x) using, say, a two-layer neural network (plus a ReLU at the end to make the output positive) would be really trivial, since you could just register the neural network in this hypothetical NonstationaryKernel and then, when forward hands you x1 and x2, pass both through the NN to get \ell_i and \ell_j from their Equation 4. After that, training through the exact MLL should do the right thing. Maybe.
Using a GP won't be that much harder, but you'll probably have to define the loss in Equation 5 by hand rather than using our ExactMarginalLogLikelihood.
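For reference, Equation 4 as I read it is the isotropic Paciorek and Schervish form, with \ell_i = \ell(x_i) playing the role of a local squared lengthscale (double-check my transcription against the paper):

k(x_i, x_j) = \frac{\ell_i^{1/4} \, \ell_j^{1/4}}{\left( \frac{\ell_i + \ell_j}{2} \right)^{1/2}} \exp\left( - \frac{\lVert x_i - x_j \rVert^2}{\frac{\ell_i + \ell_j}{2}} \right)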
Yes, I was worried it would require a different loss. Your first option sounds reasonable. In practice I wonder how much the GP prior would help overall, or whether it would make gradient optimization more difficult.
Right now I'm just working with a small dataset of about 50 rows that has 'bumpy' patterns, hence the need for multiple GPs to best explain the field. With the neural network approach it should scale reasonably for ExactGP. Thoughts on how that structure would look?
@stanbiryukov Yeah, so basically what I'd do is define a new kernel:
class NonstationaryKernel(Kernel):
    def __init__(self, neural_net):
        # Register base modules, call super init, etc.

    def forward(self, x1, x2, diag=False, **params):
        ls1 = self.neural_net(x1)  # Assume NN maps n x d -> n x 1, or b x n x d -> b x n x 1 for batch.
        ls2 = self.neural_net(x2)  # Same here
        # Compute rest of the kernel from the paper where ls1[i] corresponds to \ell_i, e.g. can do x1.div(ls1)
        # and ls2[j] corresponds to \ell_j.
        # This also assumes that your NN has a ReLU at the end.
Does that make sense? In terms of the NN architecture, I'd probably start with something pretty small, like a 2 layer net with smoother (e.g. like tanh) nonlinearities.
Thanks for the direction, @jacobrgardner .
This makes sense. I've drafted a more fleshed-out example below. If I train with the default exact GP framework from Simple_GP_Regression.ipynb, I mostly get singular covariance matrix errors. However, when the model does run for 50 iterations, model(test_x).mean.cpu() produces something along the lines of "The size of tensor a (51) must match the size of tensor b (151) at non-singleton dimension 0". Anything look immediately wrong?
import torch
import gpytorch

class NSNet(torch.nn.Sequential):
    def __init__(self, data_dim):
        super(NSNet, self).__init__()
        self.data_dim = data_dim
        self.add_module("linear1", torch.nn.Linear(data_dim, 256, bias=True))
        self.add_module("act1", torch.nn.Tanh())
        self.add_module("linear2", torch.nn.Linear(256, 1, bias=True))
        self.add_module("actout", torch.nn.ReLU())
class NonstationaryKernel(gpytorch.kernels.Kernel):
    def __init__(self, data_dim):
        super(NonstationaryKernel, self).__init__(data_dim)
        # Register base modules, call super init etc
        self.neural_net = NSNet(data_dim=data_dim)

    def forward(self, x1, x2, diag=False, **params):
        ls1 = self.neural_net(x1)  # Assume NN maps n x d -> n x 1, or b x n x d -> b x n x 1 for batch.
        ls2 = self.neural_net(x2)  # Same here
        # Compute rest of the kernel from the paper where ls1[i] corresponds to \ell_i, e.g. can do x1.div(ls1)
        # and ls2[j] corresponds to \ell_j.
        # This also assumes that your NN has a ReLU at the end.
        x1_ = x1.div(self.lengthscale)
        x2_ = x2.div(self.lengthscale)
        # Distance matrix
        X1 = x1_.pow(2).sum(dim=1, keepdim=True)
        Z2 = x2_.pow(2).sum(dim=1, keepdim=True)
        XZ = x1_.matmul(x2_.t())
        r2 = X1 - 2 * XZ + Z2.t()
        d_ = torch.sqrt(r2.clamp(min=1e-12))
        # Equation 4
        k_ = torch.exp(-d_.t() * ((ls1.sum(dim=1).add(ls2.sum(dim=1)).div(2)).pow(-1)) * d_)
        n = torch.abs(ls1.sum(dim=1)).pow(.25).mul(torch.abs(ls2.sum(dim=1)).pow(.25))
        d = torch.abs((ls1.sum(dim=1).add(ls2.sum(dim=1))).div(2)).pow(.5).clamp(min=1e-12)
        rs = n.div(d)
        k = rs.mul(k_)
        # print(rs)
        return k
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        if train_x.dim() > 1:
            # wrap with gpytorch.kernels.ScaleKernel?
            self.covar_module = gpytorch.kernels.ScaleKernel(NonstationaryKernel(data_dim=train_x.shape[1]))
        else:
            self.covar_module = gpytorch.kernels.ScaleKernel(NonstationaryKernel(data_dim=1))
            # self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
Here's a better working example where it appears the parameters are being learned with SGD. However, I'm puzzled by what looks like x2 changing shape in the forward function when I set the model to eval mode and attempt to predict.
import numpy as np
import torch
import gpytorch

class NSNet(torch.nn.Sequential):
    def __init__(self, data_dim):
        super(NSNet, self).__init__()
        self.data_dim = data_dim
        self.add_module("linear1", torch.nn.Linear(data_dim, 256, bias=True))
        self.add_module("act1", torch.nn.ReLU())
        self.add_module("linear2", torch.nn.Linear(256, 128, bias=True))
        self.add_module("act2", torch.nn.ReLU())
        self.add_module("linearout", torch.nn.Linear(128, 1, bias=True))

class NonstationaryKernel(gpytorch.kernels.Kernel):
    def __init__(self, data_dim, **kwargs):
        super(NonstationaryKernel, self).__init__(data_dim, **kwargs)
        # Register base modules, call super init etc
        self.neural_net = NSNet(data_dim=data_dim)

    def forward(self, x1, x2, diag=False, **params):
        print(x1.shape, x2.shape)
        ls1 = torch.exp(self.neural_net(x1))  # Assume NN maps n x d -> n x 1, or b x n x d -> b x n x 1 for batch.
        ls2 = torch.exp(self.neural_net(x2))  # Same here
        # clone because inplace operations will mess with what's saved for backward
        ls1_ = ls1.clone()
        ls2_ = ls2.clone()
        print(ls1.shape, ls2.shape)
        # Compute rest of the kernel from the paper where ls1[i] corresponds to \ell_i, e.g. can do x1.div(ls1)
        # and ls2[j] corresponds to \ell_j.
        # (Here the exp above ensures positivity rather than a ReLU at the end of the NN.)
        x1_ = x1.div(self.lengthscale)
        x2_ = x2.div(self.lengthscale)
        # Distance matrix
        X1 = x1_.pow(2).sum(dim=1, keepdim=True)
        X2 = x2_.pow(2).sum(dim=1, keepdim=True)
        XX = x1_.matmul(x2_.t())
        r2 = X1 - 2 * XX + X2.t()
        d = torch.sqrt(r2.clamp(min=1e-12))
        d_ = d.clone()
        # Equation 4
        lx = ((ls1_.sum(dim=1).add(ls2_.sum(dim=1)).div(2)).pow(-1))
        k_ = torch.exp(-d_.t() * lx * d_)
        print(lx.shape, d_.shape)
        n = torch.abs(ls1_.sum(dim=1)).pow(.25).mul(torch.abs(ls2_.sum(dim=1)).pow(.25))
        dn = torch.abs((ls1_.sum(dim=1).add(ls2_.sum(dim=1))).div(2)).pow(.5).clamp(min=1e-12)
        kout = n.div(dn).mul(k_)
        print(kout.shape)
        return kout
train_x = torch.linspace(0, 1, 100).reshape(-1, 1)
train_y = torch.sin(train_x * (2 * np.pi)) + torch.randn(train_x.size()) * 0.2
train_y = train_y.reshape(-1,)
test_x = torch.linspace(0, 1, 51).reshape(-1, 1)

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(NonstationaryKernel(data_dim=train_x.shape[1]))

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use SGD
optimizer = torch.optim.SGD([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 10
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f lengthscale: %.3f noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 1/10 - Loss: 0.902 lengthscale: 0.693 noise: 0.693
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 2/10 - Loss: 0.887 lengthscale: 0.689 noise: 0.679
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 3/10 - Loss: 0.873 lengthscale: 0.684 noise: 0.665
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 4/10 - Loss: 0.857 lengthscale: 0.680 noise: 0.651
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 5/10 - Loss: 0.841 lengthscale: 0.675 noise: 0.636
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 6/10 - Loss: 0.825 lengthscale: 0.670 noise: 0.622
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 7/10 - Loss: 0.810 lengthscale: 0.666 noise: 0.607
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 8/10 - Loss: 0.795 lengthscale: 0.662 noise: 0.592
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 9/10 - Loss: 0.780 lengthscale: 0.658 noise: 0.578
torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100, 1]) torch.Size([100]) torch.Size([100, 100]) torch.Size([100, 100])
Iter 10/10 - Loss: 0.767 lengthscale: 0.655 noise: 0.564
model.eval()
likelihood.eval()
with torch.no_grad():
    fposterior = model(test_x)
torch.Size([51, 1]) torch.Size([151, 1])
torch.Size([51, 1]) torch.Size([151, 1])
RuntimeError: The size of tensor a (51) must match the size of tensor b (151) at non-singleton dimension 0
Realized I can default to the built-in self.covar_dist instead of building the distance matrix myself. Nonetheless, @jacobrgardner, I'm not sure what changes in eval mode to give x2 a non-conforming shape. Any suggestions?
It must be related to this concatenation, given that my training data has 100 rows and my test set has 51 (100 + 51 = 151): https://github.com/cornellius-gp/gpytorch/blob/90489c5b41ec79f646cae26332ecbad2ce1fac5d/gpytorch/models/exact_gp.py#L297
@jacobrgardner It seems that embedding a neural network within a gpytorch.kernels.Kernel is leading to issues with posterior inference. Are there components that I'm missing in the above example or is this a separate issue?
Hey @stanbiryukov
Sorry this took so long to look into; things are pretty busy right before ICML.
Your issue is just batch shaping stuff that only ends up showing up when we try to compute cross covariances between training data and test data. For example:
lx = ((ls1_.sum(dim=1).add(ls2_.sum(dim=1)).div(2)).pow(-1))
doesn't quite do what you want -- I think what you probably want lx to be is the matrix whose ijth entry is 0.5*\ell_i^2 + 0.5*\ell_j^2. To accomplish this, you can add unsqueezes (although the sum then unsqueeze for ls1_ is redundant):
lx = ((ls1_.sum(dim=1).unsqueeze(-1).add(ls2_.sum(dim=1).unsqueeze(-2)).div(2)).pow(-1))
The way this works is it takes ls1_ to be 51 x 1 and ls2_ to be 1 x 151 so that ls1_ + ls2_ broadcasts to give a 51 x 151 matrix.
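To see the broadcasting in isolation, here is a standalone example with made-up shapes matching the ones above:

import torch

ls1_ = torch.rand(51, 1)   # stand-in for the NN output at the 51 test points
ls2_ = torch.rand(151, 1)  # stand-in for the NN output at the 151 concatenated train + test points

a = ls1_.sum(dim=1).unsqueeze(-1)  # 51 x 1
b = ls2_.sum(dim=1).unsqueeze(-2)  # 1 x 151
lx = (a + b).div(2).pow(-1)        # broadcasts to 51 x 151; entry ij is ((\ell_i + \ell_j) / 2)^-1
print(lx.shape)                    # torch.Size([51, 151])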
I'm also not clear why you want:
k_ = torch.exp(- d_.t() * lx * d_)
since r2 was already the squared distances and you then took the square root to get d_. This should probably be something like k_ = torch.exp(-d_ * lx), where d_ is the squared distance (i.e., skip the d_.sqrt())?
Replacing some of your code with the following works for me in the sense that it at least runs through, but I'd recommend double checking the math:
r2 = X1 - 2 * XX + X2.t()
d_ = r2
# Equation 4
lx = ((ls1_.sum(dim=1).unsqueeze(-1).add(ls2_.sum(dim=1).unsqueeze(-2)).div(2)).pow(-1))
k_ = torch.exp(- d_ * lx)
print (lx.shape, d_.shape)
n = torch.abs(ls1_.sum(dim=1).unsqueeze(-1)).pow(.25).mul(torch.abs(ls2_.sum(dim=1).unsqueeze(-2)).pow(.25))
dn = torch.abs( (ls1_.sum(dim=1).unsqueeze(-1).add(ls2_.sum(dim=1).unsqueeze(-2))).div(2)).pow(.5).clamp(min=1e-12)
kout = n.div(dn).mul(k_)
Hi @jacobrgardner totally understand - thanks for the help as always. I'll see if I can make it to ICML this year.
The unsqueeze makes sense, thanks for guiding me through the broadcasting fix. I'll replace the distance calc with self.covar_dist and work a bit more on this.
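For reference, a rough sketch of that covar_dist version, as a drop-in replacement for the forward above (this assumes covar_dist accepts square_dist=True, which it does in the gpytorch versions used here, and treats the NN output as a positive local squared lengthscale; the math should still be checked against the paper):

def forward(self, x1, x2, diag=False, **params):
    ls1 = torch.exp(self.neural_net(x1)).squeeze(-1)  # n
    ls2 = torch.exp(self.neural_net(x2)).squeeze(-1)  # m
    if not diag:
        ls1 = ls1.unsqueeze(-1)  # n x 1
        ls2 = ls2.unsqueeze(-2)  # 1 x m
    avg = (ls1 + ls2).div(2)  # (\ell_i + \ell_j) / 2, broadcast to n x m
    # squared distances; covar_dist handles the rectangular train/test case for us
    r2 = self.covar_dist(x1.div(self.lengthscale), x2.div(self.lengthscale),
                         square_dist=True, diag=diag, **params)
    prefactor = (ls1 * ls2).pow(0.25).div(avg.sqrt().clamp(min=1e-12))
    return prefactor * torch.exp(-r2 / avg)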
Hi @jacobrgardner - with version 1.0.1 of gpytorch, I'm seeing a lazy evaluation error that I'm having trouble tracing back to the root cause. Any suggestions on how to now properly evaluate the neural network and kernel parameters in the forward definition?
torch.Size([100, 1]) torch.Size([100, 1])
TypeError: div(): argument 'other' (position 1) must be Tensor, not NoneType
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-21f16b3dd7b7> in <module>
78 output = model(train_x)
79 # Calc loss and backprop gradients
---> 80 loss = -mll(output, train_y)
81 loss.backward()
82 print('Iter %d/%d - Loss: %.3f lengthscale: %.3f noise: %.3f' % (
/opt/conda/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
26
27 def __call__(self, *inputs, **kwargs):
---> 28 outputs = self.forward(*inputs, **kwargs)
29 if isinstance(outputs, list):
30 return [_validate_module_outputs(output) for output in outputs]
/opt/conda/lib/python3.6/site-packages/gpytorch/mlls/exact_marginal_log_likelihood.py in forward(self, function_dist, target, *params)
49 # Get the log prob of the marginal distribution
50 output = self.likelihood(function_dist, *params)
---> 51 res = output.log_prob(target)
52
53 # Add additional terms (SGPR / learned inducing points, heteroskedastic likelihood models)
/opt/conda/lib/python3.6/site-packages/gpytorch/distributions/multivariate_normal.py in log_prob(self, value)
133
134 # Get log determininat and first part of quadratic form
--> 135 inv_quad, logdet = covar.inv_quad_logdet(inv_quad_rhs=diff.unsqueeze(-1), logdet=True)
136
137 res = -0.5 * sum([inv_quad, logdet, diff.size(-1) * math.log(2 * math.pi)])
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in inv_quad_logdet(self, inv_quad_rhs, logdet, reduce_inv_quad)
1000 from .chol_lazy_tensor import CholLazyTensor
1001
-> 1002 cholesky = CholLazyTensor(self.cholesky())
1003 return cholesky.inv_quad_logdet(inv_quad_rhs=inv_quad_rhs, logdet=logdet, reduce_inv_quad=reduce_inv_quad)
1004
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in cholesky(self, upper)
737 (LazyTensor) Cholesky factor (lower triangular)
738 """
--> 739 res = self._cholesky()
740 if upper:
741 res = res.transpose(-1, -2)
/opt/conda/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
32 cache_name = name if name is not None else method
33 if not is_in_cache(self, cache_name):
---> 34 add_to_cache(self, cache_name, method(self, *args, **kwargs))
35 return get_from_cache(self, cache_name)
36
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in _cholesky(self)
400 from .keops_lazy_tensor import KeOpsLazyTensor
401
--> 402 evaluated_kern_mat = self.evaluate_kernel()
403
404 if any(isinstance(sub_mat, KeOpsLazyTensor) for sub_mat in evaluated_kern_mat._args):
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in evaluate_kernel(self)
883 all lazily evaluated kernels actually evaluated.
884 """
--> 885 return self.representation_tree()(*self.representation())
886
887 def inv_matmul(self, right_tensor, left_tensor=None):
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor.py in representation_tree(self)
1273 including all subobjects. This is used internally.
1274 """
-> 1275 return LazyTensorRepresentationTree(self)
1276
1277 @property
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_tensor_representation_tree.py in __init__(self, lazy_tsr)
11 for arg in lazy_tsr._args:
12 if hasattr(arg, "representation") and callable(arg.representation): # Is it a lazy tensor?
---> 13 representation_size = len(arg.representation())
14 self.children.append((slice(counter, counter + representation_size, None), arg.representation_tree()))
15 counter += representation_size
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py in representation(self)
311 # representation
312 else:
--> 313 return self.evaluate_kernel().representation()
314
315 def representation_tree(self):
/opt/conda/lib/python3.6/site-packages/gpytorch/utils/memoize.py in g(self, *args, **kwargs)
32 cache_name = name if name is not None else method
33 if not is_in_cache(self, cache_name):
---> 34 add_to_cache(self, cache_name, method(self, *args, **kwargs))
35 return get_from_cache(self, cache_name)
36
/opt/conda/lib/python3.6/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py in evaluate_kernel(self)
278 temp_active_dims = self.kernel.active_dims
279 self.kernel.active_dims = None
--> 280 res = self.kernel(x1, x2, diag=False, last_dim_is_batch=self.last_dim_is_batch, **self.params)
281 self.kernel.active_dims = temp_active_dims
282
/opt/conda/lib/python3.6/site-packages/gpytorch/kernels/kernel.py in __call__(self, x1, x2, diag, last_dim_is_batch, **params)
394 res = LazyEvaluatedKernelTensor(x1_, x2_, kernel=self, last_dim_is_batch=last_dim_is_batch, **params)
395 else:
--> 396 res = lazify(super(Kernel, self).__call__(x1_, x2_, last_dim_is_batch=last_dim_is_batch, **params))
397 return res
398
/opt/conda/lib/python3.6/site-packages/gpytorch/module.py in __call__(self, *inputs, **kwargs)
26
27 def __call__(self, *inputs, **kwargs):
---> 28 outputs = self.forward(*inputs, **kwargs)
29 if isinstance(outputs, list):
30 return [_validate_module_outputs(output) for output in outputs]
<ipython-input-13-21f16b3dd7b7> in forward(self, x1, x2, diag, **params)
20 def forward(self, x1, x2, diag=False, **params):
21 print(x1.shape, x2.shape)
---> 22 x1_ = x1.div(self.lengthscale)
23 x2_ = x2.div(self.lengthscale)
24 ls1 = torch.exp(
TypeError: div(): argument 'other' (position 1) must be Tensor, not NoneType
@stanbiryukov looks like this might have to do with a change we made to how we mark kernels as having lengthscales, which we didn't realize would be breaking.
Try adding this: https://github.com/cornellius-gp/gpytorch/blob/5ea8483114813655c792321b334a1f8e15788d74/gpytorch/kernels/rbf_kernel.py#L70
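Concretely, assuming that link points at the has_lengthscale = True class attribute (as in RBFKernel), the change would look roughly like:

class NonstationaryKernel(gpytorch.kernels.Kernel):
    # In gpytorch 1.0+, kernels opt in to a lengthscale via this class attribute
    has_lengthscale = True

    def __init__(self, data_dim, **kwargs):
        super(NonstationaryKernel, self).__init__(**kwargs)
        self.neural_net = NSNet(data_dim=data_dim)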
I was curious if/how those flags were used. Unfortunately, adding that didn't seem to work; I received the exact same error.
@stanbiryukov Could you provide an updated sample of code to run that produces the error?
@jacobrgardner Sure - here's a colab notebook with a MWE: https://colab.research.google.com/drive/1ymAidoiNmK73pa5xzG4BMnUAxX9wjZDD
Hi @stanbiryukov!
I am currently working on a problem where the solution might be a kernel that locally adapts to varying smoothness, as you describe here. I was wondering whether you had any success with your approach based on the Plagemann paper?
Kind regards Kasper
Hi @KasperNiklas - in practice, I found that an anisotropic Matérn kernel with additional feature engineering worked quite well for a breadth of problems. Specifically, for time series interpolation and forecasting, I added covariates by decomposing timestamps into cyclical continuous features (i.e., sine/cosine transforms of day and month), as sketched below. This allowed the kernel to learn a unique lengthscale for each of those degrees of freedom and provided the adaptive local smoothing I was looking for.
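As a rough sketch of that encoding (the feature extraction here is made up for illustration; the point is the sine/cosine transform plus an ARD kernel):

import math

import torch
import gpytorch

# Suppose day-of-year (1-365) has been extracted from the timestamps
day_of_year = torch.arange(1, 366, dtype=torch.float32)
month = ((day_of_year - 1) / 31).floor() + 1  # crude month approximation, just for the example

features = torch.stack([
    torch.sin(2 * math.pi * day_of_year / 365), torch.cos(2 * math.pi * day_of_year / 365),
    torch.sin(2 * math.pi * month / 12), torch.cos(2 * math.pi * month / 12),
], dim=-1)

# An anisotropic (ARD) Matern kernel then learns a separate lengthscale per feature column
covar_module = gpytorch.kernels.ScaleKernel(
    gpytorch.kernels.MaternKernel(nu=2.5, ard_num_dims=features.shape[-1])
)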
That said, looking at the problem you referenced, a multivariate Gaussian structure may not be the best candidate given the sharpness in the data. I recommend modeling with a normalizing flow, which can capture more complex PDFs than GPs. See Neural Spline Flows for an overview. I've found pzflow to be an excellent library for modeling the joint probability distribution of data that is not as amenable to the Gaussian likelihood of GPs.
Thank you @stanbiryukov for taking the time to answer my question! I like your idea with decomposition of the timestamps, I will definitely remember that one for future projects :)
We will look into your suggestion of modeling the problem with a normalising flow.