[Bug] Inconsistent marginal distribution shapes from likelihood calls for Gaussian vs. non-Gaussian likelihoods
🐛 Bug
When a likelihood call receives a MultivariateNormal and returns a marginal distribution (see here), the default marginal implementation calls the _draw_likelihood_samples method. This default is used directly by non-Gaussian likelihoods such as LaplaceLikelihood. Gaussian likelihoods override marginal to return a closed-form distribution that does not rely on samples. As a result, LaplaceLikelihood and GaussianLikelihood return marginal distributions of different shapes.
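For context, the sampling-based default behaves roughly like the following sketch (simplified on my part, not the literal GPyTorch implementation):

```python
import torch
from gpytorch import settings

# Rough sketch of the sampling-based marginal: draw samples of the latent
# function, then evaluate the conditional likelihood at those samples.
def sampled_marginal(likelihood, function_dist):
    num = settings.num_likelihood_samples.value()         # defaults to 10
    f_samples = function_dist.rsample(torch.Size([num]))  # shape: [num, n]
    return likelihood.forward(f_samples)                  # e.g. Laplace with batch shape [num, n]
```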
To reproduce
**Code snippet to reproduce**
```python
import torch
from gpytorch.models import ApproximateGP
from gpytorch.variational import CholeskyVariationalDistribution, VariationalStrategy
from gpytorch.means import ConstantMean
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.distributions import MultivariateNormal
from gpytorch.likelihoods import GaussianLikelihood, LaplaceLikelihood


class GP(ApproximateGP):
    def __init__(self, train_x):
        inducing_points = torch.unique(train_x, dim=0)
        variational_distribution = CholeskyVariationalDistribution(
            num_inducing_points=inducing_points.size(0)
        )
        variational_strategy = VariationalStrategy(
            self, inducing_points, variational_distribution, learn_inducing_locations=False
        )
        super(GP, self).__init__(variational_strategy)
        self.mean_module = ConstantMean()
        self.covar_module = ScaleKernel(RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        latent_pred = MultivariateNormal(mean_x, covar_x)
        return latent_pred


train_x = torch.tensor([1., 2., 3., 4.])
model = GP(train_x)

laplace_likelihood = LaplaceLikelihood()
gaussian_likelihood = GaussianLikelihood()

laplace_likelihood(model(train_x))   # <--- Laplace(loc: torch.Size([10, 4]), scale: torch.Size([10, 4]))
gaussian_likelihood(model(train_x))  # <--- MultivariateNormal(loc: torch.Size([4]))
```
Expected Behavior
It is unexpected behavior that non-Gaussian likelihoods such as LaplaceLikelihood return marginal distributions whose shape depends on the number of likelihood samples drawn, while the Gaussian likelihood does not. One issue is that the suggested way of making model predictions, likelihood(model(train_x)), returns a distribution of a different shape than what train_y will typically look like. I am not sure whether this error propagates to any of the MLLs.
I would expect all likelihood calls to return distributions of the same shape.
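For example (illustrative, building on the snippet above; train_y is made up here), evaluating log_prob of train_y against each marginal makes the mismatch concrete:

```python
# illustrative: train_y is a placeholder matching train_x
train_y = torch.randn(4)

laplace_marginal = laplace_likelihood(model(train_x))
gaussian_marginal = gaussian_likelihood(model(train_x))

laplace_marginal.log_prob(train_y).shape   # torch.Size([10, 4]) -- one row per likelihood sample
gaussian_marginal.log_prob(train_y).shape  # torch.Size([])      -- joint log prob of the MVN
```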
System information
Please complete the following information:
- GPyTorch version: 1.6.0
- PyTorch version: 1.10.2
- OS: macOS Monterey 12.1
Additional context
I found this while experimenting with implementing other likelihoods myself, notably Poisson and negative binomial likelihoods. I am now unsure how to properly obtain marginal distributions for predicting values.
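For reference, this is the kind of custom likelihood I was experimenting with (a hedged sketch; the exp link and the class name are my own choices, not something prescribed by GPyTorch):

```python
from gpytorch.likelihoods import Likelihood
from torch.distributions import Poisson


class PoissonLikelihood(Likelihood):
    """Hypothetical count likelihood: p(y | f) = Poisson(rate = exp(f))."""

    def forward(self, function_samples, **kwargs):
        # function_samples carries the extra leading likelihood-sample
        # dimension, so the returned Poisson inherits that batch shape.
        return Poisson(rate=function_samples.exp())
```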
This is a fair point, though I'm not sure exactly what the desired behavior should be. With a Gaussian likelihood, you can compute the marginal distribution in closed form. With any other likelihood you (usually) cannot, so we have to sample. We don't really want to replicate this behavior for the Gaussian likelihood, however, because that would artificially introduce variance.
I'm curious about your thoughts on the best approach here.
I think my main confusion is that the API does not currently make the distinction between closed-form and sampling-based marginals clear. In addition, the underlying code path is a bit opaque. Both could probably be addressed with documentation, although I'm happy to think through API changes as well.
A slightly more elaborate example: if one wants to sample from the marginal distribution, the non-closed-form shape is a bit confusing.
Closed form: Distribution --> Distribution with n_parameters ~ O(1):

```python
model = GP()
likelihood = GaussianLikelihood()
latent_pred = model(test_x)           # MultivariateNormal of size test_x
y_pred = likelihood(latent_pred)      # MultivariateNormal of size test_x
samples = y_pred.sample([n_samples])  # torch.Tensor of size n_samples x test_x
```
Sampling: Distribution --> Samples --> Distribution with n_parameters ~ O(n_likelihood_samples):

```python
model = VariationalGP()
likelihood = LaplaceLikelihood()
latent_pred = model(test_x)           # MultivariateNormal of size test_x
y_pred = likelihood(latent_pred)      # Laplace of size n_likelihood_samples x test_x
samples = y_pred.sample([n_samples])  # torch.Tensor of size n_samples x n_likelihood_samples x test_x
```
Although, thinking about it a bit more, it is nice that the non-closed form makes explicit what kind of sampling is going on (i.e. every y sample can be traced back to an individual likelihood sample).
I'm guessing all the non-closed form needs at the end is something like samples.reshape([n_samples * n_likelihood_samples, test_x.size(0)]).
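For concreteness, a sketch of what that flattening would look like with the reproduction snippet above (n_samples is a placeholder I'm introducing here):

```python
# hypothetical follow-up to the reproduction snippet above
n_samples = 5
y_pred = laplace_likelihood(model(train_x))       # Laplace with batch shape [10, 4]
samples = y_pred.sample(torch.Size([n_samples]))  # [n_samples, 10, 4]
flat = samples.reshape(-1, train_x.size(0))       # [n_samples * 10, 4]
```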
If this is indeed the only needed change, I'll gladly attempt a PR with some documentation.
How about, as a first step, we add more documentation? I would really appreciate a PR from you if you have the bandwidth!