gpytorch
Issues writing convolutional kernel
Hi, I realize this question might be a bit basic, but I'm trying to implement a convolutional kernel as described here on page 5, largely along the lines of the GPflow implementation.
I've got the patch extraction working in PyTorch, but am running into an issue with the RBFKernel.
I have `x1` and `x2` with dimensions `N x P x patch_len`, where `N` is the batch size, `P` is the number of patches (basically a second batch dimension), and `patch_len` is the length of an individual patch. I need a covariance matrix of dimension `N x P x N x P` as output from `RBFKernel`, but I haven't managed to get this behavior working.
I've tried passing multiple batch dimensions, using a `MultitaskKernel`, etc., but nothing has worked. I could just reimplement the following GPflow function in PyTorch:
```python
def square_distance(X, X2):
    """
    Returns ||X - X2ᵀ||²
    Due to the implementation and floating-point imprecision, the
    result may actually be very slightly negative for entries very
    close to each other.
    This function can deal with leading dimensions in X and X2.
    In the sample case, where X and X2 are both 2 dimensional,
    for example, X is [N, D] and X2 is [M, D], then a tensor of shape
    [N, M] is returned. If X is [N1, S1, D] and X2 is [N2, S2, D]
    then the output will be [N1, S1, N2, S2].
    """
    if X2 is None:
        Xs = tf.reduce_sum(tf.square(X), axis=-1, keepdims=True)
        dist = -2 * tf.matmul(X, X, transpose_b=True)
        dist += Xs + tf.linalg.adjoint(Xs)
        return dist
    Xs = tf.reduce_sum(tf.square(X), axis=-1)
    X2s = tf.reduce_sum(tf.square(X2), axis=-1)
    dist = -2 * tf.tensordot(X, X2, [[-1], [-1]])
    dist += broadcasting_elementwise(tf.add, Xs, X2s)
    return dist
```
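For reference, a direct PyTorch port of that GPflow function might look like the sketch below. The function name `square_distance` and the broadcast-reshape trick standing in for GPflow's `broadcasting_elementwise` are my own choices here, not part of the gpytorch API:

```python
import torch

def square_distance(X, X2=None):
    """Pairwise squared distances along the last dimension.

    If X is [N1, S1, D] and X2 is [N2, S2, D], the result is [N1, S1, N2, S2].
    A sketch mirroring GPflow's square_distance, not a gpytorch function.
    """
    if X2 is None:
        Xs = X.pow(2).sum(dim=-1, keepdim=True)            # [..., N, 1]
        dist = -2 * X @ X.transpose(-1, -2)
        dist = dist + Xs + Xs.transpose(-1, -2)
        return dist
    Xs = X.pow(2).sum(dim=-1)                              # e.g. [N1, S1]
    X2s = X2.pow(2).sum(dim=-1)                            # e.g. [N2, S2]
    dist = -2 * torch.tensordot(X, X2, dims=([X.dim() - 1], [X2.dim() - 1]))
    # Broadcast Xs against X2s over separate leading dims, like GPflow's
    # broadcasting_elementwise(tf.add, Xs, X2s): reshape Xs to [N1, S1, 1, 1].
    dist = dist + Xs.reshape(*Xs.shape, *([1] * X2s.dim())) + X2s
    return dist
```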
But I would rather use the existing `RBFKernel`. Is there a way to get this behavior from it, or should I write this function myself?
@tmuntianu Just to confirm: what you basically want is a matrix `D` so that `D[i, j, k, p]` gives `sq_dist(x1[i, j, :], x2[k, p, :])`, right? Or alternatively the full kernel matrix, so that `K[i, j, k, p] = exp(-D[i, j, k, p] / σ)`?
If so, and you indeed want `N` and `P` to both be separate batch dimensions, `RBFKernel` can indeed provide this behavior, although the interpretation of the output is a little strange (e.g., you're going to have an `N x P x N x P` set of `1 x 1` kernel matrices):
```python
# suppose N = 2, P = 3, patch_len = 5
# Idea: use singleton batch dimensions wherever we want broadcasting.
kern = RBFKernel(batch_shape=torch.Size([2, 3, 2, 3]))
x1 = torch.randn(2, 3, 1, 1, 1, 5)  # N x P x 1 x 1 x num_data x patch_len (num_data = 1 here)
x2 = torch.randn(1, 1, 2, 3, 1, 5)  # 1 x 1 x N x P x num_data x patch_len
output = kern(x1, x2)  # output will be N x P x N x P x 1 x 1
```
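The broadcasting itself can be checked with plain tensors, no gpytorch needed. This sketch just verifies the shapes: applying `exp(-sq_dist / (2 * lengthscale**2))` to the result would give the RBF kernel values on the broadcast pairs:

```python
import torch

N, P, patch_len = 2, 3, 5
# Singleton dims where we want broadcasting, exactly as in the snippet above.
x1 = torch.randn(N, P, 1, 1, 1, patch_len)
x2 = torch.randn(1, 1, N, P, 1, patch_len)
# Squared distances broadcast to an N x P x N x P batch of 1 x 1 "kernel matrices".
sq_dist = (x1 - x2).pow(2).sum(dim=-1, keepdim=True)
print(sq_dist.shape)  # torch.Size([2, 3, 2, 3, 1, 1])
```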
Is this what you are looking for?
Yup, exactly what I was looking for! Thanks so much! Didn't realize you could broadcast over batch dimensions like that. I only have one quick follow-up question: is it possible to easily strip away the extra dimensions with the LazyTensor API, or should I call `.evaluate()` and operate on that instead?
Stripping away batch dimensions for LTs is pretty easy -- standard `lt[0, :]` etc. calls should work, as should, for example, `lt.squeeze(...)` calls. In general, the LT interface tries to mimic the standard `torch.Tensor` one as closely as possible.
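On a plain tensor, which the LT interface mimics, the dimension-stripping would look like this (the `N x P x N x P x 1 x 1` shape is the hypothetical output of the broadcast example above):

```python
import torch

K = torch.randn(2, 3, 2, 3, 1, 1)       # N x P x N x P x 1 x 1
K_stripped = K.squeeze(-1).squeeze(-1)  # drop the trailing singleton dims
print(K_stripped.shape)  # torch.Size([2, 3, 2, 3])
```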
I asked because I was getting some errors with expected sizes for the LazyTensors, but I just fixed it by subclassing `LazyEvaluatedKernelTensor` and defining a custom `_size` method. I just couldn't get it to work by only defining a new `num_outputs_per_input`.
Thanks again for all your help! I really appreciate it.
@tmuntianu did you make the convolutional kernel work? I want to use one and don't want to have to move to GPflow...