KernelFunctions.jl icon indicating copy to clipboard operation
KernelFunctions.jl copied to clipboard

Specifying Domain of Kernels

Open willtebbutt opened this issue 4 years ago • 13 comments

Proposal 1

# Euclidean domain with D dimensions.
struct Euclidean
    D::Int
end

dim(domain::Euclidean) = domain.D

"""
    DomainKernel{Tkernel<:Kernel, Tdomain}

A kernel with additional information specifying the domain of the kernel.
This provides a complete specification of a kernel, whereas most kernels by themselves do
not provide the domain information necessary to construct e.g. Random Fourier Feature
approximations to themselves.
"""
struct DomainKernel{Tkernel<:Kernel, Tdomain} <: Kernel
    kernel::Tkernel
    domain::Tdomain
end

kernelmatrix(k::DomainKernel, x::AbstractVector) = kernelmatrix(k.kernel, x)

function kernelmatrix(k::DomainKernel, x::AbstractVector, x′::AbstractVector)
    return kernelmatrix(k.kernel, x, x′)
end

Proposal 2

struct AutoEuclidean end

struct Euclidean
    D::Int
end

struct SEKernel{Tmetric_space} <: Kernel
    metric_space::Tmetric_space
end

SEKernel() = SEKernel(AutoEuclidean())

This version would be quite breaking, but I'd rather we didn't worry about that, and instead focus on which one we think constitutes the better solution, and go from there.

I'm personally leaning more towards the second option -- it feel cleaner on some level. The DomainKernel idea would probably work, but my gut is that it'll be messy in practice. You would have to write

DomainKernel(SEKernel(), Euclidean(5)) + 0.5 * with_lengthscale(DomainKernel(SEKernel(), Euclidean(5)))

vs

SEKernel(Euclidean(5)) + 0.5 * with_lengthscale(SEKernel(Euclidean(5)))

My feeling is that the latter is easier to read.

It also avoid some redundancy -- we currently let users specify the metric in a lot of kernels. Having the metric and domain of the kernel specified in different places feels strange to me.

willtebbutt avatar Oct 14 '21 10:10 willtebbutt

Thanks for opening the issue! I would definitely lean on the first option! Actually your example proves the opposite for me as I would write the first option

DomainKernel(SEKernel() + 0.5 * with_lengthscale(SEKernel(), 2.0), Euclidean(5))

which is much clearer. Or alternatively one could also have a with_domain in the same spirit as with_lengthscale

Having the metric and domain in the same place is a valid argument, but can also be slightly confusing. Especially with such close names ;)

theogf avatar Oct 14 '21 10:10 theogf

Actually your example proves the opposite for me as I would write the first option

~~So this actually isn't something that's safe to do in all cases. Consider~~

SEKernel()∘ SelectTransform((1, 3))

~~If you put the DomainKernel on the outside, what dimensionality does it correspond to? Presumably you'd have to write DomainKernel(k, Euclidean(2)), but if you do that then I don't believe you have enough information to build the RFF approximation for the underlying SEKernel because you don't know its dimensionality.~~

~~You can of course concoct arbitrarily complicated examples.~~

~~I could see, however, a with_domain function being helpful in a lot of cases. Taking your example:~~

with_domain(SEKernel() + 0.5 * with_lengthscale(SEKernel(), 2.0), Euclidean(5))

~~would produce something like~~

SEKernel(Euclidean(5)) + 0.5 * with_lengthscale(SEKernel(Euclidean(5)))

~~i.e. it would traverse the kernel and apply the Euclidean(5) domain everywhere under the assumption that each of the kernels at the bottom of the DAG can be equipped with the same domain. It would break if for some reason this wasn't possible to enforce.~~

~~An example where this heuristic would fail might be something like~~

with_domain(SEKernel() + SEKernel()∘ SelectTransform((1, 3)), Euclidean(3))

~~as it would produce~~

SEKernel(Euclidean(3)) + SEKernel(Euclidean(3))∘ SelectTransform((1, 3))

Having the metric and domain in the same place is a valid argument, but can also be slightly confusing. Especially with such close names ;)

Do you think so? To my mind, specifying the metric and the space as two separate things makes sense -- you have a set, and separately you have a notion of how to measure the distance between any two points in said set.

The metric isn't required in all cases (any kernel that isn't build on top of a metric space -- e.g. a kernel defined on a finite-dimensional domain via a positive-definite matrix -- you still have a notion of the domain, but there's no need for a metric), but I think the space always has the potential to be needed.

edit: remove things that are incorrect.

willtebbutt avatar Oct 14 '21 11:10 willtebbutt

To be fair, in my previous example, I put it as one field. Upon reflection, I think it would be better in the SEKernel case to define it as

struct SEKernel{Tdomain, Tmetric} <: Kernel
    domain::Tdomain
    metric::Tmetric
end

SEKernel() = SEKernel(AutoEuclideanSpace(), EuclideanDistance())
# insert whatever other default constructors you like

to make it really explicit that these are two separate things.

edit: Or maybe you do want to bung everything into a single MetricSpace object, and just define a domain function that kernels must implement if they want to support things that need to know the domain? i.e. abstract away this particular implementation detail and just have the API say "give me your domain". Then it wouldn't matter so much.

edit2: I guess my point is that, upon additional reflection, whether SEKernel has one or two fields is an implementation detail that we ought not be overly concerned with at this point in time.

willtebbutt avatar Oct 14 '21 11:10 willtebbutt

If you put the DomainKernel on the outside, what dimensionality does it correspond to?

It corresponds to the general type of input passed to the final kernel. If you indeed have

k = SEKernel(Euclidean(5)) + SEKernel(Euclidean(2))∘ SelectTransform((1, 3))

what should be dim(k) return?

No matter how complex the kernel we create is, the intermediate domains do not necessarily matter right?

theogf avatar Oct 14 '21 11:10 theogf

Taking one step back, what is the main motivation for a DomainKernel, regardless of the (dis)advantages of both approaches? I think it is quite nice that currently one does not have to fix a domain a priori but one can construct a domain-agnostic kernel if it can be applied to the desired inputs. I saw your note about random Fourier features but in this case wouldn't it be cleaner to just demand that one specifies a domain when the features are constructed or computed? I assume also that default domains leads to more verbose constructors or inconsistencies (and the AutoEuclidean seems like a workaround but not a completely appealing design at first glance).

devmotion avatar Oct 14 '21 11:10 devmotion

what should be dim(k) return?

Well this would be an error on two counts -- SEKernel(Euclidean(2))∘ SelectTransform((1, 3)) would error because the SelectTransform asks for the third dimension, but SEKernel(Euclidean(2)) only has two. Secondly, the dimension of SEKernel(Euclidean(3))∘ SelectTransform((1, 3)) would be 2, but SEKernel(Euclidean(5)) is five.

No matter how complex the kernel we create is, the intermediate domains do not necessarily matter right?

Yes and no. I agree that the domain of the kernel that pops out at the end is typically what we care about, but the domain of the "leaf" kernels (those who have domains specified explicitly in their type) will determine the domain of the final kernel, so we would need to be consistent with that or presumably something will break.

Taking one step back, what is the main motivation for a DomainKernel, regardless of the (dis)advantages of both approaches?

As you mention, things like RFF.

I think it is quite nice that currently one does not have to fix a domain a priori but one can construct a domain-agnostic kernel if it can be applied to the desired inputs.

Couldn't agree more, this is why I'm keen on things like AutoEuclideanSpace which say "I'm a euclidean space of whatever dimensionality you like. If you apply me to data, I'll look at the dimension of that data. If you ask for my dimension without providing data, I'll give you an informative error message telling you to build a kernel with more information".

I saw your note about random Fourier features but in this case wouldn't it be cleaner to just demand that one specifies a domain when the features are constructed or computed?

I'm really not sure that it would -- I think it would lead us to having to specify really complicated domains if the user is using a really complicated kernel / I'm not even sure how you would specify it for something like

k = SEKernel() + Matern32Kernel()∘ SelectTransform((1, 3))

Certainly domain(k) should be Euclidean(2), however, it's unclear what the domain of the Matern32Kernel should be, which is going to be needed (I believe) in order to construct an RFF approximation to the whole thing.

I could be missing a trick though.

(and the AutoEuclidean seems like a workaround but not a completely appealing design at first glance).

Which aspects of AutoEuclidean seem unsatisfactory?

willtebbutt avatar Oct 14 '21 11:10 willtebbutt

Well this would be an error on two counts -- SEKernel(Euclidean(2))∘ SelectTransform((1, 3)) would error because the SelectTransform asks for the third dimension, but SEKernel(Euclidean(2)) only has two. Secondly, the dimension of SEKernel(Euclidean(3))∘ SelectTransform((1, 3)) would be 2, but SEKernel(Euclidean(5)) is five.

But wait SelectTransfrom((1, 3)) would take a N dimensional vector x and only return x[[1,3]], a 2-dimensional vector right? So the input to SEKernel would be Euclidean(2) ?

But more generally to be consistent the actual domain of SEKernel ∘ SelectTransform((1, 3)) can still be Euclidean(5) because the "projection" in 2D is still a valid operation in the 5D space

theogf avatar Oct 14 '21 11:10 theogf

But wait SelectTransfrom((1, 3)) would take a N dimensional vector x and only return x[[1,3]], a 2-dimensional vector right? So the input to SEKernel would be Euclidean(2) ?

Sorry, you're totally right. Neither of my points make sense. And you make a good point -- it's unclear what the dimensionality of the domain of something like

Matern32Kernel()∘ SelectTransform((1, 3))

is. We just know that it's at least 3. You're correct that

Matern32Kernel(EuclideanSpace(2))∘ SelectTransform((1, 3))

quite clearly wouldn't resolve this ambiguity, so my point about propagating everything down to the bottom of the tree must be incorrect.

I'm going to continue to hunt for an example where specifying the domain at the top is insufficient.

willtebbutt avatar Oct 14 '21 12:10 willtebbutt

The only way I can imagine that it would not be possible to derive a kernel based on the final domain would be something like

k1 = SEKernel()
k2 = Matern32Kernel()
k3 = cartesian_sum(k1, k2)

but then this wouldn't even be a kernel that you could evaluate for a domain with dimension > 2, since it would be ambiguous which kernel should be applied to which dimension.

I think also that providing information at the end would make pathwise sampling in GPPPs doable, albeit there's a bit of redundancy. For example, consider the process view of the above kernel:

f = @gppp let
    f1 = GP(with_domain(SEKernel(), Euclidean(2)))
    f2 = f1 ∘ select((1, 3))
    f3 = GP(with_domain(SEKernel(), Euclidean(3)))
    f4 = f2 + f3
end

by assumption we know the domain of f1 and f3 because it's been specified explicitly.

Interestingly, I don't think that one needs to know the dimension of the domain of f2 in order to (approximately) construct a sample path from it (e.g. using RFF) -- you just construct a sample path from f1 and compose that with select((1, 3)). Similarly for f4 you don't actually care about its domain, you just require that sample paths for f2 and f3 operate properly on whatever data is provided.

I wonder whether this means that we actually don't care what the domain of Matern32Kernel()∘ SelectTransform((1, 3)) is, for the sake of RFF, since we would just construct an RFF approximation to the Matern32Kernel() anyway, and apply the SelectTransform((1, 3)) to the input data like usual. So maybe my original point was fine after all?

Maybe there are (at least) two ways of going about this after all...

edit: @devmotion I think my GPPP example precludes (to my mind) the idea of just providing the domain when you want when constructing the RFF to f / generating a sample path from f, because the domain of f is a strange looking beast that you certainly wouldn't want a user to have to specify -- it would involve explicitly stating what the domain is for each of the component processes (f1, ..., f4).

willtebbutt avatar Oct 14 '21 12:10 willtebbutt

I'm really not sure that it would -- I think it would lead us to having to specify really complicated domains if the user is using a really complicated kernel / I'm not even sure how you would specify it for something like

I'm a bit confused. To me it feels like my suggestion (only specify the domain when constructing the RFFs) should not be more or less problematic than the DomainKernel(kernel, domain) approach - in both cases you would just have to specify the domain of the final inputs but not the domain of the intermediate constructs, with the difference that the DomainKernel couples both a kernel and a domain in one struct and otherwise it would just be two uncoupled arguments to a function that computes the RFFs. My gut feeling is that the DomainKernel should only be added if there are multiple applications for it and it would useful to be able to couple the domain and the kernel more clearly, e.g., to pass them around together - if the main goal are RFFs right now, it seems simpler to not introduce it initially and reevaluate later if there are more use cases.

Which aspects of AutoEuclidean seem unsatisfactory?

Mainly that it is not a proper domain but a union of different domains. This seems to indicate that it is not generally useful to specify the domain of a kernel and that it would be impractical if one is forced to specify a proper domain. Which makes me wonder if one should specify domains not always as part of kernels (as in the second approach) but only if they are used in some application and defined concretely (no abstract domains such as AutoEuclidean) (as in the first approach or my suggestion when computing RFFs).

My feeling is also that it is easy to end up with inconsistent domains if one has to specify domains of intermediate kernels. And that it seems challenging and annoying (from a developer perspective) to avoid such inconsistencies at construction time.

devmotion avatar Oct 14 '21 20:10 devmotion

Regarding the issue in #14, and the discussion here, my opinion would be that all this stuff should lie in an extra package linked to KernelFunctions. This would basically contain the DomainKernel wrapper as well as the spectral_mixture definitions for a large list of kernels.

theogf avatar Nov 29 '21 22:11 theogf

The only way I can imagine that it would not be possible to derive a kernel based on the final domain would be something like

k1 = SEKernel()
k2 = Matern32Kernel()
k3 = cartesian_sum(k1, k2)

but then this wouldn't even be a kernel that you could evaluate for a domain with dimension > 2, since it would be ambiguous which kernel should be applied to which dimension.

I think also that providing information at the end would make pathwise sampling in GPPPs doable, albeit there's a bit of redundancy. For example, consider the process view of the above kernel:

f = @gppp let
    f1 = GP(with_domain(SEKernel(), Euclidean(2)))
    f2 = f1 ∘ select((1, 3))
    f3 = GP(with_domain(SEKernel(), Euclidean(3)))
    f4 = f2 + f3
end

by assumption we know the domain of f1 and f3 because it's been specified explicitly.

Interestingly, I don't think that one needs to know the dimension of the domain of f2 in order to (approximately) construct a sample path from it (e.g. using RFF) -- you just construct a sample path from f1 and compose that with select((1, 3)). Similarly for f4 you don't actually care about its domain, you just require that sample paths for f2 and f3 operate properly on whatever data is provided.

I wonder whether this means that we actually don't care what the domain of Matern32Kernel()∘ SelectTransform((1, 3)) is, for the sake of RFF, since we would just construct an RFF approximation to the Matern32Kernel() anyway, and apply the SelectTransform((1, 3)) to the input data like usual. So maybe my original point was fine after all?

Maybe there are (at least) two ways of going about this after all...

edit: @devmotion I think my GPPP example precludes (to my mind) the idea of just providing the domain when you want when constructing the RFF to f / generating a sample path from f, because the domain of f is a strange looking beast that you certainly wouldn't want a user to have to specify -- it would involve explicitly stating what the domain is for each of the component processes (f1, ..., f4).

The big story here seems to be that for "composite" kernels built using algebraic operations applied to other kernels, it should not be necessary to specify the domain.

Instead, the domain should be something that is specified for "primitive" kernels.

In particular, I agree that a good way to think about this is to think about the underlying Gaussian processes. For instance, if we write f_1(x) + f_2(T(x)) we need not have f_1 and f_2 be defined on the same domain, they can be defined on different domains X_1 and X_2 as long as T : X_1 -> X_2 maps points into the right spaces.

On the other hand, I think it doesn't make sense to require the user to specify domains of intermediate types that arise in the context of transformed kernels when the domain can be inferred from the transformation - this seems to mostly give rise to boilerplate.

aterenin avatar Dec 02 '21 21:12 aterenin

I still think a natural and simple initial step would be to specify domains separate from kernels as additional arguments if needed (e.g. for RFFs). This would be completely non-breaking and sufficient for applications that require information about the domain.

As mentioned above, if there are multiple applications (not only in theory but in practice in some implementations) with this pattern or we want to pass domains and kernels around in a more closely coupled way, then in my opinion the natural next step would be to add a DomainKernel (proposal 1 in the OP):

  • Then one would not have to deal with domains in every kernel which seems to make the interface for developers more complicated even though it is not needed for many applications (and would require breaking changes but I think this is only a minor argument)
  • One would also not have to deal with domains of intermediate kernels (or rely on unspecific default definitions such as AutoEuclidean) since it would only be necessary to specify the domain of the resulting kernel.
  • One could still introduce arithmetic of DomainKernels to allow to specify domains of intermediate (or even primitive) kernels if desired.

devmotion avatar Dec 02 '21 22:12 devmotion