KernelFunctions.jl
Specifying Domain of Kernels
Proposal 1
# Euclidean domain with D dimensions.
struct Euclidean
    D::Int
end

dim(domain::Euclidean) = domain.D
"""
DomainKernel{Tkernel<:Kernel, Tdomain}
A kernel with additional information specifying the domain of the kernel.
This provides a complete specification of a kernel, whereas most kernels by themselves do
not provide the domain information necessary to construct e.g. Random Fourier Feature
approximations to themselves.
"""
struct DomainKernel{Tkernel<:Kernel, Tdomain} <: Kernel
kernel::Tkernel
domain::Tdomain
end
kernelmatrix(k::DomainKernel, x::AbstractVector) = kernelmatrix(k.kernel, x)
function kernelmatrix(k::DomainKernel, x::AbstractVector, x′::AbstractVector)
    return kernelmatrix(k.kernel, x, x′)
end
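To illustrate why the wrapper helps, here is a hypothetical sketch of a downstream consumer (the `rff_frequencies` name is made up, and the Gaussian spectral density applies to the unit-lengthscale SE kernel specifically):

```julia
using Random

# Hypothetical sketch: an RFF-style consumer can read the input
# dimensionality straight off the wrapper, with no extra arguments.
# The SE kernel's spectral density is Gaussian, so (for unit
# lengthscale) frequencies are drawn i.i.d. from N(0, I).
function rff_frequencies(rng::AbstractRNG, k::DomainKernel, num_features::Int)
    return randn(rng, dim(k.domain), num_features)
end
```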
Proposal 2
struct AutoEuclidean end

struct Euclidean
    D::Int
end

struct SEKernel{Tmetric_space} <: Kernel
    metric_space::Tmetric_space
end

SEKernel() = SEKernel(AutoEuclidean())
This version would be quite breaking, but I'd rather we didn't worry about that, and instead focus on which one we think constitutes the better solution, and go from there.
I'm personally leaning more towards the second option -- it feels cleaner on some level. The DomainKernel idea would probably work, but my gut is that it'll be messy in practice. You would have to write
DomainKernel(SEKernel(), Euclidean(5)) + 0.5 * with_lengthscale(DomainKernel(SEKernel(), Euclidean(5)), 2.0)
vs
SEKernel(Euclidean(5)) + 0.5 * with_lengthscale(SEKernel(Euclidean(5)), 2.0)
My feeling is that the latter is easier to read.
It also avoids some redundancy -- we currently let users specify the metric in a lot of kernels. Having the metric and the domain of the kernel specified in different places feels strange to me.
Thanks for opening the issue! I would definitely lean towards the first option! Actually, your example proves the opposite to me, as with the first option I would write
DomainKernel(SEKernel() + 0.5 * with_lengthscale(SEKernel(), 2.0), Euclidean(5))
which is much clearer.
Or alternatively one could also have a with_domain in the same spirit as with_lengthscale
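As a sketch of that suggestion (assuming the DomainKernel wrapper from Proposal 1; with_domain does not currently exist in KernelFunctions.jl):

```julia
# Hypothetical helper mirroring with_lengthscale: wrap any kernel
# together with an explicit domain.
with_domain(k::Kernel, domain) = DomainKernel(k, domain)

# e.g. with_domain(SEKernel(), Euclidean(5))
```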
Having the metric and domain in the same place is a valid argument, but can also be slightly confusing. Especially with such close names ;)
> Actually, your example proves the opposite to me, as with the first option I would write
~~So this actually isn't something that's safe to do in all cases. Consider~~
SEKernel() ∘ SelectTransform((1, 3))
~~If you put the DomainKernel on the outside, what dimensionality does it correspond to? Presumably you'd have to write DomainKernel(k, Euclidean(2)), but if you do that then I don't believe you have enough information to build the RFF approximation for the underlying SEKernel because you don't know its dimensionality.~~
~~You can of course concoct arbitrarily complicated examples.~~
~~I could see, however, a with_domain function being helpful in a lot of cases. Taking your example:~~
with_domain(SEKernel() + 0.5 * with_lengthscale(SEKernel(), 2.0), Euclidean(5))
~~would produce something like~~
SEKernel(Euclidean(5)) + 0.5 * with_lengthscale(SEKernel(Euclidean(5)), 2.0)
~~i.e. it would traverse the kernel and apply the Euclidean(5) domain everywhere under the assumption that each of the kernels at the bottom of the DAG can be equipped with the same domain. It would break if for some reason this wasn't possible to enforce.~~
~~An example where this heuristic would fail might be something like~~
with_domain(SEKernel() + SEKernel() ∘ SelectTransform((1, 3)), Euclidean(3))
~~as it would produce~~
SEKernel(Euclidean(3)) + SEKernel(Euclidean(3)) ∘ SelectTransform((1, 3))
> Having the metric and domain in the same place is a valid argument, but can also be slightly confusing. Especially with such close names ;)
Do you think so? To my mind, specifying the metric and the space as two separate things makes sense -- you have a set, and separately you have a notion of how to measure the distance between any two points in said set.
The metric isn't required in all cases -- for any kernel that isn't built on top of a metric space (e.g. a kernel defined on a finite-dimensional domain via a positive-definite matrix), you still have a notion of the domain, but there's no need for a metric. The space, however, always has the potential to be needed.
edit: remove things that are incorrect.
To be fair, in my previous example, I put it as one field. Upon reflection, I think it would be better in the SEKernel case to define it as
struct SEKernel{Tdomain, Tmetric} <: Kernel
    domain::Tdomain
    metric::Tmetric
end

SEKernel() = SEKernel(AutoEuclideanSpace(), EuclideanDistance())
# insert whatever other default constructors you like
to make it really explicit that these are two separate things.
edit: Or maybe you do want to bung everything into a single MetricSpace object, and just define a domain function that kernels must implement if they want to support things that need to know the domain? i.e. abstract away this particular implementation detail and just have the API say "give me your domain". Then it wouldn't matter so much.
edit2: I guess my point is that, upon additional reflection, whether SEKernel has one or two fields is an implementation detail that we ought not be overly concerned with at this point in time.
> If you put the DomainKernel on the outside, what dimensionality does it correspond to?
It corresponds to the general type of input passed to the final kernel. If you indeed have
k = SEKernel(Euclidean(5)) + SEKernel(Euclidean(2)) ∘ SelectTransform((1, 3))
what should dim(k) return?
No matter how complex the kernel we create is, the intermediate domains do not necessarily matter, right?
Taking one step back, what is the main motivation for a DomainKernel, regardless of the (dis)advantages of both approaches? I think it is quite nice that currently one does not have to fix a domain a priori but can construct a domain-agnostic kernel that can be applied to the desired inputs. I saw your note about random Fourier features, but in this case wouldn't it be cleaner to just demand that one specifies a domain when the features are constructed or computed? I also assume that default domains lead to more verbose constructors or inconsistencies (and AutoEuclidean seems like a workaround but not a completely appealing design at first glance).
> what should dim(k) return?
Well this would be an error on two counts -- SEKernel(Euclidean(2)) ∘ SelectTransform((1, 3)) would error because the SelectTransform asks for the third dimension, but SEKernel(Euclidean(2)) only has two. Secondly, the dimension of SEKernel(Euclidean(3)) ∘ SelectTransform((1, 3)) would be 2, but that of SEKernel(Euclidean(5)) is 5.
> No matter how complex the kernel we create is, the intermediate domains do not necessarily matter, right?
Yes and no. I agree that the domain of the kernel that pops out at the end is typically what we care about, but the domain of the "leaf" kernels (those who have domains specified explicitly in their type) will determine the domain of the final kernel, so we would need to be consistent with that or presumably something will break.
> Taking one step back, what is the main motivation for a DomainKernel, regardless of the (dis)advantages of both approaches?
As you mention, things like RFF.
> I think it is quite nice that currently one does not have to fix a domain a priori but one can construct a domain-agnostic kernel if it can be applied to the desired inputs.
Couldn't agree more, this is why I'm keen on things like AutoEuclideanSpace which say "I'm a euclidean space of whatever dimensionality you like. If you apply me to data, I'll look at the dimension of that data. If you ask for my dimension without providing data, I'll give you an informative error message telling you to build a kernel with more information".
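That behaviour might look roughly like this (a sketch only, reusing the Euclidean struct from the OP; the resolve function and the error message are made up here):

```julia
struct AutoEuclidean end

# With data in hand, infer the dimension from the first input vector.
resolve(::AutoEuclidean, x::AbstractVector{<:AbstractVector}) = Euclidean(length(first(x)))
resolve(d::Euclidean, _) = d

# Without data, fail loudly rather than silently guess.
function dim(::AutoEuclidean)
    error("Dimension unknown: construct the kernel with an explicit Euclidean(D) domain.")
end
```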
> I saw your note about random Fourier features but in this case wouldn't it be cleaner to just demand that one specifies a domain when the features are constructed or computed?
I'm really not sure that it would -- I think it would lead us to having to specify really complicated domains if the user is using a really complicated kernel / I'm not even sure how you would specify it for something like
k = SEKernel() + Matern32Kernel() ∘ SelectTransform((1, 3))
Certainly domain(k) should be Euclidean(2), however, it's unclear what the domain of the Matern32Kernel should be, which is going to be needed (I believe) in order to construct an RFF approximation to the whole thing.
I could be missing a trick though.
> (and the AutoEuclidean seems like a workaround but not a completely appealing design at first glance).
Which aspects of AutoEuclidean seem unsatisfactory?
> Well this would be an error on two counts -- SEKernel(Euclidean(2)) ∘ SelectTransform((1, 3)) would error because the SelectTransform asks for the third dimension, but SEKernel(Euclidean(2)) only has two. Secondly, the dimension of SEKernel(Euclidean(3)) ∘ SelectTransform((1, 3)) would be 2, but that of SEKernel(Euclidean(5)) is 5.
But wait, SelectTransform((1, 3)) would take an N-dimensional vector x and only return x[[1, 3]], a 2-dimensional vector, right? So the input to SEKernel would be Euclidean(2)?
But more generally, to be consistent, the actual domain of SEKernel() ∘ SelectTransform((1, 3)) can still be Euclidean(5), because the "projection" to 2D is still a valid operation on the 5D space.
> But wait, SelectTransform((1, 3)) would take an N-dimensional vector x and only return x[[1, 3]], a 2-dimensional vector, right? So the input to SEKernel would be Euclidean(2)?
Sorry, you're totally right. Neither of my points makes sense. And you make a good point -- it's unclear what the dimensionality of the domain of something like
Matern32Kernel() ∘ SelectTransform((1, 3))
is. We just know that it's at least 3. You're correct that
Matern32Kernel(EuclideanSpace(2)) ∘ SelectTransform((1, 3))
quite clearly wouldn't resolve this ambiguity, so my point about propagating everything down to the bottom of the tree must be incorrect.
I'm going to continue to hunt for an example where specifying the domain at the top is insufficient.
The only way I can imagine that it would not be possible to derive a kernel based on the final domain would be something like
k1 = SEKernel()
k2 = Matern32Kernel()
k3 = cartesian_sum(k1, k2)
but then this wouldn't even be a kernel that you could evaluate for a domain with dimension > 2, since it would be ambiguous which kernel should be applied to which dimension.
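For concreteness, cartesian_sum here would presumably mean something like the following (a sketch; this is not an existing KernelFunctions.jl function):

```julia
# k3((x1, x2), (y1, y2)) = k1(x1, y1) + k2(x2, y2): each component kernel
# sees one fixed block of the input, so the dimension split is baked into
# the construction itself rather than inferable from a single final domain.
cartesian_sum(k1, k2) = ((x, y) -> k1(x[1], y[1]) + k2(x[2], y[2]))
```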
I think also that providing information at the leaves would make pathwise sampling in GPPPs doable, albeit with a bit of redundancy. For example, consider the process view of the above kernel:
f = @gppp let
    f1 = GP(with_domain(SEKernel(), Euclidean(2)))
    f2 = f1 ∘ select((1, 3))
    f3 = GP(with_domain(SEKernel(), Euclidean(3)))
    f4 = f2 + f3
end
by assumption we know the domain of f1 and f3 because it's been specified explicitly.
Interestingly, I don't think that one needs to know the dimension of the domain of f2 in order to (approximately) construct a sample path from it (e.g. using RFF) -- you just construct a sample path from f1 and compose that with select((1, 3)). Similarly for f4 you don't actually care about its domain, you just require that sample paths for f2 and f3 operate properly on whatever data is provided.
I wonder whether this means that we actually don't care what the domain of Matern32Kernel() ∘ SelectTransform((1, 3)) is, for the sake of RFF, since we would just construct an RFF approximation to the Matern32Kernel() anyway, and apply the SelectTransform((1, 3)) to the input data like usual. So maybe my original point was fine after all?
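That compositional construction could be sketched like this (assuming KernelFunctions.jl's TransformedKernel with its kernel and transform fields; rff_sample_path is a hypothetical function returning an RFF sample path of its kernel argument):

```julia
# Sketch: a sample path of k ∘ t is a sample path of k with the
# transform applied to the inputs first -- no domain for the
# composite kernel is ever needed.
function sample_path(rng, k::TransformedKernel, num_features)
    base_path = rff_sample_path(rng, k.kernel, num_features)
    return x -> base_path(k.transform(x))
end
```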
Maybe there are (at least) two ways of going about this after all...
edit: @devmotion I think my GPPP example precludes (to my mind) the idea of only providing the domain when constructing the RFF approximation to f / generating a sample path from f, because the domain of f is a strange-looking beast that you certainly wouldn't want a user to have to specify -- it would involve explicitly stating the domain of each of the component processes (f1, ..., f4).
> I'm really not sure that it would -- I think it would lead us to having to specify really complicated domains if the user is using a really complicated kernel / I'm not even sure how you would specify it for something like
I'm a bit confused. To me it feels like my suggestion (only specify the domain when constructing the RFFs) should be no more or less problematic than the DomainKernel(kernel, domain) approach -- in both cases you would just have to specify the domain of the final inputs but not the domain of the intermediate constructs, with the difference that the DomainKernel couples a kernel and a domain in one struct, whereas otherwise they would just be two uncoupled arguments to a function that computes the RFFs. My gut feeling is that the DomainKernel should only be added if there are multiple applications for it and it would be useful to couple the domain and the kernel more closely, e.g. to pass them around together. If the main goal right now is RFFs, it seems simpler not to introduce it initially and to reevaluate later if there are more use cases.
> Which aspects of AutoEuclidean seem unsatisfactory?
Mainly that it is not a proper domain but a union of different domains. This seems to indicate that it is not generally useful to specify the domain of a kernel, and that it would be impractical if one were forced to specify a proper domain. This makes me wonder whether domains should not always be specified as part of kernels (as in the second approach), but only when they are used in some application and defined concretely, with no abstract domains such as AutoEuclidean (as in the first approach or my suggestion of specifying them when computing RFFs).
My feeling is also that it is easy to end up with inconsistent domains if one has to specify domains of intermediate kernels. And that it seems challenging and annoying (from a developer perspective) to avoid such inconsistencies at construction time.
Regarding the issue in #14, and the discussion here, my opinion would be that all this stuff should lie in an extra package linked to KernelFunctions.
This would basically contain the DomainKernel wrapper as well as the spectral_mixture definitions for a large list of kernels.
The big story here seems to be that for "composite" kernels built using algebraic operations applied to other kernels, it should not be necessary to specify the domain.
Instead, the domain should be something that is specified for "primitive" kernels.
In particular, I agree that a good way to think about this is to think about the underlying Gaussian processes. For instance, if we write f_1(x) + f_2(T(x)) we need not have f_1 and f_2 be defined on the same domain, they can be defined on different domains X_1 and X_2 as long as T : X_1 -> X_2 maps points into the right spaces.
On the other hand, I think it doesn't make sense to require the user to specify domains of intermediate types that arise in the context of transformed kernels when the domain can be inferred from the transformation - this seems to mostly give rise to boilerplate.
I still think a natural and simple initial step would be to specify domains separate from kernels as additional arguments if needed (e.g. for RFFs). This would be completely non-breaking and sufficient for applications that require information about the domain.
As mentioned above, if there are multiple applications with this pattern (not only in theory but in practice in some implementations), or we want to pass domains and kernels around in a more closely coupled way, then in my opinion the natural next step would be to add a DomainKernel (proposal 1 in the OP):
- One would not have to deal with domains in every kernel, which would make the interface more complicated for developers even though domains are not needed for many applications (and would require breaking changes, but I think this is only a minor argument).
- One would also not have to deal with domains of intermediate kernels (or rely on unspecific default definitions such as AutoEuclidean), since it would only be necessary to specify the domain of the resulting kernel.
- One could still introduce arithmetic of DomainKernels to allow specifying domains of intermediate (or even primitive) kernels if desired.