KernelFunctions.jl
Sum of independent kernels
Following this Discourse discussion: currently, there is no building block to sum independent kernels, analogous to KernelTensorProduct but with addition instead of multiplication:
For inputs $x = (x_1,\dots,x_n)$ and $x' = (x_1',\dots,x_n')$, the independent sum of kernels $k_1, \dots, k_n$:
$$ k(x, x'; k_1, \dots, k_n) = \sum_{i=1}^n k_i(x_i, x_i') $$
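For example, with $n = 2$, a (unit-lengthscale) squared exponential kernel on the first dimension and an exponential kernel on the second, this would give

$$ k(x, x') = \exp\left(-\frac{(x_1 - x_1')^2}{2}\right) + \exp\left(-\lvert x_2 - x_2' \rvert\right) $$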
It sounds like a reasonable composite to add, especially since the alternative using SelectTransform is pretty ugly (see the sketch below). Is there a standardized name for this kind of kernel? KernelTensorSum? KernelDimensionwiseSum?
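For reference, the SelectTransform workaround I mean looks roughly like this (a sketch; I am assuming the usual kernel-with-transform composition via ∘ and kernel addition):

```julia
using KernelFunctions

# Workaround: restrict each kernel to one input dimension with SelectTransform,
# then add the restricted kernels, which yields a KernelSum.
k1 = SqExponentialKernel() ∘ SelectTransform([1])
k2 = ExponentialKernel() ∘ SelectTransform([2])
k = k1 + k2

x = [0.0, 1.0]
y = [0.5, 2.0]
k(x, y)  # k1 only sees the first component, k2 only the second
```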
I am willing to do a PR, but I'll need some guidance since it is my first contribution. My naive approach would be to create a new kernel (I like the idea of KernelTensorSum) similar to KernelTensorProduct.
What are the requirements for a fully functional kernel that can be used in AbstractGPs? From the documentation I identify the following:
- define struct + constructors (would an abstract type KernelTensor for both KernelTensorProduct and KernelTensorSum make sense?)
- define the kernel evaluation, basically adapting the following method (see the sketch after this list): https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/ef6d4591b36194fca069d8bc7ae8c1e2ee288080/src/kernels/kerneltensorproduct.jl#L52C5-L58
- define (or reuse) a dim method
- define a kernelmatrix method
- pretty printing
- tests
- Am I missing something?
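To make the first two items concrete, here is a minimal sketch of what I have in mind, adapted from KernelTensorProduct (KernelTensorSum is only the proposed name, not existing API):

```julia
using KernelFunctions

# Proposed composite: one kernel per input dimension, combined by addition
# (KernelTensorProduct combines the per-dimension kernels by multiplication).
struct KernelTensorSum{K} <: Kernel
    kernels::K
end

KernelTensorSum(k::Kernel, ks::Kernel...) = KernelTensorSum((k, ks...))

Base.length(k::KernelTensorSum) = length(k.kernels)

# Evaluation: sum the component kernels over the matching input dimensions,
# mirroring the linked KernelTensorProduct method.
function (k::KernelTensorSum)(x, y)
    length(x) == length(y) == length(k) ||
        throw(DimensionMismatch("number of kernels and number of features are not consistent"))
    return sum(ki(xi, yi) for (ki, xi, yi) in zip(k.kernels, x, y))
end
```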
Thanks for contributing, I think you've got it all covered!
I would not build a KernelTensor abstraction as I don't think we would get much out of it. As for the name, we can still change it during the PR review if other arguments come up.
> What are the requirements for a fully functional kernel that can be used in AbstractGPs?
I guess you can mainly copy KernelTensorProduct and replace multiplication with addition.
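For instance, kernelmatrix could follow the same pattern, combining the per-dimension kernel matrices with + instead of an elementwise product. A sketch building on the struct above (it assumes ColVecs input; a real PR would also need to handle RowVecs and vectors of vectors):

```julia
# Sum the kernel matrices of the individual kernels, each evaluated on its own
# input dimension (columns of x.X are observations, rows are features).
function KernelFunctions.kernelmatrix(k::KernelTensorSum, x::ColVecs)
    return mapreduce(+, enumerate(k.kernels)) do (i, ki)
        kernelmatrix(ki, x.X[i, :])
    end
end
```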
> would an abstract type KernelTensor for both KernelTensorProduct and KernelTensorSum make sense?
Not in an initial version IMO (and maybe not at all). I would add a separate type, similar to how we distinguish between KernelSum and KernelProduct.
One technical question: is it always the case that when you evaluate a kernel at the same input, the correlation should be 1?
```julia
julia> k = SqExponentialKernel();

julia> x = 0;

julia> k(x,x)
1.0
```
Independently adding the kernels results in the following behavior:
```julia
julia> k1 = SqExponentialKernel();

julia> k2 = ExponentialKernel();

julia> k = KernelTensorSum(k1, k2)
Tensor sum of 2 kernels:
	Squared Exponential Kernel (metric = Distances.Euclidean(0.0))
	Exponential Kernel (metric = Distances.Euclidean(0.0))

julia> x = zeros(2);

julia> k(x,x)
2.0
```
So, should the kernel take the mean instead of the sum so that the correlation is normalized?
For inputs $x = (x_1,\dots,x_n)$ and $x' = (x_1',\dots,x_n')$, the independent sum of kernels $k_1, \dots, k_n$ would then become
$$ k(x, x'; k_1, \dots, k_n) = \frac{1}{n} \sum_{i=1}^n k_i(x_i, x_i') $$
No, it does not have to be! I would not "normalize" because that might be unexpected from the user's side. The scaling should be dealt with by each kernel individually.
Of course... Simply scaling a kernel would also mean k(x,x) != 1.0
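To illustrate both points with existing API (ScaledKernel via * already exists; KernelTensorSum is the type proposed above): a single scaled kernel already has k(x,x) != 1.0, and a user who wants a unit diagonal can scale each component kernel by 1/n themselves:

```julia
julia> (0.5 * SqExponentialKernel())(0, 0)  # a scaled kernel alone is not 1 on the diagonal
0.5

julia> k = KernelTensorSum(0.5 * SqExponentialKernel(), 0.5 * ExponentialKernel());

julia> k(zeros(2), zeros(2))                # per-kernel scaling recovers a unit diagonal
1.0
```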