KernelFunctions.jl
Spectral distribution and domain of kernel input
I'd like to have a function `spectral_distribution` which takes a kernel defined over R^d and returns its spectral distribution. For example:

```julia
function spectral_distribution(k::SqExponentialKernel)
    MvNormal(dim(k), ones(dim(k)))
end
```
However, there is one big API incompatibility with such a method: currently, there is no way to determine what domain a KernelFunctions.jl kernel is defined over. In particular, there's no way to tell whether a given kernel is defined over R^1 or R^3 or R^100. The underlying issue is that Distances.jl does not distinguish this, and KernelFunctions.jl largely works on top of that API. This is also an issue for #9 and #10.
I suggest instead adding the domain as an explicit parameter to the kernel.
I'm not sure that we need to add this additional complexity to the kernels themselves. Would it not be sufficient to make `spectral_distribution` binary, with the second argument being the domain?
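A minimal sketch of what such a binary method could look like, assuming Distributions.jl is available; the `spectral_distribution` name, its `dim` argument, and the stand-in kernel type are illustrative only, not part of the current API:

```julia
using Distributions, LinearAlgebra

struct SqExponentialKernel end  # stand-in for KernelFunctions.SqExponentialKernel

# Hypothetical binary method: the caller supplies the input dimension,
# so the kernel type itself stays dimension-agnostic.
spectral_distribution(::SqExponentialKernel, dim::Int) =
    MvNormal(zeros(dim), Matrix(1.0I, dim, dim))

spectral_distribution(SqExponentialKernel(), 3)  # a 3-dimensional MvNormal
```

This sidesteps the Distances.jl issue entirely, at the cost of threading the dimension through every call site.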
Sorry to revive this, but having access to the spectral density of stationary kernels would be a great feature.
Edit: here is a very nice use case: https://arxiv.org/abs/1401.5508
Your timing is impeccable: I literally just wrote the following code.
```julia
spectral_measure(k::ScaledKernel) = spectral_measure(k.kernel)
spectral_measure(k::TransformedKernel{<:T, <:ScaleTransform} where T) = k.transform.s[1] * spectral_measure(k.kernel)
spectral_measure(k::SqExponentialKernel) = Normal(0, 1 / (2*pi))
```
The main difficulty so far seems to be that the second method only works for 1D kernels due to the `[1]` in there, and I'm not quite sure how to handle needing to know the dimension of the space in order to specify the spectral measure, which is required to make this general.
Maybe some kind of wrapper that optionally adds this information to the kernel type, like `EuclideanKernel{K,D}` where `K` is a kernel and `D` is an integer indicating the dimension? This should just work when plugged into the current API, and gives users the ability to add this information if they need it. On the other hand, if one uses `TransformedKernel`, the transformation itself could carry the information needed to infer the dimension, albeit not from the type system.
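A rough sketch of that wrapper idea, with stand-in `Kernel` and `SqExponentialKernel` types so the snippet is self-contained (none of this is part of the package):

```julia
abstract type Kernel end                  # stand-in for KernelFunctions.Kernel
struct SqExponentialKernel <: Kernel end  # stand-in kernel

# Hypothetical wrapper recording the input dimension D at the type level.
struct EuclideanKernel{D,K<:Kernel} <: Kernel
    kernel::K
end
EuclideanKernel(k::Kernel, d::Integer) = EuclideanKernel{Int(d),typeof(k)}(k)

# Forward evaluation so the wrapper stays a drop-in kernel.
(k::EuclideanKernel)(x, y) = k.kernel(x, y)

# The dimension is now recoverable from the type alone.
dimension(::EuclideanKernel{D}) where {D} = D

dimension(EuclideanKernel(SqExponentialKernel(), 3))  # 3
```

Methods like `spectral_measure` could then dispatch on `EuclideanKernel{D,<:SomeKernel}` and read `D` off the type.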
@willtebbutt @theogf Maybe it's worth revisiting this now that the package has evolved further?
Speaking of the domain, there is a larger discussion here: https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/issues/382
I only need the 1D case, so based on your code I did the following:
```julia
# Spectral density for 1D covariance functions
# Using the FT convention in Eq. (4.6) in Rasmussen & Williams (2006)
spectral_density(k::ScaledKernel, s) = k.σ²[1] * spectral_density(k.kernel, s)
function spectral_density(k::TransformedKernel{<:T, <:ScaleTransform} where T, s)
    λ = inv(k.transform.s[1])
    λ * spectral_density(k.kernel, λ*s)
end
spectral_density(k::SqExponentialKernel, s) = √(2π)*exp(-2π^2*s^2)
spectral_density(k::Matern12Kernel, s) = 2/(1+4π^2*s^2)
spectral_density(k::Matern32Kernel, s) = 12*√3/(3+4π^2*s^2)^2
spectral_density(k::Matern52Kernel, s) = 400*√5/(3*(5+4π^2*s^2)^3)
```
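As a quick sanity check on the Eq. (4.6) convention, each of these densities should integrate to k(0, 0) = 1 for a unit-variance kernel. A crude trapezoidal rule confirms this (densities restated so the snippet is self-contained, base Julia only):

```julia
# Densities restated from above.
sqexp_density(s)    = √(2π) * exp(-2π^2 * s^2)
matern12_density(s) = 2 / (1 + 4π^2 * s^2)

# Simple trapezoidal quadrature on a uniform grid.
function integrate(f, lo, hi; n = 200_001)
    xs = range(lo, hi; length = n)
    step(xs) * (sum(f, xs) - (f(lo) + f(hi)) / 2)
end

integrate(sqexp_density, -10, 10)         # ≈ 1
integrate(matern12_density, -2000, 2000)  # ≈ 1 (heavy tails need a wide grid)
```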
But making it into a measure like `Normal` would be very nice indeed, although I'm not sure most distributions associated with stationary kernels are easy to sample from.
Not in complete generality, but they are known for the widely-used Matérn class: the (normalized) spectral measure is a student T distribution whose degrees of freedom parameter corresponds to the smoothness of the kernel. One can also get results for most other classes used in practice, and I would guess for almost everything widely-used enough to make it into a package.
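For the 1D Matérn family under the same Eq. (4.6) convention as the densities above, the normalized spectral measure works out to a Student-t with 2ν degrees of freedom, scaled by 1/(2π). A sketch, assuming a recent Distributions.jl (for `TDist` and scalar affine scaling of distributions):

```julia
using Distributions

# Sketch: with the Eq. (4.6) FT convention and unit lengthscale, the
# normalized 1D Matérn-ν spectral measure is s = X / (2π) with X ~ TDist(2ν).
matern_spectral_measure(ν) = TDist(2ν) * inv(2π)

# Consistency check against the Matern12 density 2 / (1 + 4π²s²):
pdf(matern_spectral_measure(1/2), 0.3) ≈ 2 / (1 + 4π^2 * 0.3^2)  # true

# Sampling, e.g. for random Fourier features, is then just `rand`:
rand(matern_spectral_measure(3/2), 100)
```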
I'm adopting a normalization convention here: since every spectral measure μ is a finite measure, one can represent it as a pair (a, ρ), where a = k(0, 0) = ∫_{R^d} dμ is the amplitude of the kernel and ρ is a probability measure. I would strongly argue in favor of this convention for any package, because it enables one to share functionality with the rest of the ecosystem, such as sampling from the (normalized) spectral measure, a very common use case in random Fourier feature approaches.
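A minimal sketch of that convention; the type and function names are made up for illustration, and Distributions.jl is assumed:

```julia
using Distributions

# A spectral measure represented as (a, ρ): amplitude times a probability measure.
struct SpectralMeasure{A<:Real,D<:Distribution}
    amplitude::A   # a = k(0, 0), the total mass of the measure
    normalized::D  # ρ, a probability measure
end

# Example: a 1D squared-exponential kernel with variance σ², reusing the
# Normal(0, 1/(2π)) measure from earlier in the thread.
sqexp_spectral_measure(σ²) = SpectralMeasure(σ², Normal(0, 1 / (2π)))

# Shared ecosystem functionality such as RFF frequency sampling only needs ρ:
sample_frequencies(m::SpectralMeasure, n) = rand(m.normalized, n)
```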
Yes, the normalization is indeed a nice idea. I was thinking about random Fourier features as well.
> One can also get results for most other classes used in practice, and I would guess for almost everything widely-used enough to make it into a package.
Surprisingly, the spectral densities for `GammaExponentialKernel` and `RationalQuadraticKernel` are not as straightforward as I thought:
```julia
# Analytical transform unknown by Mathematica
spectral_density(k::GammaExponentialKernel, s) = nothing

function spectral_density(k::RationalQuadraticKernel, s)
    # TODO: Work in log domain
    # TODO: Fix singularity at `s = 0`
    α = k.α[1]
    2^(5/4+α/2)*α^(1/4+α/2)*π^α*abs(s)^(-1/2+α)*besselk(-1/2+α, 2π*√(2*α)*abs(s))/gamma(α)
end
```
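A possible fix for the "work in log domain" TODO, using `loggamma` and the exponentially scaled `besselkx` from SpecialFunctions.jl; this is a sketch, and the singularity at s = 0 remains:

```julia
using SpecialFunctions  # loggamma, besselkx, besselk, gamma

# besselk(ν, x) == besselkx(ν, x) * exp(-x), so the log-density can be
# accumulated without the Bessel factor underflowing for large |s| or α.
function log_spectral_density_rq(α, s)
    z = 2π * √(2α) * abs(s)
    (5/4 + α/2) * log(2) + (1/4 + α/2) * log(α) + α * log(π) +
        (α - 1/2) * log(abs(s)) + log(besselkx(α - 1/2, z)) - z - loggamma(α)
end
```

Exponentiating this should match the direct formula above wherever the latter does not underflow.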
So another benefit of using normalized measures is that you can use `logpdf` for cases such as `RationalQuadraticKernel`.
For the gamma exponential (known to me as the power exponential family, which I believe is only PSD for γ < 1 in that parametrization; if I'm remembering that right, this ought to be documented before it explodes when someone puts in a bigger value), Mathematica might be failing to find a closed form due to the necessary restrictions on the exponent. All assuming I'm remembering correctly, of course! It's probably worth checking Gradshteyn and Ryzhik to see if anything useful appears there.
The other one is essentially a Student-t form, which explains the Bessel functions you're seeing pop out. If someone somehow calculates and inverts the CDF of that thing, one could get a sampling scheme for it via the quantile transform; or maybe there's something more clever one could do.
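Failing a closed-form inverse CDF, the quantile-transform idea can be sketched numerically: build the CDF on a grid by crude quadrature and invert it by search. Base Julia only, with the Matern12 density from earlier standing in for the target:

```julia
# Numerical quantile transform for a 1D normalized spectral density ρ.
function sample_by_quantile(ρ, u; lo = -50.0, hi = 50.0, n = 10^6)
    xs = range(lo, hi; length = n)
    cdf = cumsum(ρ.(xs)) .* step(xs)  # left-Riemann CDF on the grid
    cdf ./= cdf[end]                  # absorb truncation/quadrature error
    xs[searchsortedfirst(cdf, u)]     # generalized inverse at u
end

matern12_density(s) = 2 / (1 + 4π^2 * s^2)
sample_by_quantile(matern12_density, 0.5)  # ≈ 0 (the median)
```

Feeding in uniform draws for `u` then yields approximate samples; a proper root-finding or quadrature package would of course be tidier than this grid-based version.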