KernelFunctions.jl
Spectral distribution and domain of kernel input
I'd like to have a function `spectral_distribution` which takes a kernel defined over R^d and returns its spectral distribution. For example:

```julia
function spectral_distribution(k::SqExponentialKernel)
    MvNormal(dim(k), ones(dim(k)))
end
```
However, there is one big API incompatibility with such a method: currently, there is no way to determine what domain a KernelFunctions.jl kernel is defined over. In particular, there's no way to tell whether a given kernel is defined over R^1 or R^3 or R^100. The underlying issue is that Distances.jl does not distinguish this, and KernelFunctions.jl largely works on top of that API. This is also an issue for #9 and #10.
I suggest instead adding the domain as an explicit parameter to the kernel.
I'm not sure that we need to add this additional complexity to the kernels themselves. Would it not be sufficient to make `spectral_distribution` binary, with the second argument being the domain?
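A minimal sketch of what such a binary method could look like, assuming Distributions.jl is available; the `spectral_distribution` name, its `dim` argument, and the stand-in kernel type are illustrative only, not part of the current API:

```julia
using Distributions, LinearAlgebra

struct SqExponentialKernel end  # stand-in for KernelFunctions.SqExponentialKernel

# Hypothetical binary method: the caller supplies the input dimension,
# so the kernel type itself stays dimension-agnostic.
spectral_distribution(::SqExponentialKernel, dim::Int) =
    MvNormal(zeros(dim), Matrix(1.0I, dim, dim))

spectral_distribution(SqExponentialKernel(), 3)  # a 3-dimensional MvNormal
```

This sidesteps the Distances.jl issue entirely, at the cost of threading the dimension through every call site.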
Sorry to revive this, but having access to the spectral density of stationary kernels would be a great feature.
Edit: here is a very nice use case: https://arxiv.org/abs/1401.5508
Your timing is impeccable: I literally just wrote the following code.
```julia
spectral_measure(k::ScaledKernel) = spectral_measure(k.kernel)
spectral_measure(k::TransformedKernel{<:T, <:ScaleTransform} where T) = k.transform.s[1] * spectral_measure(k.kernel)
spectral_measure(k::SqExponentialKernel) = Normal(0, 1 / (2*pi))
```
The main difficulty so far seems to be that the second method only works for 1D kernels due to the `[1]` in there, and I'm not quite sure how to handle needing to know the dimension of the space in order to specify the spectral measure, which is required to make this general.
Maybe some kind of wrapper that optionally adds this information to the kernel type, like `EuclideanKernel{K,D}` where `K` is a kernel and `D` is an integer indicating the dimension? This should just work when plugged into the current API, and gives users the ability to add this information if they need it. On the other hand, if one uses `TransformedKernel`, the transformation itself could carry the information needed to infer the dimension, albeit not from the type system.
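A rough sketch of that wrapper idea, with stand-in `Kernel` and `SqExponentialKernel` types so the snippet is self-contained (none of this is part of the package):

```julia
abstract type Kernel end                  # stand-in for KernelFunctions.Kernel
struct SqExponentialKernel <: Kernel end  # stand-in kernel

# Hypothetical wrapper recording the input dimension D at the type level.
struct EuclideanKernel{D,K<:Kernel} <: Kernel
    kernel::K
end
EuclideanKernel(k::Kernel, d::Integer) = EuclideanKernel{Int(d),typeof(k)}(k)

# Forward evaluation so the wrapper stays a drop-in kernel.
(k::EuclideanKernel)(x, y) = k.kernel(x, y)

# The dimension is now recoverable from the type alone.
dimension(::EuclideanKernel{D}) where {D} = D

dimension(EuclideanKernel(SqExponentialKernel(), 3))  # 3
```

Methods like `spectral_measure` could then dispatch on `EuclideanKernel{D,<:SomeKernel}` and read `D` off the type.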
@willtebbutt @theogf Maybe it's worth revisiting this now that the package has evolved further?
Speaking of the domain, there is a larger discussion here: https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/issues/382
I only need the 1D case, so based on your code I did the following:
```julia
# Spectral density for 1D covariance functions
# Using the FT convention in Eq. (4.6) in Rasmussen & Williams (2006)
spectral_density(k::ScaledKernel, s) = k.σ²[1] * spectral_density(k.kernel, s)
function spectral_density(k::TransformedKernel{<:T, <:ScaleTransform} where T, s)
    λ = inv(k.transform.s[1])
    λ * spectral_density(k.kernel, λ*s)
end
spectral_density(k::SqExponentialKernel, s) = √(2π)*exp(-2π^2*s^2)
spectral_density(k::Matern12Kernel, s) = 2/(1+4π^2*s^2)
spectral_density(k::Matern32Kernel, s) = 12*√3/(3+4π^2*s^2)^2
spectral_density(k::Matern52Kernel, s) = 400*√5/(3*(5+4π^2*s^2)^3)
```
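As a quick sanity check on the Eq. (4.6) convention, each of these densities should integrate to k(0, 0) = 1 for a unit-variance kernel. A crude trapezoidal rule confirms this (densities restated so the snippet is self-contained, base Julia only):

```julia
# Densities restated from above.
sqexp_density(s)    = √(2π) * exp(-2π^2 * s^2)
matern12_density(s) = 2 / (1 + 4π^2 * s^2)

# Simple trapezoidal quadrature on a uniform grid.
function integrate(f, lo, hi; n = 200_001)
    xs = range(lo, hi; length = n)
    step(xs) * (sum(f, xs) - (f(lo) + f(hi)) / 2)
end

integrate(sqexp_density, -10, 10)         # ≈ 1
integrate(matern12_density, -2000, 2000)  # ≈ 1 (heavy tails need a wide grid)
```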
But making it into a measure like `Normal` would be very nice indeed, although I'm not sure most distributions associated with stationary kernels are easy to sample from.
Not in complete generality, but they are known for the widely-used Matérn class: the (normalized) spectral measure is a student T distribution whose degrees of freedom parameter corresponds to the smoothness of the kernel. One can also get results for most other classes used in practice, and I would guess for almost everything widely-used enough to make it into a package.
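For the 1D Matérn family under the same Eq. (4.6) convention as the densities above, the normalized spectral measure works out to a Student-t with 2ν degrees of freedom, scaled by 1/(2π). A sketch, assuming a recent Distributions.jl (for `TDist` and scalar affine scaling of distributions):

```julia
using Distributions

# Sketch: with the Eq. (4.6) FT convention and unit lengthscale, the
# normalized 1D Matérn-ν spectral measure is s = X / (2π) with X ~ TDist(2ν).
matern_spectral_measure(ν) = TDist(2ν) * inv(2π)

# Consistency check against the Matern12 density 2 / (1 + 4π²s²):
pdf(matern_spectral_measure(1/2), 0.3) ≈ 2 / (1 + 4π^2 * 0.3^2)  # true

# Sampling, e.g. for random Fourier features, is then just `rand`:
rand(matern_spectral_measure(3/2), 100)
```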
I'm adopting a normalization convention here: since every spectral measure μ is a finite measure, one can represent it as a pair (a, ρ), where a = k(0, 0) = ∫_{R^d} dμ is the amplitude of the kernel and ρ is a probability measure. I would strongly argue in favor of this convention for any package, because it enables one to share functionality with the rest of the ecosystem, such as sampling from the (normalized) spectral measure, a very common use case in random Fourier feature approaches.
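A minimal sketch of that convention; the type and function names are made up for illustration, and Distributions.jl is assumed:

```julia
using Distributions

# A spectral measure represented as (a, ρ): amplitude times a probability measure.
struct SpectralMeasure{A<:Real,D<:Distribution}
    amplitude::A   # a = k(0, 0), the total mass of the measure
    normalized::D  # ρ, a probability measure
end

# Example: a 1D squared-exponential kernel with variance σ², reusing the
# Normal(0, 1/(2π)) measure from earlier in the thread.
sqexp_spectral_measure(σ²) = SpectralMeasure(σ², Normal(0, 1 / (2π)))

# Shared ecosystem functionality such as RFF frequency sampling only needs ρ:
sample_frequencies(m::SpectralMeasure, n) = rand(m.normalized, n)
```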
Yes, the normalization is indeed a nice idea. I was thinking about random Fourier features as well.
> One can also get results for most other classes used in practice, and I would guess for almost everything widely-used enough to make it into a package.
Surprisingly, the spectral densities for `GammaExponentialKernel` and `RationalQuadraticKernel` are not as straightforward as I thought:
```julia
# Analytical transform unknown by Mathematica
spectral_density(k::GammaExponentialKernel, s) = nothing

function spectral_density(k::RationalQuadraticKernel, s)
    # TODO: Work in log domain
    # TODO: Fix singularity at `s = 0`
    α = k.α[1]
    2^(5/4+α/2)*α^(1/4+α/2)*π^α*abs(s)^(-1/2+α)*besselk(-1/2+α, 2π*√(2*α)*abs(s))/gamma(α)
end
```
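A possible fix for the "work in log domain" TODO, using `loggamma` and the exponentially scaled `besselkx` from SpecialFunctions.jl; this is a sketch, and the singularity at s = 0 remains:

```julia
using SpecialFunctions  # loggamma, besselkx, besselk, gamma

# besselk(ν, x) == besselkx(ν, x) * exp(-x), so the log-density can be
# accumulated without the Bessel factor underflowing for large |s| or α.
function log_spectral_density_rq(α, s)
    z = 2π * √(2α) * abs(s)
    (5/4 + α/2) * log(2) + (1/4 + α/2) * log(α) + α * log(π) +
        (α - 1/2) * log(abs(s)) + log(besselkx(α - 1/2, z)) - z - loggamma(α)
end
```

Exponentiating this should match the direct formula above wherever the latter does not underflow.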
So another benefit of using normalized measures is that you can use `logpdf` for cases such as `RationalQuadraticKernel`.
For the gamma exponential (known to me as the power exponential family, which I believe is only PSD for γ < 1 in that parametrization; if I'm remembering that right, this ought to be documented before it explodes when someone puts in a bigger value), Mathematica might be failing to find a closed form due to the necessary restrictions on the exponent. All assuming I'm remembering correctly, of course! It's probably worth checking Gradshteyn and Ryzhik to see if anything useful appears there.
The other one is essentially a Student-t form, which explains the Bessel functions you're seeing pop out. If someone somehow calculates and inverts the CDF of that thing, one could get a sampling scheme for it via the quantile transform; or maybe there's something more clever one could do.
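Failing a closed-form inverse CDF, the quantile-transform idea can be sketched numerically: build the CDF on a grid by crude quadrature and invert it by search. Base Julia only, with the Matern12 density from earlier standing in for the target:

```julia
# Numerical quantile transform for a 1D normalized spectral density ρ.
function sample_by_quantile(ρ, u; lo = -50.0, hi = 50.0, n = 10^6)
    xs = range(lo, hi; length = n)
    cdf = cumsum(ρ.(xs)) .* step(xs)  # left-Riemann CDF on the grid
    cdf ./= cdf[end]                  # absorb truncation/quadrature error
    xs[searchsortedfirst(cdf, u)]     # generalized inverse at u
end

matern12_density(s) = 2 / (1 + 4π^2 * s^2)
sample_by_quantile(matern12_density, 0.5)  # ≈ 0 (the median)
```

Feeding in uniform draws for `u` then yields approximate samples; a proper root-finding or quadrature package would of course be tidier than this grid-based version.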