
`PeriodicKernel` does not work with AD (see issue for work-around)

Open david-vicente opened this issue 4 years ago • 10 comments

Hi. I noticed that the PeriodicKernel is failing the AD tests, and I would like to implement the Mauna Loa GPR example. I looked for discussion of this in past issues but there isn't much. Is the problem an instance of "it is doable but no one has had the time yet", or is it blocked on something in the AD packages? Thanks!

david-vicente avatar Oct 23 '21 11:10 david-vicente

I just reran the AD tests and it looks like using ForwardDiff or ReverseDiff is fine. However, there are still some errors with Zygote. I think #386 should hopefully solve this problem.
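
For reference, here is a minimal way to check this locally (just an illustrative snippet, not the package's test suite): differentiate a kernel evaluation with respect to r with both backends.

using KernelFunctions, ForwardDiff, Zygote

x, y = rand(3), rand(3)
kval(r) = PeriodicKernel(; r=r)(x, y)  # scalar kernel evaluation as a function of r

ForwardDiff.gradient(kval, [0.5, 1.0, 1.5])  # works
Zygote.gradient(kval, [0.5, 1.0, 1.5])       # errors at the time of writing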

Note that our PeriodicKernel is only equal to the ExpSineSquared kernel from scikit-learn in one dimension:

KernelFunctions.jl: k(x, x') = exp(-0.5 * sum_i (sin(pi * (x_i - x'_i)) / r_i)^2)

vs

scikit-learn: k(x, x') = exp(-2 * sin(pi * ||x - x'|| / p)^2 / l^2)
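
As an informal sanity check of the 1D case (the mapping l = 2r with period p = 1 is just read off the two formulas above, not an official API):

using KernelFunctions

x, y = rand(1), rand(1)
r = 0.8
k_kf = PeriodicKernel(; r=[r])(x, y)                   # KernelFunctions.jl PeriodicKernel in 1D
k_sk = exp(-2 * sin(pi * (x[1] - y[1]))^2 / (2r)^2)    # scikit-learn formula with p = 1, l = 2r
k_kf ≈ k_sk                                            # true up to floating-point error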

theogf avatar Oct 25 '21 08:10 theogf

Why isn't there a period length parameter as in gpytorch?

david-vicente avatar Oct 25 '21 12:10 david-vicente

Because this would just be the lengthscale. You could achieve the same result with:

p = [1.0, 2.0, 1.5]
k = with_lengthscale(PeriodicKernel(;r=rand(3)), p)

where p is the same as in the gpytorch definition

The r here corresponds to the sqrt(lambda) in gpytorch

theogf avatar Oct 25 '21 14:10 theogf

Oh, I was independently trying to build the Mauna Loa example (I put the work-in-progress here: https://github.com/JuliaGaussianProcesses/EcosystemIntegrationTesting/blob/main/api_testing/mauna_loa_example/script.jl as a Pluto.jl notebook) and ran into the same bug.

Here's code to reproduce the bug:

using AbstractGPs
using ParameterHandling
# Build a GP with a periodic kernel; th has fields s (signal scale), l (lengthscale), p (period).
function build_gp(th)
    return GP(th.s^2 * with_lengthscale(PeriodicKernel(; r=[th.l / 2]), th.p))
end
th = (; s=positive(exp(0.0)), l=positive(exp(1.0)), p=positive(exp(0.0)))
thflat, unflatten = ParameterHandling.value_flatten(th)
x = rand(5)
y = rand(5)
function loss(th)
    f = build_gp(th)
    return -logpdf(f(x, 0.01), y)
end
loss_packed = loss ∘ unflatten
using Zygote
Zygote.gradient(loss_packed, thflat)  # fails with PeriodicKernel

st-- avatar Nov 05 '21 13:11 st--

@david-vicente the following two lines give equivalent kernels (in 1D):

k1 = with_lengthscale(PeriodicKernel(; r=[wiggle_scale / 2]), period)
k2 = with_lengthscale(SqExponentialKernel(), wiggle_scale) ∘ PeriodicTransform(1/period)

and we might want to provide with_period(kernel, period) = kernel ∘ PeriodicTransform(1/period). AutoDiff works fine for the PeriodicTransform. Using the second form, you can also swap out the base kernel, e.g. for a Matern32Kernel. :)

(Note that wiggle_scale is relative to a period, not relative to the actual input scale!)
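
For what it's worth, a quick sanity check of that equivalence, together with the suggested with_period helper (not part of the package, just defined inline here):

using KernelFunctions

with_period(kernel, period) = kernel ∘ PeriodicTransform(1/period)

wiggle_scale, period = 0.7, 2.5
k1 = with_lengthscale(PeriodicKernel(; r=[wiggle_scale / 2]), period)
k2 = with_period(with_lengthscale(SqExponentialKernel(), wiggle_scale), period)
k3 = with_period(with_lengthscale(Matern32Kernel(), wiggle_scale), period)  # same recipe, different base kernel

x, y = rand(), rand()
k1(x, y) ≈ k2(x, y)  # true (in 1D)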

st-- avatar Nov 05 '21 14:11 st--

Thank you @st--. I did look at the PeriodicTransform at the time, but from the documentation it wasn't obvious how to use it to achieve equivalent kernels. Maybe we should add

k1 = with_lengthscale(PeriodicKernel(; r=[wiggle_scale / 2]), period)
k2 = with_lengthscale(SqExponentialKernel(), wiggle_scale) ∘ PeriodicTransform(1/period)

to its documentation?

david-vicente avatar Nov 05 '21 14:11 david-vicente

@david-vicente I've just added the finished example to AbstractGPs: https://github.com/JuliaGaussianProcesses/AbstractGPs.jl/pull/240

Yes it'd be great to improve the documentation. Would you be up for making the change and opening a PR? :)

st-- avatar Nov 05 '21 14:11 st--

You could also add the with_period helper I suggested, if you wanted :)

st-- avatar Nov 05 '21 14:11 st--

Can we do the same for multi-dimensional inputs?

david-vicente avatar Nov 05 '21 21:11 david-vicente

Can we do the same for multi-dimensional inputs?

It would work in exactly the same way if we made PeriodicTransform output complex numbers (using cispi, or cis on older Julia versions).
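
To spell out why: for each coordinate, abs2(cispi(2f*x_i) - cispi(2f*y_i)) = 2 - 2cos(2πf(x_i - y_i)) = 4 sin(πf(x_i - y_i))^2, which is exactly the per-dimension periodic distance that the sin/cos pair produces in 1D. A one-line check with Base Julia only (illustrative values):

f, x, y = 0.4, 1.3, 2.9
abs2(cispi(2f * x) - cispi(2f * y)) ≈ 4 * sinpi(f * (x - y))^2  # true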

devmotion avatar Nov 09 '21 16:11 devmotion

Fixed by #531

simsurace avatar Feb 10 '24 14:02 simsurace