
Celerite methods?

cscherrer opened this issue 5 years ago • 8 comments

Anyone here familiar with the methods used here? It gives a linear-time exact solution for the special case of one-dimensional data with a covariance given by a mixture of exponentials.
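Roughly, the kernels celerite handles are sums of exponentially damped sinusoids in the time lag. A tiny sketch of the functional form (my own illustration, using the a/b/c/d parametrisation from the celerite paper):

celerite_term(τ; a, b, c, d) = exp(-c * τ) * (a * cos(d * τ) + b * sin(d * τ))

# A full celerite kernel is a mixture of such terms, evaluated at the lag τ = |t - t′|.
k(τ, terms) = sum(celerite_term(τ; p...) for p in terms)

terms = [(a = 1.0, b = 0.1, c = 0.5, d = 2π), (a = 0.3, b = 0.0, c = 2.0, d = 0.0)]
k(0.7, terms)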

The main implementation is celerite. There's a full-Julia version by @ericagol here, and an alternate Julia implementation @dfm wrote to walk me through the basic ideas here.

Just thought I'd mention this, since it seems relevant if AbstractGPs is intended as a best-of-everything GP package (which... is that the case?)

cscherrer avatar Jul 05 '20 17:07 cscherrer

Also tagging @tagordon, who has been working on extending celerite to multiple dimensions and has written a version in Julia.

ericagol avatar Jul 05 '20 17:07 ericagol

Ooo, I've not come across them before. I'll add reading them to my TODO list.

AbstractGPs is more of an interface package. For example, I'm going to refactor Stheno.jl and TemporalGPs.jl to adhere to its interface. The idea is that once you've implemented the interface, other stuff should work for you.
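To give a flavour of what that interface looks like in practice (a rough sketch; the exact API may have shifted a little since this was written):

using AbstractGPs, KernelFunctions

f = GP(Matern52Kernel())     # any AbstractGP
x = rand(20)
fx = f(x, 0.1)               # FiniteGP: f at inputs x, with observation noise variance 0.1

y = rand(fx)                 # sample from the prior marginals
logpdf(fx, y)                # log marginal likelihood
f_post = posterior(fx, y)    # exact posterior, itself an AbstractGP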

willtebbutt avatar Jul 05 '20 17:07 willtebbutt

@ericagol It would be great to collaborate on this. As @willtebbutt mentioned, AbstractGPs is meant to be a lightweight package for GP-related types and core functionality. At the same time, KernelFunctions will provide a comprehensive set of kernel function implementations used in GP modelling. In combination with the @TuringLang libraries, the long-term goal of this umbrella is to provide a viable Julia alternative to the popular GPML toolbox in Matlab.

yebai avatar Jul 27 '20 12:07 yebai

@yebai Thanks for the reply. I don't have time to help with this now, but @tagordon may be interested. The celerite formalism is fairly easy to implement, and even though it has a restriction on the functional form of the kernels, it actually provides a very complete description of stationary 1-D GPs (with a more limited extension to 2D).

ericagol avatar Jul 31 '20 21:07 ericagol

@willtebbutt I just watched your JuliaCon 2020 talk, and it seems apparent to me that there is probably some commonality between the SDE approach that you use and our celerite kernels, which are also represented as sums of SDEs. The semi-separable matrix solver seemed to give better performance than a Kalman filter solver for a CARMA process (which is a more complex SDE), although this might be a function of implementation. I also found it interesting that you achieved better performance with static arrays, and that you managed to implement reverse-mode AD. This short paper might be of interest: https://ui.adsabs.harvard.edu/abs/2018RNAAS...2...31F/abstract

@yebai I think it would be interesting to create a version of celerite which interfaces well with these other packages and thus becomes more widely usable. I'm happy to advise on this (I'm not much of a programmer, but I understand the algorithm well and have developed some basic Julia skills).

ericagol avatar Aug 02 '20 06:08 ericagol

I agree that it would be great to have a version of your celerite work that implements the AbstractGP API, which should be very doable from what I've understood from a skim of the various papers linked. Fundamentally, there are two things that you need to do:

  1. have a way to say "use my celerite methods instead of the defaults please", and
  2. the linear algebra primitives to exploit the structure in the celerite methods.

Probably the best way to go about this is:

  1. implement a small "wrapper" type, maybe you would call it Celerite <: AbstractGP (?), that eats an AbstractGP
  2. a custom AbstractMatrix type, perhaps called SemiSeparableMatrix, parametrised in terms of A, U, and V (eqn 33 from your 2017 paper)
  3. make cov(Celerite(f)(x)) spit out a SemiSeparableMatrix -- this should be the only modification needed to the AbstractGP API
  4. a custom B <: AbstractTriangular type parametrised in terms of U and W (eqn 35), with logdet and ldiv defined (I couldn't think of a better name than B, but we probably shouldn't call it B)
  5. make cholesky(::SemiSeparableMatrix) spit out a Cholesky that contains a B

If you do that, then everything should just work. I'm probably overlooking something, but I can't see anything I'm overlooking being more than a small detail.

One thing I wasn't quite able to glean from your 2017 paper is why e.g. the OU process's kernel admits this semi-separable structure. Would you be able to point me towards a reference on that? Maybe I just missed where you derive it in the paper...

I agree that it's interesting that there's such a performance difference. I would be willing to bet a decent chunk of it is implementation-related, as it's generally quite hard to produce good Kalman filtering implementations with the tools we have available -- StaticArrays are a real life-saver in the Kalman filtering case because they accelerate the small operations nicely, but they don't do especially well for even medium-dimensional state. BLAS is fine for really high-dimensional things, but unfortunately loads of interesting problems have a medium-dimensional state space. If I've understood what you're doing correctly, it looks like there's an important set of scalar calculations which compute the Cholesky decomposition, but that everything else can be done efficiently using BLAS. So that might explain the difference in general.
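For concreteness, the sort of "small operations" I mean (a toy sketch, nothing to do with TemporalGPs' actual internals):

using StaticArrays

# A single Kalman-filter predict step for a 2-dimensional state. With StaticArrays
# everything is stack-allocated and the tiny matrix products get unrolled.
A = @SMatrix [1.0 0.1; 0.0 1.0]    # transition matrix
Q = @SMatrix [1e-4 0.0; 0.0 1e-4]  # process-noise covariance

m = @SVector [0.0, 1.0]            # filtering mean
P = @SMatrix [1.0 0.0; 0.0 1.0]    # filtering covariance

m_pred = A * m                     # SVector, no heap allocation
P_pred = A * P * A' + Q            # SMatrix, no heap allocation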

Skeleton Implementation

I figured a sketch of 1-5 would be helpful. It actually has an extra type because I realised while writing it that I had missed something...

using AbstractGPs
using LinearAlgebra

"""
    SemiSeparableMatrix{T<:Number} <: AbstractMatrix{T}

A matrix defined by

A + tril(U Vᵀ) + triu(V Uᵀ)    (tril/triu taken strictly below/above the diagonal)

This matrix doesn't really do a lot other than getting passed around. Possibly you would
want to implement the AbstractArray interface for it so that it can be printed etc, but
equally you might not bother.
"""
struct SemiSeparableMatrix{
    T<:Number, TA<:Diagonal{T}, TU<:AbstractMatrix{T}, TV<:AbstractMatrix{T},
} <: AbstractMatrix{T}
    A::TA
    U::TU
    V::TV
    function SemiSeparableMatrix(
        A::Diagonal{T}, U::AbstractMatrix{T}, V::AbstractMatrix{T},
    ) where {T}
        some_boring_inner_constructor_stuff
    end
end

"""
    Celerite <: AbstractGP

A wrapper for an `AbstractGP` that exploits some stuff...
"""
struct Celerite{Tgp} <: AbstractGP
    gp::Tgp
end

"""
Note that a `FiniteGP` just wraps the original GP `f`, input locations `x`, and noise
variance. You should think of it as the multivariate Normal given by `f` at `x`. 
"""
function AbstractGPs.cov(f::FiniteGP{<:Celerite})
    A = ...
    U = ...
    V = ...
    return SemiSeparableMatrix(A, U, V)
end

"""
    B{T} <: AbstractTriangular{T}

An upper-triangular matrix given by

I + triu(W Uᵀ)

I've assumed here that it's straightforward to use the upper-triangular convention for
the Cholesky rather than the lower, because `AbstractGPs` uses the upper-triangular
convention. If that's not the case here for some reason, we can come up with a work-around.
"""
# Note: AbstractTriangular takes both the element type and the wrapped matrix type as
# type parameters, so TU is passed through as the second one.
struct B{
    T<:Number, TU<:AbstractMatrix{T}, TW<:AbstractMatrix{T},
} <: LinearAlgebra.AbstractTriangular{T, TU}
    U::TU
    W::TW
end

# It turns out that LinearAlgebra's Cholesky implementation is a bit tricky to work with
# if you're not working with dense arrays. For this reason, it's probably easier to roll
# your own here.
struct CholeskyOfSeparable{TU<:B}
    U::TU
end

function LinearAlgebra.cholesky(A::SemiSeparableMatrix)
    W = ...
    b = B(A.U, W)
    return CholeskyOfSeparable(b)
end

# The logdet of the semi-separable matrix is twice the logdet of its triangular factor.
LinearAlgebra.logdet(C::CholeskyOfSeparable) = 2 * logdet(C.U)

# AbstractGPs generally uses the UpperTriangular convention, rather than the lower. This
# isn't hard to handle though. In practice the solves against the factor (and its
# transpose) happen via `\`, so that's probably the method to define rather than `ldiv`.
function Base.:\(X::Transpose{<:Number, <:B}, Y::AbstractMatrix)
    return ...
end

willtebbutt avatar Aug 02 '20 18:08 willtebbutt

@willtebbutt That looks like a great start!

I can answer the "why" question: it is because each celerite kernel component can be written as a sum of complex exponentials. The product of exponentials involves a sum of the exponents, so if one factor has a dependence on e^{-c t_1} and another on e^{c t_2}, then the product will have a term with e^{-c(t_1 - t_2)}. Combining these terms with the outer product of the U and V matrices then allows you to generate the kernel component between any two times t_1 and t_2. Now, if c is complex, this affords oscillatory components (although we avoid complex arithmetic). Also, to avoid numerical instability, only the exponential dependence between adjacent times is used when computing the Cholesky decomposition.
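A quick numerical check of that idea for the simplest case, a single real exponential term (an OU process) -- this is just my own illustration, not code from the celerite papers, and the naive exp(c t) factors below are exactly the instability that the adjacent-time reparametrisation avoids:

using LinearAlgebra

t = sort(10 .* rand(50))
a, c = 1.3, 0.7

# Dense kernel matrix for k(t_i, t_j) = a * exp(-c * |t_i - t_j|).
K_dense = [a * exp(-c * abs(ti - tj)) for ti in t, tj in t]

# For t_i > t_j: a * exp(-c * (t_i - t_j)) = exp(-c * t_i) * (a * exp(c * t_j)) = u_i * v_j.
u = exp.(-c .* t)
v = a .* exp.(c .* t)
K_semisep = Diagonal(fill(a, length(t))) + tril(u * v', -1) + triu(v * u', 1)

K_dense ≈ K_semisep   # true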

This algorithm was originally a generalization of the Rybicki-Press algorithm, which has one real exponential term (i.e. an OU process), to multiple terms. Ambikasaran generalized the R-P algorithm to multiple real terms here, while in the celerite paper we applied it to complex terms. With some prodding from me, Sivaram then derived the semi-separable approach, which we found to be about a factor of 20 faster than the generalized Rybicki-Press algorithm, and which also allowed a Cholesky decomposition that enables simulation of GPs.

ericagol avatar Aug 02 '20 18:08 ericagol

Also, we've recently generalized the algorithm to a second (small) dimension which has a kernel structure that is an outer product of a vector with itself. This applies when there is a noisy process that has the same shape of correlated noise in each component of the second dimension, but with a different amplitude and different white noise per component. The application we have in mind is stellar variability: stars vary in temperature across their surface, which leads (approximately) to the same shape of correlations across different wavelengths, but differing amplitudes. This was implemented by @tagordon in Julia.
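If I've read that correctly, a crude sketch of the resulting covariance structure (my own illustration, not @tagordon's implementation) is the 1-D kernel matrix scaled by an outer product of per-component amplitudes, plus independent white noise per component:

using LinearAlgebra

k_time(τ) = exp(-0.5 * τ)                 # stand-in for a 1-D celerite kernel
t = sort(rand(30))
r = [1.0, 0.6, 0.3]                       # per-component amplitudes (e.g. per wavelength band)
σ² = [0.1, 0.2, 0.15]                     # per-component white-noise variances

K_t = [k_time(abs(ti - tj)) for ti in t, tj in t]
K = kron(r * r', K_t) + Diagonal(kron(σ², ones(length(t))))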

ericagol avatar Aug 02 '20 18:08 ericagol