AbstractGPs.jl
Some usability suggestions
Taken from this issue in AugmentedGaussianProcesses.jl:
- There seems to be no simple way of accessing observations in a `PosteriorGP` except via `data.x` and `data.δ`. As there is a standard API for creating a `PosteriorGP`, it would make sense to also provide one for retrieving its contents. How about `getX` and `gety`? (See the sketch below.)
- `rand(gp, x)` as an alternative to `rand(gp(x))`, similar to `mean`, `cov`, etc.
- `convert(Normal, gp, x)` or `convert(MvNormal, gp, X)` would be convenient.

As `FiniteGP isa AbstractMvNormal`, the last point is probably not relevant.
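For concreteness, here is a minimal sketch of what such accessors could look like, written against the internal (non-public) fields mentioned above. The names `getX` and `gety` are the proposal from this issue, not an existing API, and the `prior` field plus the interpretation of `data.δ` as residuals against the prior mean are assumptions about the internal representation.

```julia
using AbstractGPs
using Statistics: mean

# Hypothetical accessors for a PosteriorGP, relying on internal fields
# (`prior`, `data.x`, `data.δ`) rather than any public API.
getX(post::AbstractGPs.PosteriorGP) = post.data.x

# Assuming `data.δ` stores y - mean(prior at x), add the prior mean back
# to recover the raw observations y.
gety(post::AbstractGPs.PosteriorGP) = post.data.δ .+ mean(post.prior, post.data.x)
```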
As an addition I should note that while AbstractGP defines itself as a more low-level interface I was wondering if there wouldn't be some value in providing some more "high-level" functionality for updating a GP with new training points (like a push! functionality) and changing kernel parameters (for hyperparameter tuning). Both of these tasks can be achieved at the moment but it's a bit ugly.
I guess a reason for why there is no API for destructuring a `PosteriorGP` is that it is not part of the public API:

> This functionality is not intended to be used directly by the users, or those building functionality on top of this package – they should interact with Primary Public API.
I am also unsure about `rand`; to me it seems `rand(gp, x)` would be consistent neither with Random nor with Distributions. But I guess the same can be said about `mean(gp, x)`, even though that function is also not part of the public API.
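For reference, the current public-API idiom looks like this (a small sketch only, with made-up inputs and noise level): the `FiniteGP` returned by `gp(x, σ²)` is the object that supports the Distributions-style `rand`/`mean`/`var` calls.

```julia
using AbstractGPs, KernelFunctions, Statistics

gp = GP(SqExponentialKernel())
x = range(0.0, 1.0; length=10)

fx = gp(x, 0.1)     # FiniteGP: the finite-dimensional projection of `gp` at `x`
rand(fx)            # draw one sample of the function values, Distributions-style
mean(fx), var(fx)   # marginal mean and variance at each input
```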
I see! Looks like somebody should have read the documentation more carefully. In any case it doesn't look like the public API is yet sufficient for implementing something like Bayesian Optimisation... I guess functions like `mean(gp, x)` are useful to avoid computing variances if only the mean is required, which doesn't apply to `rand`. I am still a bit unsure about parts of AbstractGPs, so please bear with me!
Thanks for opening an issue to discuss this -- very keen to have feedback and discuss the design with as many people as possible.
> some more "high-level" functionality for updating a GP with new training points

On this particular point, I would note that you can call `posterior` on a `PosteriorGP`, and it does indeed just update with the new data points and provide a new `PosteriorGP`.
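A minimal sketch of that pattern (the kernel, inputs, and noise level below are made up for illustration): conditioning the returned `PosteriorGP` on a second batch of observations yields another `PosteriorGP`, so new points can be folded in incrementally.

```julia
using AbstractGPs, KernelFunctions

f = GP(Matern52Kernel())

# First batch of (synthetic) observations.
x1 = rand(5)
y1 = rand(f(x1, 0.1))
p1 = posterior(f(x1, 0.1), y1)

# "push!"-style update: condition the existing posterior on new data,
# which returns a new PosteriorGP rather than mutating `p1`.
x2 = rand(3)
y2 = rand(p1(x2, 0.1))
p2 = posterior(p1(x2, 0.1), y2)
```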
> and changing kernel parameters (for hyperparameter tuning)
This is intentionally something that hasn't been built into AbstractGPs.jl. My personal view is that things like ParameterHandling.jl in conjunction with Zygote.jl and Optim.jl make life sufficiently straightforward ~~so as not to bother~~ to make a high-level API less important. For example, this example in TemporalGPs.jl shows how the packages play nicely together.
That being said, I can see the appeal of having a scikit-learn like interface built on top of these tools that's quite opinionated 🤷 I don't personally feel that AbstractGPs is the right place for something like that though.
> In any case it doesn't look like the public API is yet sufficient for implementing something like Bayesian Optimisation
Could you elaborate a bit more on this? I'd be keen to know what you feel is lacking.
edit: actually, this version of the TemporalGPs example is probably better:
```julia
# Load our GP-related packages.
using AbstractGPs
using KernelFunctions
using TemporalGPs

# Load standard packages from the Julia ecosystem
using Optim # Standard optimisation algorithms.
using ParameterHandling # Helper functionality for dealing with model parameters.
using Zygote # Algorithmic Differentiation

using ParameterHandling: flatten

# Declare model parameters using `ParameterHandling.jl` types.
flat_initial_params, unflatten = flatten((
    var_kernel = positive(0.6),
    λ = positive(2.5),
    var_noise = positive(0.1),
))

# Construct a function to unpack flattened parameters and pull out the raw values.
unpack = ParameterHandling.value ∘ unflatten
params = unpack(flat_initial_params)

function build_gp(params)
    f_naive = GP(params.var_kernel * Matern52Kernel() ∘ ScaleTransform(params.λ))
    return to_sde(f_naive, SArrayStorage(Float64))
end

# Generate some synthetic data from the prior.
const x = RegularSpacing(0.0, 0.1, 10_000)
const y = rand(build_gp(params)(x, params.var_noise))

# Specify an objective function for Optim to minimise in terms of x and y.
# We choose the usual negative log marginal likelihood (NLML).
function objective(params)
    f = build_gp(params)
    return -logpdf(f(x, params.var_noise), y)
end

# Check that the objective function works:
objective(params)

# Optimise using Optim. This optimiser often works fairly well in practice,
# but it's not going to be the best choice in all situations. Consult
# Optim.jl for more info on available optimisers and their properties.
training_results = Optim.optimize(
    objective ∘ unpack,
    θ -> only(Zygote.gradient(objective ∘ unpack, θ)),
    flat_initial_params + randn(3), # Add some noise to make learning non-trivial
    BFGS(
        alphaguess = Optim.LineSearches.InitialStatic(scaled=true),
        linesearch = Optim.LineSearches.BackTracking(),
    ),
    Optim.Options(show_trace = true);
    inplace=false,
)

# Extracting the final values of the parameters.
# Should be close to truth.
final_params = unpack(training_results.minimizer)
trained_gp = build_gp(final_params)
trained_posterior = posterior(trained_gp(x, final_params.var_noise), y)

# Make whatever predictions you care to make.
```
edit2: to be fair, I find myself copy+pasting the above quite a bit, so probably some kind of opinionated high-level interface that is just a lightweight wrapper around it would be a good idea.
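For reference, a minimal sketch of what such a lightweight wrapper might look like. The name `fit_gp`, its argument layout, and the assumption that the parameter NamedTuple contains a `var_noise` field are all hypothetical, mirroring the example above rather than any existing API:

```julia
using AbstractGPs, Optim, ParameterHandling, Zygote
using ParameterHandling: flatten

# Hypothetical convenience wrapper around the flatten/unpack/optimise boilerplate
# from the example above. Assumes `build_gp(params)` constructs the prior and
# that `params.var_noise` is the observation-noise variance.
function fit_gp(build_gp, initial_params, x, y)
    flat_initial, unflatten = flatten(initial_params)
    unpack = ParameterHandling.value ∘ unflatten

    # Negative log marginal likelihood in terms of the flat parameter vector.
    function nlml(θ)
        params = unpack(θ)
        return -logpdf(build_gp(params)(x, params.var_noise), y)
    end

    result = Optim.optimize(
        nlml,
        θ -> only(Zygote.gradient(nlml, θ)),
        flat_initial,
        BFGS(),
        Optim.Options();
        inplace=false,
    )

    best = unpack(result.minimizer)
    return posterior(build_gp(best)(x, best.var_noise), y), best
end
```

With the example above, this would be called as `trained_posterior, final_params = fit_gp(build_gp, initial_params, x, y)`, where `initial_params` is the NamedTuple of `positive(...)` values.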
The issue with another interface is that we might end up with three layers of abstraction with very little underneath. For now AbstractGP's purpose seems quite clear and we should probably focus on doing that well.
> This is intentionally something that hasn't been built into AbstractGPs.jl. My personal view is that things like ParameterHandling.jl in conjunction with Zygote.jl and Optim.jl make life sufficiently straightforward ~~so as not to bother~~ to make a high-level API less important. For example, this example in TemporalGPs.jl shows how the packages play nicely together.
I fully second this. Some kind of consistent interface like that in Flux (with `Flux.params`, `Flux.trainable`) would be a nice goal to strive towards, but as long as it's easy and clearly documented we'll be fine. I agree that the above example is maybe a bit too manual, but it's serviceable and any improvements would probably belong to ParameterHandling.jl. Something like `@functor` for parameters maybe?
> Could you elaborate a bit more on this? I'd be keen to know what you feel is lacking.
So far the main thing I have come across is accessing training data for a GP. I will try to put up a tentative Bayesian Optimisation demo soon so we can see how well things work together.
> I fully second this. Some kind of consistent interface like that in Flux (with `Flux.params`, `Flux.trainable`) would be a nice goal to strive towards, but as long as it's easy and clearly documented we'll be fine. I agree that the above example is maybe a bit too manual, but it's serviceable and any improvements would probably belong to ParameterHandling.jl. Something like `@functor` for parameters maybe?
Glad you agree! Regarding `@functor` etc., my view is that it's just an orthogonal parameter handling strategy to the one adopted by `ParameterHandling.jl`. The former is very much a "build a model and mutate it" strategy, whereas ParameterHandling.jl naturally lends itself to a more immutable approach to handling models and objective functions etc.
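As a small illustration of the "immutable" style being contrasted here (the parameter names below are made up): parameters live in a plain `NamedTuple` of ParameterHandling.jl wrappers, and each objective evaluation rebuilds the model from the unpacked values rather than mutating a stored model.

```julia
using ParameterHandling
using ParameterHandling: flatten

# Constrained parameters as a plain, immutable NamedTuple.
θ = (lengthscale = positive(1.5), variance = positive(0.8))

# `flatten` gives a vector of unconstrained reals plus a reconstruction function.
flat, unflatten = flatten(θ)

# `value ∘ unflatten` recovers a NamedTuple of raw Float64s, which a model
# constructor consumes afresh on every call -- nothing is mutated in place.
params = ParameterHandling.value(unflatten(flat))
params.lengthscale, params.variance
```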
> The issue with another interface is that we might end up with three layers of abstraction with very little underneath.
Yeah -- I definitely don't want to have it in this package. I was imagining writing a small glue package that implements MLJ's (or whichever your favourite scikit-learn-like package is) interface using this package's tooling. The whole package would probably only be a couple of hundred lines of code or something, and wouldn't provide the flexibility that is available here, but could make life easier in some situations.
> So far the main thing I have come across is accessing training data for a GP. I will try to put up a tentative Bayesian Optimisation demo soon so we can see how well things work together.
Interesting -- I very much look forward to seeing this! I like your proposal above about adding some helper functionality to access training data, and having a concrete example to work backwards from would be very helpful.
Closing as this has gone stale. Please re-open if needed.