BayesianOptimization.jl
Full Bayesian approach to hyperparameters
Treating the GP hyperparameters as unknown and integrating over their full posterior distribution, rather than fixing them at a point estimate, has been shown to give better results in Bayesian optimization (see https://arxiv.org/pdf/1206.2944.pdf).
A user could opt in to this technique by replacing the MAPGPOptimizer with MCMCEstimate (or another appropriate name). GaussianProcesses provides an mcmc function for sampling hyperparameters, but my understanding of the source is that it does not marginalize over the hyperparameters to compute an integrated acquisition function (which I suppose wouldn't make sense within the scope of GaussianProcesses anyway).
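Concretely, the integrated acquisition function from the linked paper replaces the acquisition evaluated at a single hyperparameter estimate with its expectation under the hyperparameter posterior, approximated with M MCMC samples. Writing θ for the hyperparameters, D for the observed data, and μ_θ(x), σ²_θ(x) for the GP posterior mean and variance under θ:

$$
\hat{a}(x) = \int a\big(\mu_\theta(x), \sigma^2_\theta(x)\big)\, p(\theta \mid \mathcal{D})\, d\theta \approx \frac{1}{M} \sum_{m=1}^{M} a\big(\mu_{\theta_m}(x), \sigma^2_{\theta_m}(x)\big), \qquad \theta_m \sim p(\theta \mid \mathcal{D})
$$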
Thoughts on including something like this? The way I see it, the work would break down as follows:
- [ ] Include benchmarks
- [ ] Decide on MCMC implementation (do we introduce additional dependencies, write in-line code, etc.)
- [ ] Decide on interface for various acquisition functions under MCMCEstimate
- [ ] Implement prototype
- [ ] Compare against tests/benchmarks
This would be super cool to have! Do you want to work on a PR? I haven't thought it through carefully (nor tested the code below), but would it make sense to define a new type of model:
```julia
struct MonteCarloGP{M,P}
    model::M
    hyperparameters::P
end
```
where model holds the GP object and hyperparameters holds the return value of GaussianProcesses.mcmc?
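One could then specialize the acquisition function:
```julia
using GaussianProcesses

function acquisitionfunction(a, model::MonteCarloGP)
    gp = model.model
    x -> begin
        result = 0.0
        # mcmc returns one hyperparameter sample per column
        for hyperparameter in eachcol(model.hyperparameters)
            GaussianProcesses.set_params!(gp, hyperparameter)  # set params on the GP, not the wrapper
            GaussianProcesses.update_mll!(gp)                  # refresh cached Cholesky factor for prediction
            μ, σ² = mean_var(gp, x)                            # posterior mean/variance at x
            result += a(μ, σ²)
        end
        result / size(model.hyperparameters, 2)                # average over samples
    end
end
```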
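To see the averaging without any package dependency, here is a self-contained toy version, offered as a sketch under stated assumptions: a hand-written zero-mean 1-D GP with a squared-exponential kernel whose only hyperparameter is the length scale ℓ, and upper confidence bound (UCB) as the per-sample acquisition. The names `sekernel`, `posterior`, `ucb` and `integrated_acquisition` are hypothetical and are not BayesianOptimization.jl or GaussianProcesses API; the fixed list of length scales stands in for MCMC samples.

```julia
using LinearAlgebra

# Hypothetical toy kernel: squared exponential with length scale ℓ.
sekernel(a, b, ℓ) = exp(-(a - b)^2 / (2ℓ^2))

# Posterior mean and variance of a zero-mean 1-D GP at x, given inputs X,
# targets y, length scale ℓ and a small noise variance σn².
function posterior(X, y, x, ℓ; σn² = 1e-6)
    K = [sekernel(a, b, ℓ) for a in X, b in X] + σn² * I
    k = [sekernel(a, x, ℓ) for a in X]
    C = cholesky(Symmetric(K))
    μ = dot(k, C \ y)
    σ² = max(sekernel(x, x, ℓ) - dot(k, C \ k), 0.0)  # clamp round-off below zero
    return μ, σ²
end

# Per-sample acquisition a(μ, σ²): upper confidence bound.
ucb(μ, σ²; κ = 2.0) = μ + κ * sqrt(σ²)

# Integrated acquisition: average a(μ, σ²) over hyperparameter samples.
integrated_acquisition(a, X, y, ℓsamples) =
    x -> sum(a(posterior(X, y, x, ℓ)...) for ℓ in ℓsamples) / length(ℓsamples)

X = [0.0, 1.0, 2.0]
y = sin.(X)
acq = integrated_acquisition(ucb, X, y, [0.5, 1.0, 2.0])
acq(1.5)
```

UCB was chosen only because it needs no normal CDF (Base Julia has no `erf`); expected improvement would additionally need something like SpecialFunctions or Distributions.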
In terms of extended functionality, #9 is a bit higher priority for me, so I will tackle that first. I'll keep thinking about this one; the paper suggests using slice sampling for the MCMC step. I'll have to take a deeper look at the available MCMC implementations to decide whether I should roll my own for this case.
I think defining a new type of model is the way to go, since it will be minimally intrusive to the current implementation.
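Since the paper recommends slice sampling, here is a minimal, package-free sketch of a coordinate-wise univariate slice sampler with stepping out and shrinkage, the scheme described by Neal (2003). It is not taken from GaussianProcesses or any other package; `slice_sample` and its keyword arguments `w` (initial interval width) and `maxsteps` (stepping-out budget) are hypothetical names chosen for illustration. Samples are returned one per column, the layout GaussianProcesses.mcmc is understood to use.

```julia
using Random

# Hypothetical coordinate-wise slice sampler (stepping out + shrinkage).
# `logp` is an unnormalized log density; returns one sample per column.
function slice_sample(logp, θ0::AbstractVector{<:Real}, nsamples::Integer;
                      w::Real = 1.0, maxsteps::Integer = 50,
                      rng::AbstractRNG = Random.default_rng())
    θ = float.(collect(θ0))
    samples = Matrix{Float64}(undef, length(θ), nsamples)
    for s in 1:nsamples
        for d in eachindex(θ)
            θp = copy(θ)
            f = v -> (θp[d] = v; logp(θp))  # log density along coordinate d
            x0 = θ[d]
            logy = logp(θ) - randexp(rng)    # slice height, in log space
            # Stepping out: random interval of width w around x0, expanded
            # until both ends leave the slice or the step budget runs out.
            L = x0 - w * rand(rng)
            R = L + w
            J = floor(Int, maxsteps * rand(rng))
            K = maxsteps - 1 - J
            while J > 0 && f(L) > logy
                L -= w
                J -= 1
            end
            while K > 0 && f(R) > logy
                R += w
                K -= 1
            end
            # Shrinkage: draw uniformly from [L, R], shrinking toward x0 on rejection.
            while true
                x1 = L + (R - L) * rand(rng)
                if f(x1) > logy
                    θ[d] = x1
                    break
                elseif x1 < x0
                    L = x1
                else
                    R = x1
                end
            end
        end
        samples[:, s] = θ
    end
    return samples
end
```

For GP hyperparameters, `logp` would presumably be the log marginal likelihood plus a log prior over the (log-)hyperparameters; stepping out adapts the interval, so the sampler is less sensitive to `w` than a random-walk Metropolis step size.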