BayesianOptimization.jl

Full Bayesian approach to hyperparameters

Open platawiec opened this issue 6 years ago • 2 comments

Using the full posterior distribution over the hyperparameters (treating them as unknown variables rather than point estimates) is known to give better results in Bayesian optimization (see https://arxiv.org/pdf/1206.2944.pdf).

A user could opt in to this technique by replacing the MAPGPOptimizer with MCMCEstimate or another appropriately named alternative. GaussianProcesses provides an mcmc function to sample hyperparameters, but my understanding of the source is that it does not marginalize over the hyperparameters or compute an integrated acquisition function (which I suppose wouldn't make sense within the scope of GaussianProcesses).
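For reference, the integrated acquisition function from the Snoek et al. paper is â(x) = ∫ a(x; θ) p(θ | D) dθ, approximated by averaging a(x; θ⁽ᵐ⁾) over MCMC samples θ⁽ᵐ⁾. A minimal toy sketch of that Monte Carlo average (the `acquisition` function below is a made-up stand-in, not anything from GaussianProcesses):

```julia
# Toy stand-in for a per-hyperparameter acquisition value a(x; θ):
# a Gaussian bump whose width depends on the "length scale" θ.
acquisition(x, θ) = exp(-x^2 / (2θ^2))

# Monte Carlo estimate of the integrated acquisition:
# â(x) ≈ (1/M) Σₘ a(x; θ⁽ᵐ⁾).
function integrated_acquisition(x, θ_samples)
    sum(acquisition(x, θ) for θ in θ_samples) / length(θ_samples)
end

θ_samples = [0.5, 1.0, 2.0]  # pretend these came from an MCMC run
integrated_acquisition(0.0, θ_samples)  # = 1.0, since a(0; θ) = 1 for all θ
```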

Thoughts on including something like this? The way I see it, the work would break down as follows:

  • [ ] Include benchmarks
  • [ ] Decide on MCMC implementation (do we introduce additional dependencies, write in-line code, etc.)
  • [ ] Decide on interface for various acquisition functions under MCMCEstimate
  • [ ] Implement prototype
  • [ ] Compare against tests/benchmarks

platawiec avatar May 01 '19 16:05 platawiec

This would be super cool to have! Do you want to work on a PR? I did not carefully think about it (nor test the code below), but would it maybe make sense to define a new type of model

struct MonteCarloGP{M,P}
    model::M
    hyperparameters::P
end

where model holds the GP object and hyperparameters holds the return value of GaussianProcesses.mcmc?

One could then specialize the acquisition function

function acquisitionfunction(a, model::MonteCarloGP)
    x -> begin
        result = 0.0
        # average the acquisition value over the sampled hyperparameters
        for hyperparameter in model.hyperparameters
            setparams!(model, hyperparameter)
            μ, σ² = mean_var(model, x)
            result += a(μ, σ²)
        end
        result / length(model.hyperparameters)
    end
end
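To make the averaging pattern concrete, here is a self-contained toy version with a dummy model standing in for the GP (`DummyGP`, the `mean_var` and `setparams!` methods, and the `ucb` acquisition below are all mocks for illustration, not the GaussianProcesses.jl API):

```julia
# Dummy model with a single hyperparameter θ, standing in for a GP.
mutable struct DummyGP
    θ::Float64
end

struct MonteCarloGP{M,P}
    model::M
    hyperparameters::P
end

# Mocked versions of the calls used in the sketch above.
setparams!(m::MonteCarloGP, θ) = (m.model.θ = θ)
mean_var(m::MonteCarloGP, x) = (m.model.θ * x, 1.0)  # fake posterior mean/variance

function acquisitionfunction(a, model::MonteCarloGP)
    x -> begin
        result = 0.0
        for θ in model.hyperparameters
            setparams!(model, θ)
            μ, σ² = mean_var(model, x)
            result += a(μ, σ²)
        end
        result / length(model.hyperparameters)
    end
end

ucb(μ, σ²) = μ + 2 * sqrt(σ²)  # upper confidence bound as an example a
acq = acquisitionfunction(ucb, MonteCarloGP(DummyGP(0.0), [0.5, 1.0, 1.5]))
acq(2.0)  # μ averages to 2.0 over the θ samples, plus 2σ → 4.0
```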

jbrea avatar May 10 '19 08:05 jbrea

In terms of extended functionality, #9 is a bit higher priority for me, so I will tackle that first. I'll keep thinking on this one; the paper suggests using slice sampling for the MCMC. I'll have to take a deeper look at the MCMC implementations available to determine whether I should roll my own for this case.
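For anyone weighing the roll-your-own option: univariate slice sampling (stepping-out plus shrinkage, per Neal 2003) is short enough to inline. A hedged sketch, with a made-up `slice_sample` name; a real version would target the GP log marginal likelihood and handle multivariate hyperparameters:

```julia
# Minimal univariate slice sampler with stepping-out and shrinkage.
# logf: unnormalized log density; x0: starting point; w: initial slice width.
function slice_sample(logf, x0; w = 1.0, niter = 1000)
    samples = Vector{Float64}(undef, niter)
    x = x0
    for i in 1:niter
        logy = logf(x) + log(rand())      # draw the slice level
        # step out until the interval [L, R] brackets the slice
        L = x - w * rand()
        R = L + w
        while logf(L) > logy; L -= w; end
        while logf(R) > logy; R += w; end
        # sample uniformly from [L, R], shrinking on rejection
        while true
            x1 = L + rand() * (R - L)
            if logf(x1) > logy
                x = x1
                break
            elseif x1 < x
                L = x1
            else
                R = x1
            end
        end
        samples[i] = x
    end
    samples
end

# e.g. sampling a standard normal via its log density:
s = slice_sample(x -> -x^2 / 2, 0.0; niter = 5000)
```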

I think defining a new type of model will be the way to go; it will be minimally obtrusive to the current implementation.

platawiec avatar May 11 '19 19:05 platawiec