[Feature Request] Random restart optimization for MAP estimation
🚀 Feature Request
Make it easier to perform random restart optimization for MAP estimation. Currently this isn't really exposed and would require some amount of custom code to set up. We should make this easier, and potentially even support as a first-class feature in fit_gpytorch_mll.
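As a rough illustration of the custom code this currently requires, one could refit the model from several randomized initializations and keep the best fit. This is only a sketch under assumptions: it uses the `sample_all_priors` helper from `botorch.optim.utils` to draw fresh hyperparameter values from the registered priors, and it compares attempts by their (exact) marginal log likelihood; names such as `fit_with_restarts` and `num_restarts` are illustrative, not part of the library API.

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim.utils import sample_all_priors  # draws hyperparameters from their priors
from gpytorch.mlls import ExactMarginalLogLikelihood


def fit_with_restarts(train_X, train_Y, num_restarts=5):
    """Illustrative random-restart MAP fitting: refit from prior samples, keep the best."""
    model = SingleTaskGP(train_X, train_Y)
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    best_val, best_state = -float("inf"), None
    for _ in range(num_restarts):
        sample_all_priors(model)  # re-initialize hyperparameters from their priors
        fit_gpytorch_mll(mll)
        # evaluate the fitted marginal log likelihood on the training data
        model.train()
        with torch.no_grad():
            val = mll(model(*model.train_inputs), model.train_targets).sum().item()
        if val > best_val:
            best_val = val
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)  # restore the best attempt
    model.eval()
    return model
```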
Motivation
See #1717
Curious if this was ever implemented? I still find the default SKLearn optimizer performs better at finding optimal hyperparameters for GPs compared to fit_gpytorch_mll or fit_gpytorch_model.
Hi @AndrewFalkowski. We haven't gotten around to implementing this yet.
I still find the default SKLearn optimizer performs better at finding optimal hyperparameters for GPs compared to fit_gpytorch_mll or fit_gpytorch_model.
Would it be possible to share a code example for us to investigate this?
I still find the default SKLearn optimizer performs better at finding optimal hyperparameters for GPs
Do you mean just the optimizer itself? Or do you mean the actual GP model in sklearn?
Thanks for the timely responses! I was going through my code and found that the data was being scaled in an odd way between iterations, which was throwing off my hyperparameter optimization. The results now seem comparable to what I was seeing with SKLearn.
This was implemented in https://github.com/pytorch/botorch/pull/2373
Hi @saitcakmak, thanks for your effort.
I'm currently using this feature, but when I analyzed the code behind it in _fit_fallback() I could not find a line where different starting points for the optimization are selected.
So basically, for every attempt it runs: optimizer(mll, closure=closure, **optimizer_kwargs). That means that after every loop it re-optimizes the last optimized mll.
Am I correct? If so, would it not be more beneficial to optimize the parameters from different starting points?
I have seen models that use, for example, a differential evolution algorithm to optimize the log-likelihood and make sure the global optimum is found, since this function is non-convex, and they in fact achieve much better results than botorch. Since we are using L-BFGS here, it could be helpful to optimize from different starting positions.
Hi @Mustafaessou. This line samples new starting points from the priors of each model hyperparameter in subsequent model-fitting attempts. That way, we achieve multi-start optimization when pick_best_of_all_attempts=True.
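For reference, a minimal usage sketch, assuming max_attempts and pick_best_of_all_attempts are accepted as keyword arguments to fit_gpytorch_mll and forwarded to the fallback fitting routine:

```python
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

# train_X, train_Y assumed to be prepared training tensors (n x d and n x 1)
model = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
# Run several fitting attempts, re-sampling starting points from the hyperparameter
# priors between attempts, and keep the attempt with the best marginal log likelihood.
fit_gpytorch_mll(mll, max_attempts=5, pick_best_of_all_attempts=True)
```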