
[Feature Request] Random restart optimization for MAP estimation

Open Balandat opened this issue 2 years ago • 4 comments

🚀 Feature Request

Make it easier to perform random restart optimization for MAP estimation. Currently this isn't really exposed and would require some amount of custom code to set up. We should make this easier, and potentially even support it as a first-class feature in fit_gpytorch_mll.
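For illustration, the custom setup might look roughly like the sketch below: re-sample the hyperparameters from their priors, re-fit, and keep the best MAP estimate. This assumes sample_all_priors is importable from botorch.optim.utils (the helper the fitting utilities use internally); treat it as a sketch rather than a proposed API.

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim.utils import sample_all_priors  # assumed import path
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = torch.sin(train_X.sum(dim=-1, keepdim=True))

model = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)

best_state, best_value = None, -float("inf")
for _ in range(10):
    sample_all_priors(model)  # draw a fresh starting point from the hyperparameter priors
    fit_gpytorch_mll(mll)     # run the default (L-BFGS-based) fit from that point
    mll.train()
    with torch.no_grad():
        value = mll(model(*model.train_inputs), model.train_targets).sum().item()
    if value > best_value:  # keep the parameters of the best restart
        best_value = value
        best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}

model.load_state_dict(best_state)
```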

Motivation

See #1717

Balandat avatar Mar 04 '23 20:03 Balandat

Curious if this was ever implemented? I still find the default SKLearn optimizer performs better at finding optimal hyperparameters for GPs compared to fit_gpytorch_mll or fit_gpytorch_model.

AndrewFalkowski avatar Oct 25 '23 18:10 AndrewFalkowski

Hi @AndrewFalkowski. We haven't got around to implementing this.

I still find the default SKLearn optimizer performs better at finding optimal hyperparameters for GPs compared to fit_gpytorch_mll or fit_gpytorch_model.

Would it be possible to share a code example for us to investigate this?

saitcakmak avatar Oct 26 '23 21:10 saitcakmak

I still find the default SKLearn optimizer performs better at finding optimal hyperparameters for GPs

Do you mean just the optimizer itself? Or do you mean the actual GP model in sklearn?

Balandat avatar Oct 29 '23 18:10 Balandat

Thanks for the timely responses! I was going through my code and found that the data was being scaled in an odd way between iterations, which was throwing off my hyperparameter optimization. The results now seem comparable to what I was seeing with SKLearn.

AndrewFalkowski avatar Oct 31 '23 15:10 AndrewFalkowski

This was implemented in https://github.com/pytorch/botorch/pull/2373

saitcakmak avatar Jul 24 '24 18:07 saitcakmak

Hi @saitcakmak, thanks for your effort. I'm currently using this feature, but when I analyzed the code behind it in _fit_fallback() I could not find a line where different starting points for the optimization are selected. So basically, for every attempt it runs optimizer(mll, closure=closure, **optimizer_kwargs). That means that after every loop it re-optimizes the previously optimized mll. Am I correct? If so, would it not be more beneficial to optimize the parameters from different starting points?

I have seen models that use, for example, a differential evolution algorithm to optimize the log-likelihood in order to find the global optimum, since this function is non-convex, and they in fact achieve much better results than botorch. Since we are using L-BFGS here, it could be helpful to optimize from different starting positions.

Mustafaessou avatar Jan 10 '25 10:01 Mustafaessou

Hi @Mustafaessou. This line samples new starting points from the priors of each model hyperparameter in subsequent model-fitting attempts. That way, we achieve multi-start optimization when pick_best_of_all_attempts=True.
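For reference, a minimal usage sketch; passing pick_best_of_all_attempts (and max_attempts) through fit_gpytorch_mll's keyword arguments down to _fit_fallback is an assumption here, so check the signature in your BoTorch version.

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = torch.sin(train_X.sum(dim=-1, keepdim=True))

model = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)

# Each attempt re-samples the hyperparameters from their priors and runs
# L-BFGS from that point; the attempt with the best MLL value is kept.
# max_attempts is assumed to control the number of restarts.
fit_gpytorch_mll(mll, pick_best_of_all_attempts=True, max_attempts=10)
```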

saitcakmak avatar Jan 10 '25 15:01 saitcakmak