
Support high-dimensional BO

Open jacktang opened this issue 1 year ago • 15 comments

Is your feature request related to a problem? Please describe. Bayesian optimization for high-dimensional constrained problems.

Describe the solution you'd like A solution called Sparse Axis-Aligned Subspace BO (SAASBO) is described in this paper: https://arxiv.org/abs/2103.00349

References or alternative approaches

  • https://arxiv.org/abs/2103.00349

Are you able and willing to implement this feature yourself and open a pull request?

  • [x] Yes, I would like to try to implement it

jacktang avatar Jul 15 '24 07:07 jacktang

Hi @jacktang,

I haven't had the time to read this paper, but considering that this is essentially an acquisition strategy, do you think you could implement this simply by adding a new acquisition function?

till-m avatar Jul 15 '24 07:07 till-m

Hello @till-m

Yes, I am going to implement it the way you suggested. And I noticed that you have merged the acquisition function branch into master. Great job! 👍

jacktang avatar Jul 15 '24 08:07 jacktang

Hi @jacktang,

Can we use a Python library such as numpyro/pyro for this? It provides a pre-packaged implementation of the No-U-Turn Sampler (NUTS) required in SAASBO.

MandaKausthubh avatar May 30 '25 10:05 MandaKausthubh

Hello @MandaKausthubh, actually I used numpyro for the inner SAASGP class. Right now I am stuck on solving the high-dimensional constrained problem, and I plan to submit the code in the next two or three weeks.
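
For context, here is a minimal sketch of what a numpyro-based SAASGP could look like (this only illustrates the SAAS prior with NUTS; it is not the actual saasgp class mentioned above, and the hyperprior scales and variable names are assumptions):

import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def saas_gp_model(X, y):
    # SAAS prior: a global shrinkage parameter tau keeps most inverse
    # squared length scales near zero, so only a few dimensions matter.
    tau = numpyro.sample("tau", dist.HalfCauchy(0.1))
    inv_sq_ls = tau * numpyro.sample("rho", dist.HalfCauchy(jnp.ones(X.shape[1])))
    outputscale = numpyro.sample("outputscale", dist.LogNormal(0.0, 1.0))
    noise = numpyro.sample("noise", dist.LogNormal(-4.0, 1.0))
    # RBF kernel on rescaled inputs (the paper uses Matern-5/2; RBF keeps the sketch short).
    Z = X * jnp.sqrt(inv_sq_ls)
    sq_dists = jnp.sum(Z**2, axis=1)[:, None] + jnp.sum(Z**2, axis=1)[None, :] - 2.0 * Z @ Z.T
    K = outputscale * jnp.exp(-0.5 * sq_dists) + (noise + 1e-6) * jnp.eye(X.shape[0])
    numpyro.sample("obs", dist.MultivariateNormal(jnp.zeros(X.shape[0]), covariance_matrix=K), obs=y)

# Inference with NUTS, e.g.:
# mcmc = MCMC(NUTS(saas_gp_model), num_warmup=256, num_samples=256)
# mcmc.run(jax.random.PRNGKey(0), X_train, y_train)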

jacktang avatar Jun 02 '25 01:06 jacktang

Hi everyone!

Thanks for the interest in making this happen. I'm a bit worried about the added complexity of having another GP model just to add SAASBO; I was hoping we could just use the sklearn implementation. Let me think about the best way to move forward.

till-m avatar Jun 02 '25 07:06 till-m

Hello @till-m, SAASBO would bring some changes: 1) a sample-based GP (SAASGP) rather than an analytical GP (the standard GP from sklearn), and 2) sample-based acquisition functions.
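
For the second point, a rough sketch of what a sample-based acquisition could look like (here Expected Improvement averaged over MCMC hyperparameter samples; the posterior_means/posterior_stds arrays are assumed to come from a sampled GP such as the SAASGP above, and this is not tied to the package's current acquisition interface):

import numpy as np
from scipy.stats import norm

def sample_averaged_ei(posterior_means, posterior_stds, best_f, xi=0.01):
    # posterior_means, posterior_stds: arrays of shape (n_samples, n_candidates),
    # one row per MCMC sample of the kernel hyperparameters (maximization convention).
    improvement = posterior_means - best_f - xi
    z = improvement / posterior_stds
    ei = improvement * norm.cdf(z) + posterior_stds * norm.pdf(z)
    # Monte Carlo average over the hyperparameter posterior.
    return ei.mean(axis=0)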

jacktang avatar Jun 02 '25 11:06 jacktang

Hi everyone,

I did some quick research and there have been two papers in the last two years which discuss whether BO is really performing poorly in high dimensions.

  • Vanilla Bayesian Optimization Performs Great in High Dimensions, ICML 2024, https://proceedings.mlr.press/v235/hvarfner24a.html
  • Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization, ICLR 2025, https://openreview.net/forum?id=kX8h23UG6v

I'm thinking it might make more sense to first implement some of the results in the second paper, which suggests using a Matérn kernel, making it anisotropic, and also emphasizes initializing the length scale parameter with $l_i = c \sqrt{d}$ ($c \approx 1$).

till-m avatar Jun 04 '25 13:06 till-m

Interesting! I'd like to spend some time reading the second paper and the code.

jacktang avatar Jun 05 '25 01:06 jacktang

I organized the HDBO implementation mentioned in the paper "Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization" here. Check it out if you are interested. I will try to implement the idea with the sklearn GP in the coming weeks.

jacktang avatar Jun 14 '25 09:06 jacktang

Hey everyone!

Thanks for your continued interest in this problem and your willingness to contribute and make this package better!!

@jacktang thanks for writing the HDBO code. I'm unfortunately a bit busy right now and can't really have a look at it. Are you planning to replicate the experiments of the paper with it?

@MandaKausthubh thanks for drafting the PR! For now, I don't think the added complexity is worth it, especially given the papers above.

For now, with regards to High-dim optimization I propose the following: Let's try and make it easier for people to use the results of the "Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization" paper, while seeing how the literature develops. For this we need to add:

  • feature scaling to [0, 1] (which is an assumption made by the paper) -- I've drafted code for this already and can push it soon (a rough sketch of the idea follows below this list)
  • a simpler way of initializing the kernel anisotropically with their recommended length scale setting -- though maybe this is not needed if we do a good job with the next point:
  • A documentation notebook
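
As a rough illustration of the first point (not the drafted code from my branch; the helper name and structure are made up), the [0, 1] rescaling could look like this:

import numpy as np

def to_unit_cube(f, pbounds):
    # Hypothetical helper: wrap an objective so the optimizer works on [0, 1]^d
    # while the function is still evaluated on its original bounds.
    keys = list(pbounds)
    lo = np.array([pbounds[k][0] for k in keys])
    hi = np.array([pbounds[k][1] for k in keys])

    def wrapped(**unit_params):
        x = np.array([unit_params[k] for k in keys])
        original = lo + x * (hi - lo)  # map the unit-cube point back to the original scale
        return f(**dict(zip(keys, original)))

    unit_bounds = {k: (0.0, 1.0) for k in keys}
    return wrapped, unit_bounds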

Even if the research in this direction moves towards another approach, these changes are small in overhead and probably useful in any case (the documentation we could just delete).

What do you think?

till-m avatar Jun 18 '25 09:06 till-m

Hello @till-m, I'm interested in studying the paper; so far I have only implemented the recommended HDBO with gpytorch and tested it on high-dimensional problems. It's great that you've drafted the code and are ready to push :). I'd like to test the implementation, especially for constrained HDBO, if needed.

jacktang avatar Jun 18 '25 10:06 jacktang

@jacktang for the feature scaling, see this branch of my fork (especially the diff to master). This implements a uniform length scale as of now; the anisotropic version can be set via set_gp_params.

till-m avatar Jun 23 '25 14:06 till-m

Hello @till-m, I added the Hartmann6 problem for HDBO; the conclusion is that the current sklearn GP can't solve this optimization problem. I guess feature scaling and the length scale are part of the key to solving it, but maybe not all of it(?)
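
For anyone who wants to reproduce this, the standard 6-dimensional Hartmann function (as commonly defined in the benchmark literature, minimized over [0, 1]^6 with optimum ≈ -3.32237) is:

import numpy as np

ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([
    [10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
    [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
    [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
    [17.0, 8.0, 0.05, 10.0, 0.1, 14.0],
])
P = 1e-4 * np.array([
    [1312, 1696, 5569, 124, 8283, 5886],
    [2329, 4135, 8307, 3736, 1004, 9991],
    [2348, 1451, 3522, 2883, 3047, 6650],
    [4047, 8828, 8732, 5743, 1091, 381],
])

def hartmann6(x):
    # Standard 6-d Hartmann function (to be minimized); negate it when
    # used with a maximizing optimizer.
    x = np.asarray(x)
    inner = np.sum(A * (x - P) ** 2, axis=1)
    return -np.sum(ALPHA * np.exp(-inner))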

jacktang avatar Jun 28 '25 08:06 jacktang

Hey @jacktang,

Great that you worked on the implementation! I had a quick look at your code and I noticed you didn't set the kernel to be anisotropic (what they call "ARD" in the paper). You can do that by adding the following before running .maximize:

import numpy as np
from sklearn.gaussian_process.kernels import Matern

# Anisotropic (ARD) Matern-5/2 kernel with length scales initialized to sqrt(d).
optimizer.set_gp_params(
    kernel=Matern(nu=2.5, length_scale=np.ones(dim) * np.sqrt(dim)),
)

NB: This will significantly increase the runtime, and I still didn't get close to the score you mentioned in the markdown cell 🤔

till-m avatar Jul 02 '25 07:07 till-m

Hello @till-m, I also tried to implement it using the sklearn GP and did not get a great result. I also compared both GP regressors, and the modern GP, which uses gradient descent with Adam/RMSprop, may be more robust?
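
To illustrate what I mean by the "modern GP" fit, here is a generic gpytorch-style sketch that fits an ARD Matérn-5/2 kernel by maximizing the marginal log likelihood with Adam (a common pattern, not the exact code from my notebook):

import torch
import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Anisotropic (ARD) Matern-5/2 kernel, as recommended in the paper.
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5, ard_num_dims=train_x.shape[-1])
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

def fit_gp(train_x, train_y, steps=200, lr=0.1):
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ExactGPModel(train_x, train_y, likelihood)
    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)  # negative marginal log likelihood
        loss.backward()
        optimizer.step()
    return model, likelihood

Compared with sklearn's L-BFGS-with-restarts fit, this gradient-descent loop exposes the optimizer and learning rate directly, which is part of why I suspect it behaves more robustly here.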

jacktang avatar Jul 29 '25 12:07 jacktang