Support high-dimensional BO
**Is your feature request related to a problem? Please describe.**
Bayesian optimization for high-dimensional constrained problems.
**Describe the solution you'd like**
A solution called Sparse Axis-Aligned Subspace BO (SAASBO) is described in this paper: https://arxiv.org/abs/2103.00349
**References or alternative approaches**
- https://arxiv.org/abs/2103.00349
**Are you able and willing to implement this feature yourself and open a pull request?**
- [x] Yes, I would like to try to implement it
Hi @jacktang,
I haven't had the time to read this paper, but considering that this is essentially an acquisition strategy, do you think you could implement this simply by adding a new acquisition function?
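Off the top of my head, roughly something like this -- an untested sketch against the new acquisition module, so treat the base class and the `base_acq` signature as assumptions on my part:

```python
import numpy as np
from bayes_opt import acquisition


class SaasAcquisition(acquisition.AcquisitionFunction):
    """Placeholder sketch: a real SAASBO acquisition would average over
    MCMC samples of the SAAS hyperparameters rather than use a single GP."""

    def __init__(self, kappa=2.576, random_state=None):
        super().__init__(random_state=random_state)
        self.kappa = kappa

    def base_acq(self, mean, std):
        # UCB-style placeholder score computed from the GP posterior.
        return mean + self.kappa * std
```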
Hello @till-m,
Yes, I am going to implement it the way you suggested. I also noticed that you merged the acquisition function branch to master. Great job! 👍
Hi @jacktang,
Can we use a Python library such as numpyro/pyro for this? It would simplify things by providing a pre-packaged version of the No-U-Turn Sampler (NUTS) required in SAASBO.
Hello @MandaKausthubh, yes, I actually used numpyro for the inner SAASGP class. I'm currently stuck on solving the high-dimensional constrained problem, and I plan to submit the code in the next two or three weeks.
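To give an idea, the inner model looks roughly like the sketch below: a GP with the SAAS prior (half-Cauchy shrinkage on the inverse squared lengthscales) whose hyperparameters are sampled with NUTS. This is a simplified illustration (RBF instead of Matérn, toy data), not the actual class in my branch:

```python
import numpy as np
import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS


def saasgp_model(X, y):
    n, d = X.shape
    outputscale = numpyro.sample("outputscale", dist.LogNormal(0.0, 1.0))
    noise = numpyro.sample("noise", dist.LogNormal(-4.0, 1.0))
    # The SAAS prior: a global half-Cauchy shrinkage parameter tau, and
    # per-dimension half-Cauchy priors on the inverse squared lengthscales.
    # A small tau pushes most inverse lengthscales toward zero, so most
    # dimensions are effectively switched off.
    tau = numpyro.sample("tau", dist.HalfCauchy(0.1))
    inv_rho_sq = numpyro.sample(
        "inv_rho_sq", dist.HalfCauchy(tau * jnp.ones(d)).to_event(1)
    )
    # ARD RBF kernel (the paper uses Matern-5/2; RBF keeps this sketch short).
    Xs = X * jnp.sqrt(inv_rho_sq)
    sq_dists = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)
    K = outputscale * jnp.exp(-0.5 * sq_dists) + (noise + 1e-6) * jnp.eye(n)
    numpyro.sample(
        "obs", dist.MultivariateNormal(jnp.zeros(n), covariance_matrix=K), obs=y
    )


# Toy data where only the first of 10 dimensions matters.
X_train = jnp.asarray(np.random.rand(20, 10))
y_train = jnp.sin(3.0 * X_train[:, 0])

mcmc = MCMC(NUTS(saasgp_model), num_warmup=256, num_samples=256)
mcmc.run(random.PRNGKey(0), X_train, y_train)
hyper_samples = mcmc.get_samples()  # one GP per posterior sample
```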
Hi everyone!
Thanks for the interest in making this happen. I'm a bit worried about the added complexity of having another GP model just to add SAASBO; I was hoping we could just use the sklearn implementation. Let me think about the best way to move forward.
Hello @till-m, SAASBO may bring some changes: 1) a sample-based GP (SAASGP) rather than an analytical GP (the standard GP from sklearn), and 2) sample-based acquisition functions.
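For 2), the acquisition value is averaged over the $S$ posterior hyperparameter samples instead of being computed from one point estimate, i.e. $\alpha(x) = \frac{1}{S}\sum_{s=1}^{S} \alpha(x; \theta_s)$. A made-up sketch (the `posteriors` interface is hypothetical):

```python
import numpy as np

def sample_averaged_ucb(x, posteriors, kappa=2.0):
    # `posteriors` is a hypothetical list of per-sample predictors, one per
    # MCMC hyperparameter draw; each returns (mean, std) at x.
    return np.mean([m + kappa * s for m, s in (p(x) for p in posteriors)])
```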
Hi everyone,
I did some quick research, and there have been two papers in the last two years that discuss whether BO really performs poorly in high dimensions.
- Vanilla Bayesian Optimization Performs Great in High Dimensions, ICML 2024, https://proceedings.mlr.press/v235/hvarfner24a.html
- Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization, ICLR 2025, https://openreview.net/forum?id=kX8h23UG6v
I'm thinking it might make more sense to first implement some of the results from the second paper, which suggests using a Matérn kernel, making it anisotropic, and emphasizes initializing the lengthscale parameters as $l_i = c \sqrt{d}$ (with $c \approx 1$).
Interesting! I'd like to spend some time reading the second paper and the code.
I organized the HDBO implementation mentioned in the paper "Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization" here. Check it out if you are interested. I'll try to implement the idea with the sklearn GP in the coming weeks.
Hey everyone!
Thanks for your continued interest in this problem and the will to contribute and make this package better!!
@jacktang thanks for writing the HDBO code. I'm unfortunately a bit busy right now and can't really have a look at it. Are you planning to replicate the experiments of the paper with it?
@MandaKausthubh thanks for drafting the PR! For now, I don't think the added complexity is worth it, especially given the papers above.
For now, with regards to high-dim optimization I propose the following: let's try to make it easier for people to use the results of the "Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization" paper, while seeing how the literature develops. For this we need to add:
- feature scaling to [0, 1] (which is an assumption made by the paper) -- I've drafted code for this already and can push it soon; see the sketch after this list
- a simpler way of initializing the kernel anisotropically with their recommended length scale setting -- though maybe this is not needed if we do a good job with the next point:
- A documentation notebook
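(Not the drafted code, just the idea behind the first point: wrap the objective so the optimizer searches $[0, 1]^d$ and maps points back to the original bounds. `scale_to_unit` is a made-up name.)

```python
import numpy as np

def scale_to_unit(pbounds, f):
    """Return (wrapped_f, unit_bounds) so the optimizer works in [0, 1]^d."""
    names = list(pbounds)
    lo = np.array([pbounds[n][0] for n in names])
    hi = np.array([pbounds[n][1] for n in names])

    def wrapped(**unit_params):
        x = np.array([unit_params[n] for n in names])
        orig = lo + x * (hi - lo)  # map back to the original domain
        return f(**dict(zip(names, orig)))

    unit_bounds = {n: (0.0, 1.0) for n in names}
    return wrapped, unit_bounds
```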
Even if the research in this direction moves towards another approach, these changes add little overhead and will probably be useful in any case (the documentation we could just delete).
What do you think?
Hello @till-m, I was interested in studying the paper, and so far I have only implemented the recommended HDBO with gpytorch and tested it on high-dimensional problems. It's great that you've drafted the code and are ready to push :). I'd like to test the implementation, especially for constrained HDBO, if needed.
@jacktang for the feature scaling, see this branch of my fork (especially the diff to master). It implements a uniform length scale as of now; the anisotropic one can be set via `set_gp_params`.
Hello @till-m, I added the Hartmann6 problem for HDBO; the conclusion is that the current sklearn GP can't solve the optimization problem. I guess feature scaling and the length scale are part of the key to solving the problem, but maybe not all of it(?)
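For anyone who wants to reproduce this: the 6-D Hartmann function with the usual published constants looks roughly like the sketch below (not copied from my repo), negated so that bayes_opt, a maximizer, can be used directly; the global maximum is then about 3.32237.

```python
import numpy as np

ALPHA = np.array([1.0, 1.2, 3.0, 3.2])
A = np.array([
    [10, 3, 17, 3.5, 1.7, 8],
    [0.05, 10, 17, 0.1, 8, 14],
    [3, 3.5, 1.7, 10, 17, 8],
    [17, 8, 0.05, 10, 0.1, 14],
])
P = 1e-4 * np.array([
    [1312, 1696, 5569, 124, 8283, 5886],
    [2329, 4135, 8307, 3736, 1004, 9991],
    [2348, 1451, 3522, 2883, 3047, 6650],
    [4047, 8828, 8732, 5743, 1091, 381],
])

def hartmann6(**params):
    # x in [0, 1]^6; negated Hartmann so larger is better.
    x = np.array([params[f"x{i}"] for i in range(6)])
    inner = np.sum(A * (x - P) ** 2, axis=1)
    return np.sum(ALPHA * np.exp(-inner))
```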
Hey @jacktang,
great that you worked on the implementation! I had a quick look at your code and I noticed you didn't set the kernel to be anisotropic (what they call "ARD" in the paper). You can do that by adding these lines before running `.maximize`:
```python
import numpy as np
from sklearn.gaussian_process.kernels import Matern

optimizer.set_gp_params(
    kernel=Matern(nu=2.5, length_scale=np.ones(dim) * np.sqrt(dim)),
)
```
NB: This will significantly increase the runtime, and I still didn't get close to the score you mentioned in the markdown cell 🤔
Hello @till-m, I also tried to implement it using the sklearn GP and did not get great results. I also compared both GP regressors; the modern GP, which fits its hyperparameters by gradient descent with Adam/RMSprop, may be more robust?
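What I mean by "modern GP" is roughly the gpytorch training loop below, where the marginal log likelihood is maximized with Adam rather than with the L-BFGS-B optimizer that sklearn's `GaussianProcessRegressor` uses by default. A minimal sketch with toy data:

```python
import torch
import gpytorch


class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood, dim):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        # Anisotropic (ARD) Matern-5/2, as recommended in the paper.
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5, ard_num_dims=dim)
        )

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )


dim = 6
train_x = torch.rand(20, dim)
train_y = torch.sin(3.0 * train_x[:, 0])
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood, dim)
# Initialize the lengthscales at c * sqrt(d) with c = 1.
model.covar_module.base_kernel.lengthscale = dim ** 0.5

model.train(); likelihood.train()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)  # negative marginal log likelihood
    loss.backward()
    optimizer.step()
```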