bayes-skopt
Allow optimization-based acquisition functions
Computation of acquisition functions only on sampled points is problematic in high-dimensional spaces, where the distance to the true optimum (of the acquisition function) will be large on average.
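To make the problem concrete, here is a minimal sketch (a toy quadratic standing in for a real acquisition function; all names are illustrative) comparing the pick-best-of-random-samples approach with multi-start continuous optimization. In high dimensions, the best of even many random samples tends to be far from the true optimum:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)


def acquisition(x):
    # Toy acquisition function standing in for e.g. expected improvement;
    # its optimum lies at x = 0.5 in every dimension.
    return float(np.sum((x - 0.5) ** 2))


def optimize_acquisition(dim, n_restarts=10):
    """Multi-start L-BFGS-B over the unit cube."""
    best_x, best_val = None, np.inf
    for _ in range(n_restarts):
        x0 = rng.uniform(size=dim)
        res = minimize(acquisition, x0, method="L-BFGS-B",
                       bounds=[(0.0, 1.0)] * dim)
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x, best_val


def sample_acquisition(dim, n_samples=1000):
    """Baseline: evaluate the acquisition only on random samples."""
    X = rng.uniform(size=(n_samples, dim))
    vals = np.array([acquisition(x) for x in X])
    i = int(np.argmin(vals))
    return X[i], float(vals[i])


# In 20 dimensions the optimizer gets essentially to the true optimum,
# while the best of 1000 random samples stays far away from it.
x_opt, v_opt = optimize_acquisition(dim=20)
x_smp, v_smp = sample_acquisition(dim=20)
```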
Hi @kiudee, The package looks quite good! I'm particularly interested in Thompson sampling; however, my problem is high-dimensional, so I need to use the approximation from Hernández-Lobato 2014. Did you have a look into it already? Cheers
Hi @michaelosthege, thank you for the reference. One of my near-future todos was to support parallel evaluation of configurations in a way that goes beyond a simple constant-liar strategy. Thompson sampling already seemed like the most natural acquisition function for parallelization.
From a cursory glance at the paper, it looks like it should be straightforward to implement.
The paper was accompanied by a MATLAB implementation for which I also found a Python port.
Unfortunately, due to the code formatting and the absence of useful docstrings/comments, I found it hard to follow. I'm not sure which values I need to pass or which constraints this approximation places on my GP model.
In your case I would look at quadrature Fourier features (QFF) to implement the Thompson sampling part: https://papers.nips.cc/paper/8115-efficient-high-dimensional-bayesian-optimization-with-additivity-and-quadrature-fourier-features I found an implementation here: https://github.com/Mojusko/QFF
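For reference, here is a minimal sketch of the general idea with *random* Fourier features (RFF) rather than QFF, since the mechanics are the same: approximate the GP by a Bayesian linear model on cosine features, sample a weight vector from its posterior, and minimize the resulting explicit function. The hyperparameters, helper names, and toy objective are all illustrative assumptions, not part of bayes-skopt:

```python
import numpy as np

rng = np.random.default_rng(1)


def rff_features(X, omega, b):
    """Random Fourier feature map for an RBF kernel:
    phi(x) = sqrt(2/m) * cos(omega @ x + b)."""
    m = omega.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ omega.T + b)


def thompson_sample(X_train, y_train, candidates, lengthscale=0.3,
                    noise=1e-4, m=500):
    dim = X_train.shape[1]
    # Spectral density of the RBF kernel: omega ~ N(0, I / lengthscale^2)
    omega = rng.normal(scale=1.0 / lengthscale, size=(m, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)

    Phi = rff_features(X_train, omega, b)
    # Bayesian linear regression posterior over the feature weights:
    # mean = A^{-1} Phi^T y,  cov = noise * A^{-1},  A = Phi^T Phi + noise*I
    A = Phi.T @ Phi + noise * np.eye(m)
    L = np.linalg.cholesky(A)
    mean = np.linalg.solve(L.T, np.linalg.solve(L, Phi.T @ y_train))
    w = mean + np.sqrt(noise) * np.linalg.solve(L.T, rng.normal(size=m))

    # The sampled function is linear in the features, so it can be
    # evaluated (or optimized) anywhere, not just at observed points.
    f_sample = rff_features(candidates, omega, b) @ w
    return candidates[int(np.argmin(f_sample))]


# Toy usage: minimize (x - 0.5)^2 on [0, 1].
X_train = rng.uniform(size=(30, 1))
y_train = (X_train[:, 0] - 0.5) ** 2
candidates = np.linspace(0.0, 1.0, 201)[:, None]
x_next = thompson_sample(X_train, y_train, candidates)
```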
Use it in conjunction with predictive variance reduction search (PVRS): https://bayesopt.github.io/papers/2017/13.pdf which works similarly to PES but is faster and simpler to implement.
edit: In bayes-skopt PVRS is implemented as follows: https://github.com/kiudee/bayes-skopt/blob/d7fa3d2908935ac812404a5e76018c55598e9380/bask/acquisition.py#L278-L308 Note, however, that this implementation samples from a random set of points instead of using Fourier features.
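To illustrate the idea (this is a standalone sketch over a discrete candidate set, not the bayes-skopt code; the toy kernel and all parameter values are assumptions): PVRS draws Thompson samples of the optimum location and then evaluates the point that most reduces the average posterior variance at those sampled optima.

```python
import numpy as np

rng = np.random.default_rng(2)


def rbf(A, B, ls=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)


def gp_posterior(X, y, Xs, noise=1e-4):
    """Exact GP posterior mean and covariance on the candidate set Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    cov = rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks)
    return mu, cov


def pvrs(X, y, candidates, n_thompson=8, noise=1e-4):
    """Pick the candidate whose observation most reduces the average
    posterior variance at a set of Thompson-sampled optimum locations."""
    mu, cov = gp_posterior(X, y, candidates, noise)
    jitter = 1e-6 * np.eye(len(candidates))
    # Thompson samples of the minimizer, drawn on the candidate set.
    samples = rng.multivariate_normal(mu, cov + jitter, size=n_thompson)
    opt_idx = samples.argmin(axis=1)
    # Variance reduction at z from observing x: cov(z, x)^2 / (var(x) + noise)
    var = np.diag(cov) + noise
    reduction = (cov[opt_idx, :] ** 2 / var[None, :]).mean(axis=0)
    return int(np.argmax(reduction))


# Toy usage on a 1D problem.
X = np.array([[0.0], [0.2], [0.5], [0.8], [1.0]])
y = (X[:, 0] - 0.3) ** 2
candidates = np.linspace(0.0, 1.0, 101)[:, None]
idx = pvrs(X, y, candidates)
```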
Hi @kiudee, Thank you for those links. It took me a while to read them, and also Bradford 2018, which has a really good description of the RFF algorithm.
My impression is that QFF makes some assumptions about the independence of the problem dimensions that we don't feel comfortable with for our problem. (That might change, though, if we run into problems with computational complexity 😅)
My RFF implementation still has a bug, resulting in too much posterior variance, but I think it's just a `**2` or `numpy.log` somewhere.
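For what it's worth, one quick sanity check for that kind of scaling bug is to verify that the RFF Gram matrix reproduces the exact RBF kernel: both the `sqrt(2/m)` normalisation and the `lengthscale` vs `lengthscale**2` placement in the spectral density are classic spots for a stray square. A sketch (the setup and values are illustrative, not from the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(3)

dim, m, ls = 2, 5000, 0.5
X = rng.uniform(size=(10, dim))

# Spectral density of exp(-||x - x'||^2 / (2 ls^2)): omega ~ N(0, I / ls^2),
# i.e. standard deviation 1/ls.  Using 1/ls**2 or 1/sqrt(ls) as the scale
# here silently changes the effective lengthscale.
omega = rng.normal(scale=1.0 / ls, size=(m, dim))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)

# The sqrt(2/m) factor matters: writing 2/m instead of sqrt(2/m)
# rescales every (co)variance by 2/m.
Phi = np.sqrt(2.0 / m) * np.cos(X @ omega.T + b)

# With a correct feature map, Phi @ Phi.T approximates the exact kernel.
K_approx = Phi @ Phi.T
K_exact = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / (2 * ls**2))
err = np.abs(K_approx - K_exact).max()
```

If the maximum error stays small (it shrinks like 1/sqrt(m)), the feature map is fine and the variance bug is elsewhere.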