Better sampling of the random exploration step using a Latin hypercube
Choosing the initial exploration points as independent random variables is not optimal for achieving high coverage of the parameter space. A better approach is to use a Latin hypercube.
This could be implemented in a few lines using scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.LatinHypercube.html
If there is interest, I would be glad to implement the change.
For an example of what a Latin hypercube is and how it compares with other methods of sampling the parameter space, see: https://scikit-optimize.github.io/stable/auto_examples/sampler/initial-sampling-method.html
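A minimal sketch of what this could look like with scipy.stats.qmc (the bounds and variable names here are purely illustrative and not taken from the package's internals):

import numpy as np
from scipy.stats import qmc

# Illustrative only: bounds for a 2-parameter problem
bounds = np.array([[-10.0, 10.0],
                   [-3.0, 4.0]])

# Draw Latin-hypercube samples in the unit hypercube ...
sampler = qmc.LatinHypercube(d=bounds.shape[0], seed=42)
unit_samples = sampler.random(n=10)

# ... then rescale them to the actual parameter bounds
x_seeds = qmc.scale(unit_samples, bounds[:, 0], bounds[:, 1])
print(x_seeds.shape)  # (10, 2)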
Interesting point. Given your two suggestions, it's probably important to note that pull requests are currently taking a long time to be approved because the owner of this code has been very busy. I have been in touch with them and they hope to rectify the situation soon, but you should be aware of this before putting in too much work - although, as you say, this one looks like a small effort.
At least off the top of my head, this sounds like a good idea.
Hey @MariusCautun, I can now accept pull requests if you are still interested in working on this feature. I would like to see some basic tests showing that it performs at least as well as the random sampling before accepting.
@bwheelz36 what would such a test look like?
I was thinking basically to choose one or more 'black box' functions and demonstrate that, for the same number of iterations, the Latin hypercube performs at least as well as the current approach at finding the maximum. I don't think this needs to be in the formal test suite, but I think we should see it demonstrated before accepting such a change.
I have looked into it and finally got around to posting. In short, it does not seem to be worth it.
2-dim space
here's how I changed the relevant code in acq_max:
from scipy.stats.qmc import LatinHypercube  # new import at the top of the module

# Explore the parameter space more thoroughly
LH_sampling = False
if LH_sampling:
    print('lh sampling')
    # Draw Latin-hypercube samples in the unit cube and rescale them to the bounds
    sampler = LatinHypercube(d=bounds.shape[0], seed=random_state)
    x_seeds = sampler.random(n=n_iter)
    x_seeds = x_seeds * (bounds[:, 1] - bounds[:, 0]) + bounds[:, 0]
else:
    print('uniform sampling')
    x_seeds = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                   size=(n_iter, bounds.shape[0]))
here's my test code:
import numpy as np
from bayes_opt import BayesianOptimization


def negative_hoelder_table_function(x, y):
    # Negated Hölder table function (global maximum ~19.21)
    return np.abs(np.sin(x) * np.cos(y) * np.exp(np.abs(1 - np.sqrt(x**2 + y**2)/np.pi)))


negative_hoelder_table_bounds = {'x': (-10., 10.), 'y': (-10., 10.)}


def mccormick_function(x, y):
    # Negated McCormick function (global maximum ~1.91)
    return -1*(np.sin(x + y) + (x - y)**2 - 1.5*x + 2.5*y + 1.)


mccormick_bounds = {'x': (-1.5, 4.), 'y': (-3., 4.)}

for lhc_sampling in [False, True]:
    if lhc_sampling:
        print('lhc sampling')
    else:
        print('uniform sampling')
    max_ = []  # reset per sampling mode
    for i in range(100):
        optimizer = BayesianOptimization(
            f=mccormick_function,
            pbounds=mccormick_bounds,
            random_state=i,
            verbose=0
        )
        optimizer.maximize(
            init_points=10,
            n_iter=10,
            lhc_sampling=lhc_sampling  # flag added in my modified version of acq_max
        )
        max_.append(optimizer.max['target'])
    print(np.mean(max_))
    print(np.std(max_))
McCormick:
uniform sampling: mean = 1.8569479873550738, std = 0.05623310739768076
lhc sampling: mean = 1.8571926312658693, std = 0.05588059333655116

Hölder Table:
uniform sampling: mean = 14.406536168610987, std = 4.416574931927951
lhc sampling: mean = 14.355603745105523, std = 4.420430230640025
high-dim space
I figured the difference might only become relevant for high-dimensional problems, so I looked at how it behaves in that case.
import numpy as np
from bayes_opt import BayesianOptimization
import matplotlib.pyplot as plt


def make_rastrigin_function_and_bounds(n=10):
    def func(**xs):
        # Negated n-dimensional Rastrigin function (global maximum 0 at the origin)
        return -1*(10*n + sum([xs[x_key]**2 - 10*np.cos(2*np.pi*xs[x_key]) for x_key in xs]))
    pbounds = {str(i): (-5.12, 5.12) for i in range(n)}
    return func, pbounds


def make_rosenbrock_function_and_bounds(n=10):
    def func(**xs):
        xs = np.array(list(xs.values()))
        # Negated n-dimensional Rosenbrock function (global maximum 0 at (1, ..., 1))
        return -1*np.sum(100*(xs[1:] - xs[:-1]**2)**2 + (1 - xs[:-1])**2)
    pbounds = {str(i): (-5, 5) for i in range(n)}
    return func, pbounds


max_uniform = []
max_lhc = []
ns = [2, 4, 8, 16]
for n in ns:
    print(n)
    func, pbounds = make_rastrigin_function_and_bounds(n=n)
    for lhc_sampling in [False, True]:
        max_ = []
        for i in range(10):
            optimizer = BayesianOptimization(
                f=func,
                pbounds=pbounds,
                random_state=i,
                verbose=0
            )
            optimizer.maximize(
                init_points=10,
                n_iter=10,
                lhc_sampling=lhc_sampling
            )
            max_.append(optimizer.max['target'])
        if lhc_sampling:
            max_lhc.append(max_)
        else:
            max_uniform.append(max_)

for mode, data in zip(['uniform', 'lhc'], (max_uniform, max_lhc)):
    data = np.array(data)
    plt.errorbar(ns, np.mean(data, axis=-1), np.std(data, axis=-1), fmt='.', label=mode, capsize=2.)
plt.xscale('log', base=2)
plt.grid(True)
plt.legend(loc='best')
plt.show()
I tried this for n_warmup=10000 (the default), n_warmup=1000, and n_warmup=100. The results are not very convincing.
n_warmup = 10000 (default): [Rastrigin and Rosenbrock result plots]
n_warmup = 1000: [Rastrigin and Rosenbrock result plots]
n_warmup = 100: [Rastrigin and Rosenbrock result plots]
Summary
I suspect that acq_max simply uses too many warm-up points (n_warmup) relative to the dimensionality for LHC sampling to make a difference. In the high-dimensional regime where it would become relevant, you probably would not be using BO anyway.
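If someone wants to probe this intuition further, one rough way is to compare a coverage measure such as scipy's centered discrepancy for uniform vs. LHC samples at a few (n, d) combinations. This is just an illustrative sketch, separate from the experiments above:

import numpy as np
from scipy.stats import qmc

# Rough illustration: centered discrepancy (lower = more even coverage of the
# unit hypercube) of i.i.d. uniform vs. Latin-hypercube samples. The specific
# (n, d) values are arbitrary and the numbers are not from the runs above.
rng = np.random.default_rng(0)

for d in (2, 16):
    for n in (100, 1000):
        uniform = rng.random((n, d))
        lhc = qmc.LatinHypercube(d=d, seed=0).random(n=n)
        print(f"d={d:2d}, n={n:5d}: "
              f"uniform CD = {qmc.discrepancy(uniform):.4g}, "
              f"lhc CD = {qmc.discrepancy(lhc):.4g}")

Both discrepancies shrink as n grows, so with the default n_warmup=10000 in low dimensions the extra stratification from LHC should not buy much.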
Wow, this is great work @till-m! I think we can then close this for now...