
Better sampling of the random exploration step using Latin hypercube


Choosing the initial exploration points as independent uniform random samples is not optimal for achieving high coverage of the parameter space. A better approach is to sample them from a Latin hypercube.

This could be implemented in a few lines using the scipy module: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.LatinHypercube.html
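
As a rough sketch of what that could look like (the bounds array and sample count below are only illustrative, not anything taken from the package):

import numpy as np
from scipy.stats import qmc

# illustrative 2-parameter bounds, one (lower, upper) pair per dimension
bounds = np.array([[-10.0, 10.0], [-10.0, 10.0]])

sampler = qmc.LatinHypercube(d=bounds.shape[0], seed=42)
unit_points = sampler.random(n=10)                                # stratified points in [0, 1)^d
init_points = qmc.scale(unit_points, bounds[:, 0], bounds[:, 1])  # mapped onto the bounds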

If there is interest, I am glad to implement the change.

For an example of what a Latin hypercube is and how it compares with other methods of sampling the parameter space see for example: https://scikit-optimize.github.io/stable/auto_examples/sampler/initial-sampling-method.html
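
As a quick self-contained illustration of that coverage difference (just an example, using scipy's discrepancy as the coverage measure; lower means more even coverage of the unit square):

import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
uniform_points = rng.random((10, 2))                       # i.i.d. uniform points
lhc_points = qmc.LatinHypercube(d=2, seed=0).random(n=10)  # Latin hypercube points

print(qmc.discrepancy(uniform_points))  # typically the larger of the two (less even)
print(qmc.discrepancy(lhc_points))      # typically the smaller of the two (more even)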

MariusCautun avatar Feb 27 '22 15:02 MariusCautun

Interesting point. It's probably important to note, given your two suggestions, that pull requests are currently taking a long time to be approved because the owner of this code has been very busy. I have been in touch with them and they hope to rectify the situation soon, but you should be aware of this before putting in too much work - although, as you say, this one would appear to be a small effort.

At least off the top of my head and based on what I know, this sounds like a good idea.

bwheelz36 avatar Feb 28 '22 00:02 bwheelz36

Hey @MariusCautun, I can now accept pull requests if you are still interested in working on this feature. I would like to see some basic tests that it performs at least as well as the random sampling before accepting.

bwheelz36 avatar Jun 10 '22 00:06 bwheelz36

@bwheelz36 what would such a test look like?

till-m avatar May 15 '23 07:05 till-m

I was thinking basically to choose one or more 'black box' functions and demonstrate that, for the same number of iterations, the Latin hypercube performs at least as well as the current approach at finding the max. I don't think this needs to be in the formal test suite, but I think we should see this demonstrated before accepting such a change.

bwheelz36 avatar May 15 '23 21:05 bwheelz36

I have looked into it and finally got around to posting. In short, it does not seem to be worth it.

2-dim space

here's how I changed the relevant code in acq_max:

    # Explore the parameter space more thoroughly
    # requires: from scipy.stats.qmc import LatinHypercube (imported at module level)
    LH_sampling = False
    if LH_sampling:
        print('lh sampling')
        sampler = LatinHypercube(d=bounds.shape[0], seed=random_state)
        x_seeds = sampler.random(n=n_iter)  # stratified samples in the unit hypercube
        x_seeds = x_seeds * (bounds[:, 1] - bounds[:, 0]) + bounds[:, 0]  # scale to the bounds
    else:
        print('uniform sampling')
        x_seeds = random_state.uniform(bounds[:, 0], bounds[:, 1],
                                       size=(n_iter, bounds.shape[0]))
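
The snippet above hardcodes the flag, while the test code below toggles an lhc_sampling argument per run. Standalone, the seed-generation step amounts to the following sketch, runnable outside the library (the helper name, its arguments, and the bounds are illustrative, not the bayes_opt API):

import numpy as np
from scipy.stats.qmc import LatinHypercube

def generate_seeds(bounds, random_state, n_seeds, lhc_sampling=False):
    """Return n_seeds candidate points inside bounds (array of shape (d, 2))."""
    if lhc_sampling:
        sampler = LatinHypercube(d=bounds.shape[0], seed=random_state)
        unit = sampler.random(n=n_seeds)  # points in [0, 1)^d
        return unit * (bounds[:, 1] - bounds[:, 0]) + bounds[:, 0]
    return random_state.uniform(bounds[:, 0], bounds[:, 1],
                                size=(n_seeds, bounds.shape[0]))

# e.g. 10 seeds inside the McCormick bounds used below
seeds = generate_seeds(np.array([[-1.5, 4.0], [-3.0, 4.0]]),
                       np.random.RandomState(0), n_seeds=10, lhc_sampling=True)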

here's my test code:

import numpy as np
from bayes_opt import BayesianOptimization

def negative_hoelder_table_function(x, y):
    return np.abs(np.sin(x) * np.cos(y) * np.exp(np.abs(1-np.sqrt(x**2 + y**2)/np.pi)))

negative_hoelder_table_bounds = {'x': (-10., 10.), 'y': (-10., 10.)}

def mccormick_function(x, y):
    return -1*(np.sin(x+y) + (x-y)**2 - 1.5*x + 2.5*y + 1.)

mccormick_bounds = {'x': (-1.5, 4.), 'y': (-3., 4.)}

for lhc_sampling in [False, True]:
    if lhc_sampling:
        print('lhc sampling')
    else:
        print('uniform sampling')
    max_ = []  # reset per sampling mode so the mean/std below are not mixed
    for i in range(100):
        optimizer = BayesianOptimization(
            f=mccormick_function,
            pbounds=mccormick_bounds,
            random_state=i,
            verbose=0
        )

        optimizer.maximize(
            init_points=10,
            n_iter=10,
            lhc_sampling=lhc_sampling
        )
        max_.append(optimizer.max['target'])
    print(np.mean(max_))
    print(np.std(max_))

McCormick:

uniform sampling: mean = 1.8569479873550738, std = 0.05623310739768076
lhc sampling: mean = 1.8571926312658693, std = 0.05588059333655116

Hölder Table:

uniform sampling: mean = 14.406536168610987, std = 4.416574931927951
lhc sampling: mean = 14.355603745105523, std = 4.420430230640025

high-dim space

I figured the difference might only become relevant for high-dimensional problems, so I looked at how it behaves in that case.

import numpy as np
from bayes_opt import BayesianOptimization
import matplotlib.pyplot as plt

def make_rastrigin_function_and_bounds(n=10):
    def func(**xs):
        return -1*(10*n + sum([xs[x_key]**2 - 10* np.cos(2*np.pi*xs[x_key]) for x_key in xs]))

    pbounds = {str(i): (-5.12, 5.12) for i in range(n)}
    return func, pbounds

def make_rosenbrock_function_and_bounds(n=10):
    def func(**xs):
        xs = np.array(list(xs.values()))
        # negative Rosenbrock: sum_i 100*(x_{i+1} - x_i^2)^2 + (1 - x_i)^2
        return -1*np.sum(100*(xs[1:] - xs[:-1]**2)**2 + (1 - xs[:-1])**2)

    pbounds = {str(i): (-5, 5) for i in range(n)}
    return func, pbounds

max_uniform = []
max_lhc = []
ns = [2, 4, 8, 16]

for n in ns:
    print(n)
    func, pbounds = make_rastrigin_function_and_bounds(n=n)
    for lhc_sampling in [False, True]:
        max_ = []
        for i in range(10):
            optimizer = BayesianOptimization(
                f=func,
                pbounds=pbounds,
                random_state=i,
                verbose=0
            )

            optimizer.maximize(
                init_points=10,
                n_iter=10,
                lhc_sampling=lhc_sampling
            )
            max_.append(optimizer.max['target'])
        if lhc_sampling:
            max_lhc.append(max_)
        else:
            max_uniform.append(max_)

for mode, data in zip(['uniform', 'lhc'], (max_uniform, max_lhc)):
    data = np.array(data)
    plt.errorbar(ns, np.mean(data, axis=-1), np.std(data, axis=-1), fmt='.', label=mode, capsize=2.)

plt.xscale('log', base=2)
plt.grid(True)
plt.legend(loc='best')
plt.show()

I tried this for n_warmup=10000 (the default), n_warmup=1000, and n_warmup=100. The results are not very convincing.

n_warmup = 10000: [Rastrigin and Rosenbrock plots of best target vs. dimensionality (mean ± std over seeds), uniform vs. lhc sampling]

n_warmup = 1000: [Rastrigin and Rosenbrock plots, same comparison]

n_warmup = 100: [Rastrigin and Rosenbrock plots, same comparison]

Summary

I suspect that acq_max simply uses too many warmup points (n_warmup) relative to the dimensionality for LHC to make a difference. In the high-dimensional regime where it would matter, you're probably not going to use BO anyway.
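
As a rough back-of-the-envelope illustration of that point (intuition only, not a formal argument):

# with n_warmup points spread over d dimensions, the effective resolution per
# axis is roughly n_warmup**(1/d); it collapses quickly as d grows, and in low
# d the box is densely covered by either sampling scheme
n_warmup = 10_000
for d in (2, 4, 8, 16):
    print(d, round(n_warmup ** (1 / d), 2))  # -> 100.0, 10.0, 3.16, 1.78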

till-m avatar May 22 '23 11:05 till-m

Wow, this is great work @till-m! I think we can close this for now, then...

bwheelz36 avatar May 23 '23 00:05 bwheelz36