BayesianOptimization icon indicating copy to clipboard operation
BayesianOptimization copied to clipboard

Internal sorting of parameter keys leads to unexpected behaviour when registering an array

Open till-m opened this issue 1 year ago • 14 comments

Describe the bug Internally, the target space sorts the keys of the pbounds dict before creating the target space bounds. This is presumably because python dicts were unordered before python 3.7 and ensured that there was a canonical order to the parameters. However, after python 3.7 dict keys are now ordered, making this sorting unnecessary. Moreover, the fact that the internal representation is changed can lead to unexpected behaviour when registering a numpy array (see example below). This seems very likely to happen when using a DataFrame representation of data.

To Reproduce

Ex:

from sklearn import datasets
import pandas as pd
from bayes_opt import BayesianOptimization

df = pd.DataFrame(datasets.load_iris().data, columns=datasets.load_iris().feature_names)
df.columns = [col.replace(' (cm)', '') for col in df.columns]
df.columns = [col.replace(' ', '_') for col in df.columns]

pbounds = {
    'sepal_length': (df['sepal_length'].min(), df['sepal_length'].max()),
    'sepal_width': (df['sepal_width'].min(), df['sepal_width'].max()),
    'petal_length': (df['petal_length'].min(), df['petal_length'].max()),
    'petal_width': (df['petal_width'].min(), df['petal_width'].max())
}
optimizer = BayesianOptimization(
    f=None,
    pbounds=pbounds,
    random_state=1,
)
optimizer.register(df.iloc[0].values, 0)
print(dict(optimizer.space.array_to_params(optimizer.space.params[0])))
print(dict(df.iloc[0]))

Output:

{'petal_length': np.float64(5.1), 'petal_width': np.float64(3.5), 'sepal_length': np.float64(1.4), 'sepal_width': np.float64(0.2)}
{'sepal_length': np.float64(5.1), 'sepal_width': np.float64(3.5), 'petal_length': np.float64(1.4), 'petal_width': np.float64(0.2)}

Expected behavior The optimizer should not change the order of the keys provided by the user. Registering a numpy array instead of a dict should assume the same order as given by the pbounds.

till-m avatar Oct 22 '24 08:10 till-m

Hello, I believe this bug is affecting my usage of the package. I have the following instance of the Optimisation class and after the first iteration is seems to swap the second and third bounds, so the parameter 'thick' is meant to have bound of 0.1e-6 to 15.0e-6 and 'g' is meant to have 1.0 to 2.0 but they get swapped.

# Simulation Parameters min and max values
param_Min = [1.0, 0.1 * micron, 1.0]
param_Max = [20.0, 15.0 * micron, 2.0]

# Bayesian Optimization Class, random_state is the seed for the random number generator
acq = acquisition.UpperConfidenceBound(kappa=5.0)

optimizer = BayesianOptimization(
    f=None,
    acquisition_function=acq,
    pbounds={'den' : (param_Min[0], param_Max[0]),
             'thick' : (param_Min[1], param_Max[1]),
             'g' : (param_Min[2], param_Max[2])
             },
    verbose=2,
    random_state=1,
)

Nkehoe-QUB avatar Oct 25 '24 10:10 Nkehoe-QUB

Hey @Nkehoe-QUB, if you give me a full example that reproduces the error I can have a look and see if this is really what happens.

till-m avatar Oct 25 '24 11:10 till-m

Hi, I'm sorry it turns out it's an issue how I'm handing the output of optimiser.suggest() and passing it to my simulation. I was doing:

next_points = optimizer.suggest()
    ID = str(Run_PIC(i, j, [values for values in next_points.values()]))

But I didn't realise that the keys of the suggest() output would be in alphabetical order so therefore it was passing it as den g and then thick.

Nkehoe-QUB avatar Oct 25 '24 12:10 Nkehoe-QUB

I'm glad this has been noted. I agree that this is a wrinkle worth smoothing out. If nothing else, better compatibility with pandas seems worthwhile. I'll take a look and report back.

perezed00 avatar Nov 14 '24 20:11 perezed00

I do remember running into this and being tripped up by it. (example below :stuck_out_tongue_closed_eyes:)

I agree it would be better if this didn't happen, but note that this would be a breaking change for many users of this library, who will be assuming that this library is sorting the parameters. so fixing this should be deferred until the next major release is planned in my opinion.

image

bwheelz36 avatar Nov 16 '24 08:11 bwheelz36

This should only affect people that register arrays, which hopefully is a small part of the userbase. One thing we could do is publish a minor version that raises a warning when someone registers an array, telling them that this operation will work differently in a future release.

till-m avatar Nov 17 '24 14:11 till-m

This change added confusing warnings even when executing the example code given on the web page, e.g. on https://bayesian-optimization.github.io/BayesianOptimization/2.0.1/visualization.html. Would it be possible to add a short hint about that also there, as it might be confusing for future users who want to follow the tutorial, but receive the warning?

anates avatar Dec 18 '24 08:12 anates

Hey @anates,

Thanks for reporting. The warning should definitely not show up in this context, this is due to an error on my part. Let me think about the best way to handle the issue.

For the record: This happens because space.probe actually converts to an array before registering, I thought it would just pass the point as received. So now running the maximize loop will actually trigger the warning, which really shouldn't happen and is understandably extremely confusing for any user.

Apologies for the confusion & hi from Basel :)

till-m avatar Dec 18 '24 09:12 till-m

@till-m For my understanding: Is there currently any way to avoid rotations/exchanges of the keys in pbounds? Or what is the best way to approach that problem until a further fix? Thank you, and regards back from Thun!

anates avatar Dec 18 '24 09:12 anates

Hey @anates,

how are you using the library currently?

  • If you just run .maximize -- no need to worry. The warning gets triggered unnecessarily.
  • If you use suggest-evaluate-register, but you register points as dictionaries (.suggest returns a dictionary, so unless you manually transform the point, this will be the case), you don't need to worry either. The warning gets triggered unnecessarily.
  • If you register a point in an array-format at some point (i.e using either .register or .probe manually, with an array-valued argument), then this warning would be for you.

The plan is to fix the erroneously displayed warnings. The fact that the order is manipulated by the optimizer is something we will change in an upcoming release.

What is the problem?

Consider the case where you're trying to optimize a function


def foobar(baz, qux, quux):
    return something(baz, qux, quux)

and you have evaluated the function at foobar(baz=0., qux=1., quux=2.)=42. If you register this information with the optimizer, you can do it in two ways:


optimizer.register({'baz': 0., 'qux': 1., 'quux': 2.}, target=42) # register a dict
optimizer.register(np.array([0., 1., 2.]), target=42) # register an array

The optimizer assumes that if you register an array, the entries correspond to the alphabetically ordered parameters, i.e. it would consider the second option to be equivalent to optimizer.register({'baz': 0., 'quux': 1., 'qux': 2.}, target=42). So if you want to register an array, you need to ensure that the entries in the array are in alphabetical order of parameter names. Or, much simpler, just convert it to a dictionary and don't worry about it.

I hope this helps. Let me know if there are any other questions.

till-m avatar Dec 18 '24 09:12 till-m

Hei @till-m, currently I'm just using .maximize(), thus I was confused how to fix that warning. Thank you for the explanation!

anates avatar Dec 18 '24 10:12 anates

@anates if you install from PyPI, an updated version of the package should be available that doesn't produce the warnings. On conda it might take some time for the release to show up.

till-m avatar Dec 18 '24 11:12 till-m

@till-m Perfect, thank you very much for the quick reply and fix!

anates avatar Dec 18 '24 12:12 anates

As an update, the behaviour has now (v3.0.0b+) changed to use the provided order and the warning is inverted. I'm keeping this issue open for now and will close it once we remove the warning :)

till-m avatar Jun 03 '25 16:06 till-m