sherpa icon indicating copy to clipboard operation
sherpa copied to clipboard

Fixed inconsistency for Discrete parameter in Bayesian optimization

Open jbboin opened this issue 4 years ago • 2 comments

When using a Discrete parameter with the range [a, b] in RandomSearch, the values are from a (included) to b (excluded). However the same parameter used in GpyOpt will result in values from a (included) to b (included). This PR fixed this inconsistency by modifying the behavior in GpyOpt. It also adds a test that ensures that the produced values in a toy example are in the correct range.

The steps to reproduce this issue are as follows (we also look at the values obtained with RandomSearch for reference):

import sherpa

def create_study(algorithm):
    parameters = [
        sherpa.Discrete('param', [0, 1]),
    ]
    return sherpa.Study(
        parameters=parameters,
        algorithm=algorithm,
        lower_is_better=True,
        disable_dashboard=True,
    )

print('Random Search')
values = set()
algorithm = sherpa.algorithms.RandomSearch(max_num_trials=10)
study = create_study(algorithm)
for trial in study:
    study.add_observation(trial, iteration=0, objective=0)
    study.finalize(trial)
    values.add(trial.parameters['param'])
assert values == set([0]), f'Values found: {values}. Expected: {set([0])}'

print('Bayesian Optimization')
values = set()
algorithm = sherpa.algorithms.GPyOpt(max_num_trials=10, verbosity=True)
study = create_study(algorithm)
for trial in study:
    study.add_observation(trial, iteration=0, objective=0)
    study.finalize(trial)
    values.add(trial.parameters['param'])
assert values == set([0]), f'Values found: {values}. Expected: {set([0])}'

jbboin avatar Oct 16 '20 23:10 jbboin

Thanks for the fix! I'm wondering if it would be more intuitive to have Discrete be values from a (included) to b (included) and update RandomSearch

LarsHH avatar Oct 18 '20 08:10 LarsHH

Yeah honestly I don't have a strong opinion either way. I initially thought that BO was the one breaking the pattern so it would be easier to change it, but now I'm looking at GridSearch and I realize that it also includes the upper bound of the range. So yeah, there's definitely some consistency issue we need to resolve here. The convention we end up settling on is up to you, but I'm starting to lean towards including the upper bound as well. If that's the convention choice we decide on, would modifying RandomSearch be enough?

jbboin avatar Oct 19 '20 16:10 jbboin