Bayesian-Optimization

Boolean variables not supported in combination with other categoricals

Open Dvermetten opened this issue 4 years ago • 2 comments

Describe the bug When the search space contains a boolean variable alongside another, non-boolean categorical variable, the search fails.

To Reproduce The following is a slight modification of an existing test to show the problem:

```python
import numpy as np
# ParallelBO, RandomForest and the *Space classes come from the library,
# imported as in the existing test.

dim_r = 2  # dimension of the real values

def obj_fun(x):
    x_r = np.array([x['continuous_%d' % i] for i in range(dim_r)])
    x_i = x['ordinal']
    x_d = x['nominal']
    _ = 0 if x_d == 'OK' else 1
    return np.sum(x_r ** 2) + abs(x_i - 10) / 123. + _ * 2

# Parentheses let the expression span several lines
search_space = (
    ContinuousSpace([-5, 5], var_name='continuous') * dim_r
    + OrdinalSpace([5, 15], var_name='ordinal')
    + NominalSpace(['OK', 'A', None], var_name='nominal')
    + NominalSpace([True, False], var_name='boolvar')
)

model = RandomForest(levels=search_space.levels)

opt = ParallelBO(
    search_space=search_space,
    obj_fun=obj_fun,
    model=model,
    max_FEs=6,
    DoE_size=3,  # the initial DoE size
    eval_type='dict',
    acquisition_fun='MGFI',
    acquisition_par={'t': 2},
    n_job=3,  # number of processes
    n_point=3,  # number of the candidate solution proposed in each iteration
    verbose=False  # turn this off, if you prefer no output
)
xopt, fopt, stop_dict = opt.run()
```

Expected behavior This should perform exactly the same as the case without a boolean variable.

Additional context It seems to be related to the input checking in the random forest.

Dvermetten, Oct 21 '20 13:10

Somewhere in the random forest these values are converted to strings, which leads to this issue. I haven't yet found exactly where this happens, but it is not limited to boolean variables: any nominal space whose options are not originally strings has the same problem.
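To make the suspected mechanism concrete, here is a minimal pure-Python sketch. It does not use the library's code: `check_categories` is a hypothetical stand-in for whatever category check the random forest performs, and the `str()` step is an assumption about the internal conversion.

```python
def check_categories(values, known_levels):
    """Hypothetical check: raise if a value is not among the known levels."""
    unknown = [v for v in values if v not in known_levels]
    if unknown:
        raise ValueError(f"Found unknown categories {unknown}")
    return values

levels = [True, False]        # non-string levels, as in 'boolvar'
sample = [True, False, True]  # values as drawn from the search space

# With the original types, every value is a known level.
checked = check_categories(sample, levels)

# Assumed internal conversion: each value becomes its string form.
stringified = [str(v) for v in sample]

try:
    check_categories(stringified, levels)
    error_message = None
except ValueError as err:
    # 'True' (str) is not equal to True (bool), so the check fails.
    error_message = str(err)

print(error_message)
```

If the conversion really happens on only one side of the comparison, any level that is not already a string (booleans, integers, `None`) would fail in the same way.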

Dvermetten, Nov 23 '20 15:11

@Dvermetten I actually have a similar issue (maybe the same). I installed the latest version with pip and I am getting errors of the form: `ValueError: Found unknown categories ['22', '18', '26', '96'] in column 0 during transform`. I am guessing it's what you described above, because I have a nominal range whose elements are not originally strings: `max_depth = NominalSpace([None] + np.arange(2,102,2).tolist())`
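A possible stopgap for a space like the `max_depth` one above, sketched under the unconfirmed assumption that string-only levels avoid the conversion problem: declare the levels as strings and decode them inside the objective function. `str_levels` and `decode_max_depth` are hypothetical names, not part of the library.

```python
# Original levels: None plus the even integers 2..100.
raw_levels = [None] + list(range(2, 102, 2))

# String versions; these would be passed to NominalSpace instead (assumption).
str_levels = [str(v) for v in raw_levels]

def decode_max_depth(token):
    """Map a string level back to the original None/int value."""
    return None if token == "None" else int(token)

# Round trip: decoding every string level recovers the original list.
decoded = [decode_max_depth(s) for s in str_levels]
```

The objective function would then call `decode_max_depth` on the sampled value before using it.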

Is there an update/fix on this? I also think the pip version is an older version than what is now on master...

Thanks!

MariosKef, Jun 21 '21 20:06