
Support different data types for optimization parameters

Open Engineero opened this issue 7 years ago • 15 comments

It would be nice to support different data types (e.g. int, float, bool, and maybe a categorical string) for the parameters over which we optimize. I am not sure what the syntax would look like, except maybe a list of data types, passed in alongside the parameter bounds, with one entry per parameter.

The three non-float types could be handled the same way: int drawn uniformly from the specified integer interval, bool drawn uniformly from {0, 1}, and categorical strings either mapped to integer values in [0, 1, ..., n_categories-1] or one-hot encoded, as @PedroCardoso suggested below.

See [E. C. Garrido-Merchan and D. Hernandez-Lobato, 2017] for one approach.
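The rounding-based mapping described above can be sketched in a few lines. This is only an illustration, not part of the library: the function name `cast_params` and the `types` dictionary are hypothetical conventions, and the optimizer is assumed to keep suggesting plain floats within the given bounds.

```python
def cast_params(raw, types):
    """Map continuous optimizer suggestions to typed parameter values.

    `raw` maps parameter names to floats. `types` (a hypothetical
    convention) maps names to int, float, bool, or a list of category
    labels. Parameters missing from `types` are treated as floats.
    """
    typed = {}
    for name, value in raw.items():
        kind = types.get(name, float)
        if kind is int:
            # Round to the nearest integer in the interval.
            typed[name] = int(round(value))
        elif kind is bool:
            # Threshold a value drawn from [0, 1].
            typed[name] = value >= 0.5
        elif isinstance(kind, (list, tuple)):
            # Round to an index and clamp it into [0, n_categories - 1].
            index = min(max(int(round(value)), 0), len(kind) - 1)
            typed[name] = kind[index]
        else:
            typed[name] = float(value)
    return typed


raw = {"depth": 3.7, "use_bias": 0.2, "kernel": 1.9, "lr": 0.01}
types = {"depth": int, "use_bias": bool, "kernel": ["linear", "rbf", "poly"]}
typed = cast_params(raw, types)
```

One caveat with plain rounding: if the float is drawn uniformly from [low, high], the two endpoint integers are selected half as often as the interior ones, so widening the continuous interval by 0.5 on each side may be preferable.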

Engineero avatar May 10 '18 13:05 Engineero

Interesting, the kernel change they propose wouldn't be too hard to implement. My only concern is making the API more and more cumbersome by piling features. However this one is requested often enough to be worth considering.
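One reading of the kernel change in the cited paper is that the inputs are transformed (integer dimensions rounded) before the kernel is evaluated, so the surrogate is constant between points that map to the same integer configuration. The sketch below illustrates that idea with a plain RBF kernel; the function names are hypothetical, and it is not the authors' code or anything from this library.

```python
import math


def round_integer_dims(x, integer_dims):
    """Round the coordinates listed in `integer_dims`; leave the rest unchanged."""
    return [float(round(v)) if i in integer_dims else v for i, v in enumerate(x)]


def rbf(x, y, length_scale=1.0):
    """Standard squared-exponential kernel on two equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def rounded_rbf(x, y, integer_dims, length_scale=1.0):
    """RBF kernel evaluated on inputs whose integer dimensions are rounded first."""
    return rbf(
        round_integer_dims(x, integer_dims),
        round_integer_dims(y, integer_dims),
        length_scale,
    )


# Dimension 0 is treated as integer-valued; 2.2 and 1.8 both round to 2,
# so the transformed kernel sees the two points as identical.
same_config = rounded_rbf([2.2, 0.5], [1.8, 0.5], integer_dims={0})
plain = rbf([2.2, 0.5], [1.8, 0.5])
```

Because the rounding happens inside the kernel, the public API would only need a way to mark which dimensions are integer-valued.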

fmfn avatar May 18 '18 16:05 fmfn

I can try to take a look at it too. I'll let you know if I get anywhere.

Engineero avatar May 18 '18 16:05 Engineero

Could I propose a different approach for categorical string data types? I would suggest a one-hot implementation, which in practice creates n bool dimensions in the search space, since the categories are independent of one another.
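A minimal sketch of the one-hot idea, assuming each category gets its own (0, 1) dimension and the largest value wins when decoding. The helper names and the `name__category` key format are made up for illustration; only the `pbounds`-style dictionary of `(low, high)` tuples follows the library's existing convention.

```python
def one_hot_bounds(name, categories):
    """Create one (0, 1) search dimension per category, e.g. 'kernel__rbf'."""
    return {f"{name}__{c}": (0.0, 1.0) for c in categories}


def decode_one_hot(params, name, categories):
    """Return the category whose dimension holds the largest value."""
    return max(categories, key=lambda c: params[f"{name}__{c}"])


categories = ["linear", "rbf", "poly"]
pbounds = one_hot_bounds("kernel", categories)

# A point the optimizer might suggest in the expanded search space.
suggestion = {"kernel__linear": 0.2, "kernel__rbf": 0.9, "kernel__poly": 0.4}
choice = decode_one_hot(suggestion, "kernel", categories)
```

The cost of this encoding is n extra dimensions per categorical parameter, which is the trade-off against the single integer index proposed in the original post.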

PedroCardoso avatar Jul 20 '18 15:07 PedroCardoso

Has anyone made progress on this?

PedroCardoso avatar Jul 20 '18 16:07 PedroCardoso

It would be really useful to have these types supported.

dehdari avatar Aug 06 '18 23:08 dehdari

+1 I'd like this too!

guidocalvano avatar Nov 27 '18 15:11 guidocalvano

+1 I'd like this too!

dingtine avatar Dec 12 '18 03:12 dingtine

I proposed an implementation for the integer type in a merge request. It is more of a first attempt than finished work, and it can be improved. Could anyone take a look and discuss it, please?

jmehault avatar Jan 11 '19 17:01 jmehault

Great suggestion! Parameter typing would be really useful, especially for categorical parameters.

gustavolvieira avatar Jan 23 '19 18:01 gustavolvieira

+1 I would like to use integers

janwendt avatar Mar 22 '19 15:03 janwendt

Is it possible to exclude specific points in the bounds? I mean, when defining BayesianOptimization(f=black_box_function, pbounds={'e': (0, 1)}) I do not actually want 'e' to be 0. Surely I can write pbounds={'e': (0.0001, 1)} or something like that but it is not nice.

Another thing: it sometimes "gets stuck" on points (iter 10-16), which seems wasteful:

| iter | target | e |
|---|---|---|
| 1 | 0.7492 | 0.2963 |
| 2 | 0.03762 | 0.7072 |
| 3 | 0.6771 | 0.2084 |
| 4 | 0.4013 | 0.4408 |
| 5 | 0.03448 | 0.9871 |
| 6 | 0.3762 | 0.001 |
| 7 | 0.7429 | 0.2671 |
| 8 | 0.7461 | 0.287 |
| 9 | 0.721 | 0.3317 |
| 10 | 0.7492 | 0.2928 |
| 11 | 0.7492 | 0.2927 |
| 12 | 0.7492 | 0.2925 |
| 13 | 0.7492 | 0.2928 |
| 14 | 0.7492 | 0.2917 |
| 15 | 0.7492 | 0.291 |
| 16 | 0.7492 | 0.2945 |
| 17 | 0.7461 | 0.2891 |
| 18 | 0.7367 | 0.3024 |
| 19 | 0.03448 | 0.8466 |
| 20 | 0.05016 | 0.5746 |

Is it intentional?

friedsela avatar Apr 16 '19 08:04 friedsela

There's no way to exclude boundary points, and I don't think the extra complexity would be justified. As you mentioned, you can simply use (1e-4, 1), or something like it, since the choice of lower bound should be immaterial. If, however, you believe the difference between picking 1e-5 or 1e-3 as a lower bound is important, you should transform this variable to a log scale.

And in this particular example the optimizer was not stuck, it was simply exploiting the maximum region around 0.292....
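The log-scale suggestion above could look roughly like this: the optimizer searches over log10(e) and a thin wrapper converts back before calling the real objective. The body of `black_box_function` is a placeholder invented for illustration, and `log_scale_objective` and the `log10_e` parameter name are likewise hypothetical.

```python
import math


def black_box_function(e):
    # Placeholder objective for illustration only; it peaks at e = 1e-2.
    return -(math.log10(e) + 2.0) ** 2


def log_scale_objective(log10_e):
    """Objective as seen by the optimizer, which searches over log10(e)."""
    return black_box_function(e=10.0 ** log10_e)


# Bounds on log10(e), so e itself ranges over [1e-4, 1] and never reaches 0.
pbounds = {"log10_e": (-4.0, 0.0)}

# These would take the place of f and pbounds in the call quoted earlier:
# BayesianOptimization(f=log_scale_objective, pbounds=pbounds)
```

With this transform, the intervals [1e-4, 1e-3] and [1e-1, 1] occupy equal widths in the search space, so the exact choice of lower bound matters far less than it does on a linear scale.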

fmfn avatar Apr 17 '19 14:04 fmfn

Thank you.

friedsela avatar Apr 17 '19 14:04 friedsela

+1 any updates? thanks.

atisman89 avatar Dec 17 '19 02:12 atisman89

I'm looking to use this for booleans, any updates?

marcelroed avatar Apr 28 '21 09:04 marcelroed