BayesianOptimization
Support different data types for optimization parameters
It would be nice to support different data types (e.g. int, float, bool, and perhaps categorical strings) for the parameters over which we optimize. I am not sure what the syntax would look like; perhaps a list of data types passed in that corresponds to the parameter bounds.
All three of these types could be handled the same way: int drawn uniformly from the specified integer interval, bool drawn uniformly from {0, 1}, and categorical strings either mapped to a draw from the integer values [0, 1, ..., n_categories-1] or one-hot encoded as @PedroCardoso suggested below.
See [E. C. Garrido-Merchan and D. Hernandez-Lobato, 2017] for one approach.
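The three cases above can be sketched without touching the kernel, by letting the optimizer suggest continuous values and casting them inside the objective. This is only an illustrative workaround, not part of the library's API; the names `wrap_discrete`, `black_box`, and the bounds shown in the comment are hypothetical:

```python
# Hypothetical sketch: handle int, bool and categorical parameters by
# optimizing over continuous bounds and casting inside the objective.
CATEGORIES = ["linear", "rbf", "poly"]  # example categorical choices

def wrap_discrete(f):
    """Cast continuous suggestions to the discrete types f expects."""
    def wrapped(n_units, use_bias, kernel_idx):
        n_units = int(round(n_units))         # int: round to nearest
        use_bias = bool(round(use_bias))      # bool: threshold at 0.5
        kernel = CATEGORIES[int(kernel_idx)]  # categorical: index map
        return f(n_units, use_bias, kernel)
    return wrapped

def black_box(n_units, use_bias, kernel):
    # stand-in objective; a real one would train and score a model
    return n_units * (2 if use_bias else 1) - CATEGORIES.index(kernel)

f = wrap_discrete(black_box)
# the optimizer would then suggest continuous values inside e.g.:
# pbounds = {"n_units": (8, 64), "use_bias": (0, 1),
#            "kernel_idx": (0, len(CATEGORIES) - 1e-4)}
print(f(16.7, 0.8, 2.3))  # -> 32, i.e. black_box(17, True, "poly")
```

The downside, which the paper above addresses at the kernel level, is that the Gaussian process still models the function as smooth in the continuous space, so many distinct suggestions can collapse onto the same discrete point.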
Interesting, the kernel change they propose wouldn't be too hard to implement. My only concern is making the API increasingly cumbersome by piling on features. However, this one is requested often enough to be worth considering.
I can try to take a look at it too. I'll let you know if I get anywhere.
Could I propose a different approach for the categorical string data type? I would suggest a one-hot implementation: in practice, creating n bool dimensions in the search space, since categorical choices are independent of one another.
Has anyone made progress on this?
It would be really useful to have these types supported.
+1 I'd like this too!
I proposed an implementation for the integer type in a merge request. It is more of a first attempt than finished work, and improvements can be made. Could anyone take a look and discuss it, please?
Great suggestion! Parameter typing would be really useful, especially for categorical parameters.
+1 I would like to use integers
Is it possible to exclude specific points within the bounds? For example, when defining BayesianOptimization(f=black_box_function, pbounds={'e': (0, 1)}), I do not actually want 'e' to be 0. I could write pbounds={'e': (0.0001, 1)} or something similar, but that is not elegant.
Another thing: it sometimes "gets stuck" on points (iterations 10-16), which seems wasteful:
| iter | target | e |
| --- | --- | --- |
| 1 | 0.7492 | 0.2963 |
| 2 | 0.03762 | 0.7072 |
| 3 | 0.6771 | 0.2084 |
| 4 | 0.4013 | 0.4408 |
| 5 | 0.03448 | 0.9871 |
| 6 | 0.3762 | 0.001 |
| 7 | 0.7429 | 0.2671 |
| 8 | 0.7461 | 0.287 |
| 9 | 0.721 | 0.3317 |
| 10 | 0.7492 | 0.2928 |
| 11 | 0.7492 | 0.2927 |
| 12 | 0.7492 | 0.2925 |
| 13 | 0.7492 | 0.2928 |
| 14 | 0.7492 | 0.2917 |
| 15 | 0.7492 | 0.291 |
| 16 | 0.7492 | 0.2945 |
| 17 | 0.7461 | 0.2891 |
| 18 | 0.7367 | 0.3024 |
| 19 | 0.03448 | 0.8466 |
| 20 | 0.05016 | 0.5746 |
Is it intentional?
There's no way to exclude boundary points, and I don't think the extra complexity would be justified. As you mentioned, you can simply use (1e-4, 1) or something similar, since the choice of lower bound should be immaterial. If, however, you believe the difference between picking 1e-5 or 1e-3 as a lower bound matters, you should transform this variable to a log scale.
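The log-scale transform mentioned above can be sketched like this: optimize over log10(e) with linear bounds and exponentiate inside the objective. The names `black_box` and `f_log` are illustrative, and the objective is a stand-in:

```python
import math

# Hypothetical sketch: optimize log10(e) instead of e when the exact
# scale of the lower bound matters.
def black_box(e):
    # stand-in objective with its peak at e = 1e-2
    return -(math.log10(e) + 2) ** 2

def f_log(log_e):
    """Objective seen by the optimizer: works in log10 space."""
    return black_box(10 ** log_e)

# pbounds = {"log_e": (-5, 0)}  # covers e in [1e-5, 1] evenly per decade
print(f_log(-2))  # peak of the stand-in objective: ~0.0
```

This way each decade of e gets equal weight in the search space, instead of the interval [0.1, 1] dominating the bounds.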
And in this particular example the optimizer was not stuck; it was simply exploiting the maximum region around 0.292.
Thank you.
+1 any updates? thanks.
I'm looking to use this for booleans, any updates?