
[FEATURE REQUEST]: NumericToCategoricalEncoding Input Transform

Open jduerholt opened this issue 6 months ago • 6 comments

Motivation

@TobyBoyne recently added support for categorical dimensions to optimize_mixed_alternating. This implementation assumes that the categorical dimensions are integer encoded. As a consequence, it is directly usable with the MixedSingleTaskGP, but not with GPs that, for example, assume one-hot encoded categoricals. A solution for this case would be to have a NumericToOneHot input transform that encodes the categorical feature(s) within the model.

Currently, botorch features a OneHotToNumeric input transform including an untransform functionality. It should be relatively straightforward to come up with a NumericToOneHot transform based on this. This would solve the issue for models which expect one-hot encoded features, but not for other possible encodings. For example, in chemistry one often uses descriptor encodings, in which one transforms the categorical feature into some kind of descriptor space based on chemical descriptors. From a software engineering point of view, these descriptor encodings of categoricals are very much the same as one-hot encodings: one transforms a vector into a matrix; only the mapping is different. For this reason, I would like to implement a generic NumericToCategoricalEncoding input transform which takes, upon instantiation, information regarding the dimensionality of the encoding space and a (non-differentiable) callable that performs the transformation. This would allow optimize_mixed_alternating to be used with any kind of categorical encoding.
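As a rough illustration of the proposed mechanism (plain NumPy with hypothetical helper and table names, not actual BoTorch code): one-hot and descriptor encodings both expand an integer-encoded column into a block of columns via a per-category lookup table, so a single generic transform covers both.

```python
import numpy as np

def encode_column(X, col, table):
    """Replace integer column `col` of X with the rows of `table` it indexes."""
    idx = X[:, col].astype(int)
    encoded = table[idx]  # shape (n, encoding_dim)
    return np.concatenate([X[:, :col], encoded, X[:, col + 1:]], axis=1)

# One-hot table for 3 categories ...
one_hot = np.eye(3)
# ... vs. a made-up 2-d descriptor table for the same 3 categories.
descriptors = np.array([[0.1, 2.3], [0.7, 1.1], [0.4, 0.9]])

X = np.array([[0.5, 2.0], [0.2, 0.0]])  # column 1 is categorical (ints 0-2)
X_oh = encode_column(X, 1, one_hot)      # integer column -> 3 one-hot columns
X_desc = encode_column(X, 1, descriptors)  # same mechanism, different table
```

The only difference between the two encodings is the lookup table; the transform machinery itself is identical, which is the point of making it generic.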

What do you think? Does this sound reasonable to you?

Best,

Johannes

Describe the solution you'd like to see implemented in BoTorch.

see above

Describe any alternatives you've considered to the above solution.

No response

Is this related to an existing issue in BoTorch or another repository? If so please include links to those Issues here.

No response

Pull Request

Yes

Code of Conduct

  • [x] I agree to follow BoTorch's Code of Conduct

jduerholt avatar Jun 13 '25 14:06 jduerholt

Hi @jduerholt. That seems reasonable overall but I am curious about one aspect of the approach.

Why model using one-hot encoded features, rather than using something like CategoricalKernel? If you're optimizing over a single numerical feature, you could do the same in the model and let the kernel handle it, rather than having your kernel be defined over one-hot encoded features. We have MixedSingleTaskGP, which uses CategoricalKernel. I am hoping to replace OneHot in Ax with CategoricalKernel in H2 as well (haven't benchmarked this extensively yet).
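For context, CategoricalKernel measures similarity via a Hamming-style distance, so it only depends on whether two points share the same category per dimension and can operate directly on integer-encoded categoricals, with no one-hot expansion. A minimal NumPy sketch of that idea (not the BoTorch implementation):

```python
import numpy as np

def hamming_kernel(x1, x2, lengthscale=1.0):
    """k(x1, x2) = exp(-mean_d 1[x1_d != x2_d] / lengthscale)."""
    mismatch = (np.asarray(x1) != np.asarray(x2)).mean()
    return np.exp(-mismatch / lengthscale)

same = hamming_kernel([0, 2, 1], [0, 2, 1])  # identical categories -> 1.0
diff = hamming_kernel([0, 2, 1], [0, 2, 3])  # one mismatch out of three dims
```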

saitcakmak avatar Jun 13 '25 15:06 saitcakmak

I guess one could also use the IndexKernel here, instead of CategoricalKernel. That one uses an embedding under the hood to compute distances, which seems like it'd work just as well for categorical features.
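A rough NumPy sketch of the IndexKernel-style parameterization (following GPyTorch's B = VV^T + diag(v) form; the embedding values here are random placeholders, not learned parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
num_categories, rank = 5, 2
V = rng.normal(size=(num_categories, rank))  # low-rank embedding per category
v = np.abs(rng.normal(size=num_categories))  # per-category variance (positive)

# Covariance over categories: distances between categories come from the
# learned embedding rather than from a fixed one-hot geometry.
B = V @ V.T + np.diag(v)

# The kernel value between two data points reduces to a table lookup:
i, j = 1, 3
k_ij = B[i, j]
```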

saitcakmak avatar Jun 13 '25 15:06 saitcakmak

Hi @saitcakmak, it would be nice to be able to use the alternating optimizer over one-hot encoded features as a baseline model as well. We are already using MixedSingleTaskGP as the default. The real use case would be to use the alternating optimizer with other (molecular) encodings too, and for this the same mechanism would apply as for one-hot. The idea with the IndexKernel is nice; one should run some benchmarks there.

am hoping to replace OneHot in Ax with CategoricalKernel in H2 as well (haven't benchmarked this extensively yet).

What do you mean with H2?

So, would it be fine with you if I filed a PR with some kind of generic NumericToEncoding input transform?

jduerholt avatar Jun 13 '25 21:06 jduerholt

What do you mean with H2?

The second half of this year :)

Balandat avatar Jun 14 '25 06:06 Balandat

would it be fine with you if I filed a PR with some kind of generic NumericToEncoding input transform?

I'm reluctant to add code to BoTorch unless we think it'll get regular usage. With this transform, do you have any results that'd support its broader usage? If I understand correctly, the setup would be

  • Construct the model with X_one_hot.
  • Transform only applies on eval / posterior calls.
  • Optimize the acqf with X_categorical.

The switch between X_one_hot and X_categorical within the BO loop makes this setup a bit unappealing to me. One could always one-hot encode outside of the BO loop and use the kernel for the categorical modeling. In that sense, this seems like one additional, less user-friendly option for the same purpose.

@Balandat, wdyt?

saitcakmak avatar Jun 16 '25 14:06 saitcakmak

Hmm, I am not 100% sure that we mean the same.

Logic is as follows:

  • Input to the model and the optimizer is X_categorical, by which I mean that each categorical feature corresponds to one integer-encoded column in X_categorical. If you have 5 categories, then the integer values run from zero to 4.
  • The transform would then transform X_categorical into any kind of encoding in which the integer column is translated to a set of columns, so a categorical value is transformed from an integer to a vector. The encoding function can be a function that performs a one-hot transformation, but also anything else that maps an integer to a vector. In chemistry this would be useful, for example, if you can express the categories by some kind of descriptor values.
  • The transform can also be used during training if one wishes to do so.

The advantage would be that independent of the transform, the interface to the model would be the same, which is very useful for the alternating optimizer.

Is this also what you meant?

jduerholt avatar Jun 16 '25 15:06 jduerholt