auto-sklearn
Enhancement: Make the Ordinal Encoder an encoder choice
Problem statement:
scikit-learn 0.24 does not support np.nan when ordinal-encoding categorical columns; that support is a feature added in 0.25. Because of this, we are forced to run imputation here before ordinal encoding.
Suggestion: When moving to the next scikit-learn release, we can remove the ordinal encoder from the categorical pipeline steps, so that it is no longer before imputation. This will also allow us to remove the noencoder choice.
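For reference, the ordering constraint described in the problem statement (impute first, then ordinal-encode) can be sketched with plain scikit-learn. This is only an illustration, not the actual auto-sklearn pipeline: the step names, the "missing" fill value and the toy data are assumptions made for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Toy categorical column with one missing entry (illustrative data only).
X = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

# Imputation runs first so the ordinal encoder never sees np.nan.
# The step names and the "missing" placeholder are hypothetical choices.
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("ordinal", OrdinalEncoder()),
])

encoded = pipeline.fit_transform(X)

# The placeholder becomes its own category, so no NaN survives encoding.
print(encoded.ravel())
```

With this ordering the missing value is turned into an ordinary category before encoding, which is the workaround the issue proposes to revisit.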
@franchuterivera Hello,
Was this resolved with PR #1135 or still waiting on scikit-learn 0.25?
Still waiting. Currently the ordinal encoder is applied before the one-hot encoder, but it should actually be part of the OHEChoice: https://github.com/automl/auto-sklearn/blob/28199b02ba9e8140299c9f770faa6cb2607923e9/autosklearn/pipeline/components/data_preprocessing/data_preprocessing_categorical.py#L116