auto-sklearn

Enhancement: Make the Ordinal Encoder an encoder choice

Open franchuterivera opened this issue 4 years ago • 2 comments

Problem statement: scikit-learn 0.24 does not support np.nan when ordinal-encoding categorical columns; support for this is added in 0.25. Because of this, we are forced to run imputation before ordinal encoding in the categorical pipeline.

Suggestion: When moving to the next scikit-learn release, we can remove the ordinal encoder from the fixed categorical pipeline steps, so that it no longer has to run after imputation, and offer it as an encoder choice instead. This will also allow us to remove the noencoder choice.
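
To make the ordering constraint concrete, here is a minimal pure-Python sketch of the two behaviours. It is only an illustration: it uses neither scikit-learn nor auto-sklearn, and the helper names (`impute_then_encode`, `encode_passthrough`) and the `"missing"` fill value are hypothetical, chosen just for this example.

```python
import math


def is_missing(value):
    """True for float NaN, the usual marker for a missing category."""
    return isinstance(value, float) and math.isnan(value)


def fit_categories(column):
    """Sorted unique non-missing categories of a column."""
    return sorted({v for v in column if not is_missing(v)})


def impute_then_encode(column, fill_value="missing"):
    """Current ordering: impute first, then ordinal-encode.

    The encoder never sees NaN, so the fill value becomes a category.
    """
    imputed = [fill_value if is_missing(v) else v for v in column]
    index = {c: i for i, c in enumerate(fit_categories(imputed))}
    return [index[v] for v in imputed]


def encode_passthrough(column):
    """Proposed ordering: the encoder tolerates NaN and passes it through,
    so imputation can happen later (or not at all)."""
    index = {c: float(i) for i, c in enumerate(fit_categories(column))}
    return [math.nan if is_missing(v) else index[v] for v in column]


column = ["red", math.nan, "blue", "red"]
imputed_codes = impute_then_encode(column)      # fill value gets its own code
passthrough_codes = encode_passthrough(column)  # NaN stays NaN
```

With an encoder of the second kind, the encoder no longer needs to sit at a fixed position after imputation, which is what would let it become one of several encoder choices.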

franchuterivera avatar May 26 '21 22:05 franchuterivera

@franchuterivera Hello,

Was this resolved with PR #1135 or still waiting on scikit-learn 0.25?

eddiebergman avatar Jul 19 '21 14:07 eddiebergman

Still waiting. Currently the ordinal encoder is used prior to the one hot encoder, but should actually be part of the OHEChoice: https://github.com/automl/auto-sklearn/blob/28199b02ba9e8140299c9f770faa6cb2607923e9/autosklearn/pipeline/components/data_preprocessing/data_preprocessing_categorical.py#L116

mfeurer avatar Jul 19 '21 14:07 mfeurer