Question - Support for different types of categorical variable encoding

Open SSMK-wq opened this issue 2 years ago • 2 comments

Hi,

Does TPOT offer any automated way to convert categorical features into encoded variables?

Context of the issue

I have an input dataset with more than 100 variables where around 80% of the variables are categorical in nature.

Some variables, such as gender and country, can be one-hot encoded, but I also have a few variables with an inherent order in their values, such as a rating: very good, good, bad, etc.

Is there any approach or option in TPOT that we can use to apply the encoding based on the variable type?

For example, I would like to provide the two lists below as input to the TPOT AutoML arguments.

one_hot_list = ['Gender', 'Country']  # one-hot encoding
ordinal_list = ['Feedback', 'Level_of_interest']  # ordinal encoding

Is there any option in the package that can do this for us?

Or is there any other efficient way to do this, as I have 80 categorical columns?

SSMK-wq avatar Jan 15 '22 12:01 SSMK-wq

Hi @SSMK-wq,

Did you find a workaround for this? I don't see any documentation saying that TPOT handles encoding of categorical features, or supports different or predefined encodings, for example ordinal vs. one-hot encoding.

fjpa121197 avatar May 04 '22 13:05 fjpa121197

Bumping this, as it would be nice to pass categorical features to TPOT. TPOT includes OneHotEncoder in its default estimator set for regression, but as it stands it is only usable for integers. I see that the fit method throws an error on np.isnan. I'm sure there's more to it than changing that, though.
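The np.isnan failure can be reproduced with NumPy alone. This is only a sketch of the underlying NumPy behavior, assuming the error comes from calling np.isnan on non-numeric data; it does not call TPOT's code, and the sample values are made up.

```python
import numpy as np

# Numeric data: np.isnan works element-wise.
numeric = np.array([1.0, np.nan, 3.0])
numeric_mask = np.isnan(numeric)

# String categories stored as an object array (illustrative values).
strings = np.array(["Male", "Female"], dtype=object)

# np.isnan is not defined for object/string inputs and raises TypeError.
try:
    np.isnan(strings)
    raised = False
except TypeError as exc:
    raised = True
    print(f"np.isnan failed on string data: {exc}")
```

So string-valued columns would need to be converted to numbers before reaching any step that relies on np.isnan.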

spenceforce avatar Sep 17 '22 02:09 spenceforce
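
One possible workaround, sketched with scikit-learn rather than any TPOT option, is to encode the columns before fitting. The column names follow the question; the sample values, the category orderings, and the extra Age column are assumptions made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy data; values are invented for this example.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "Country": ["IN", "US", "IN", "UK"],
    "Feedback": ["bad", "good", "very good", "good"],
    "Level_of_interest": ["low", "high", "medium", "low"],
    "Age": [25, 32, 47, 51],
})

one_hot_list = ["Gender", "Country"]              # nominal columns
ordinal_list = ["Feedback", "Level_of_interest"]  # ordered columns

# Explicit order for each ordinal column, lowest to highest (assumed orderings).
ordinal_categories = [
    ["bad", "good", "very good"],
    ["low", "medium", "high"],
]

preprocessor = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), one_hot_list),
        ("ordinal", OrdinalEncoder(categories=ordinal_categories), ordinal_list),
    ],
    remainder="passthrough",  # keep numeric columns such as Age unchanged
    sparse_threshold=0,       # always return a dense array
)

# Output columns: one-hot columns first, then ordinal columns, then passthrough.
X_encoded = preprocessor.fit_transform(df)
print(X_encoded.shape)
```

The resulting numeric array could then be passed to TPOT in place of the raw DataFrame, for example with TPOTClassifier().fit(X_encoded, y). With 80 categorical columns, only the two lists and the per-column orderings need to be extended.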