feature_engine

`CountFrequencyEncoder` could have a parameter to group categories with few observations

Open solegalli opened this issue 2 years ago • 7 comments

This would be useful for handling rare categories in high-cardinality variables.

If a category appears in fewer than a given threshold of observations, it should be replaced by a designated value. See page 91 of Alice Zheng's book and the `CountEncoder` from Category Encoders.
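
To make the request concrete, here is a minimal sketch of the idea in plain Python. It is not feature_engine's or Category Encoders' API: the names `fit_rare_count_encoding`, `encode`, and `min_count` are hypothetical, and sending unseen categories to the pooled rare group is an assumption made only for this illustration.

```python
from collections import Counter


def fit_rare_count_encoding(values, min_count=2):
    """Learn category counts, pooling rare categories into one group.

    Hypothetical helper: categories seen fewer than `min_count` times
    are not kept individually; their observations are summed into a
    single shared count.
    """
    counts = Counter(values)
    frequent = {cat: n for cat, n in counts.items() if n >= min_count}
    rare_total = sum(n for n in counts.values() if n < min_count)
    return frequent, rare_total


def encode(values, frequent, rare_total):
    """Replace each category by its count, or by the pooled rare count.

    Assumption: categories not seen during fitting are also treated
    as members of the rare group.
    """
    return [frequent.get(value, rare_total) for value in values]


cities = ["London"] * 4 + ["Paris"] * 3 + ["Oslo", "Lima"]
frequent, rare_total = fit_rare_count_encoding(cities, min_count=2)
encoded = encode(cities, frequent, rare_total)
print(frequent)    # {'London': 4, 'Paris': 3}
print(rare_total)  # 2
print(encoded)     # [4, 4, 4, 4, 3, 3, 3, 2, 2]
```

Here "Oslo" and "Lima" each appear once, so they share a single group whose count is 2 instead of each being encoded as 1.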

solegalli avatar Apr 25 '22 17:04 solegalli

Hey, @solegalli, I would like to work on this issue if possible.

SangamSwadiK avatar Jun 12 '22 00:06 SangamSwadiK

Great! Go for it!

Do you have an idea of how to go about it, or shall we discuss it here first?

solegalli avatar Jun 12 '22 06:06 solegalli

Hi, is the book titled "Feature Engineering for Machine Learning"? I think I'll be able to understand and discuss this better after going through the book.

SangamSwadiK avatar Jun 12 '22 16:06 SangamSwadiK

This is the book: https://www.amazon.de/-/en/Alice-Zheng/dp/1491953241

solegalli avatar Jun 13 '22 08:06 solegalli

@solegalli Hi, I went through the book. Here's my understanding: for rare categories whose count is below the specified threshold, we need to group all such categories and replace them with a single value (bin). Can you please confirm whether my understanding is correct? Thanks.

SangamSwadiK avatar Jun 14 '22 13:06 SangamSwadiK

Yes, that is correct :)

solegalli avatar Jun 14 '22 14:06 solegalli

Okay, cool! I'll do it and submit a PR.

SangamSwadiK avatar Jun 14 '22 14:06 SangamSwadiK