feature_engine
`CountFrequencyEncoder` could have a parameter to group categories with few observations
This would be useful for handling rare categories in high-cardinality variables.
If a category appears in fewer observations than a given threshold, it should be replaced by a designated value. See page 91 of Alice Zheng's book and the `CountEncoder` from Category Encoders.
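To make the request concrete, here is a minimal sketch of the behavior in plain pandas. It is not the feature_engine API: the helper name `count_encode_with_rare_grouping` and the parameters `min_count` and `rare_label` are hypothetical, chosen only for illustration.

```python
import pandas as pd


def count_encode_with_rare_grouping(series, min_count=2, rare_label="Rare"):
    """Hypothetical helper: group infrequent categories, then count-encode."""
    counts = series.value_counts()
    # Categories observed fewer than `min_count` times are treated as rare.
    rare = counts[counts < min_count].index
    # Replace every rare category with one shared label (assumed name "Rare").
    grouped = series.where(~series.isin(rare), rare_label)
    # Count-encode the grouped variable: each category maps to its count.
    return grouped.map(grouped.value_counts())


colors = pd.Series(
    ["red", "red", "red", "red", "blue", "blue", "blue", "green", "pink"]
)
encoded = count_encode_with_rare_grouping(colors, min_count=2)
print(encoded.tolist())
```

Here "green" and "pink" each appear once, so they are merged into a single group and encoded with that group's combined count, rather than each receiving its own count of 1.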
Hey, @solegalli, I would like to work on this issue if possible.
Great! Go for it!
Do you have an idea of how to go about it, or shall we discuss here first?
Hi, is the book "Feature Engineering for Machine Learning"? I think I'll be able to understand and discuss this better after going through the book.
This is the book: https://www.amazon.de/-/en/Alice-Zheng/dp/1491953241
@solegalli Hi, I went through the book. Here's my understanding: for rare categories whose count is below the specified threshold, we need to group all such categories and replace them with a single value (bin). Can you please confirm whether my understanding is correct? Thanks.
Yes, that is correct :)
Okay, cool! I'll do it and submit a PR.