category_encoders
Count Encoding
Although this is very simple to implement in pandas or similar, it would be very nice to have it here as a scikit-learn-compatible transformer. It is a good benchmark for high-cardinality feature encodings, and a quick "plug and play" version would help users test it, particularly when running K-fold cross-validation over a scikit-learn pipeline.
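To make the request concrete, here is a rough sketch of what such a transformer might look like. It assumes pandas DataFrame input with plain string categories; the class name `SimpleCountEncoder`, the `cols` parameter, and the choice to map unseen categories to 0 are all hypothetical and are not the library's API.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SimpleCountEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: replace each category with its training-set count."""

    def __init__(self, cols=None):
        # cols=None means "encode every column" (an assumption for this sketch).
        self.cols = cols

    def fit(self, X, y=None):
        X = pd.DataFrame(X)
        self.cols_ = list(X.columns) if self.cols is None else list(self.cols)
        # Counts are learned from the data passed to fit only.
        self.counts_ = {c: X[c].value_counts() for c in self.cols_}
        return self

    def transform(self, X):
        X = pd.DataFrame(X).copy()
        for c in self.cols_:
            # Categories not seen during fit are mapped to 0 (a design choice).
            X[c] = X[c].map(self.counts_[c]).fillna(0).astype(int)
        return X
```

Because the counts are learned in `fit`, placing a transformer like this inside a scikit-learn `Pipeline` means each fold would compute its counts from that fold's training split only, which is the KFold use case described above.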
Happy to PR for this.
Once the standard CountEncoding is implemented, we can also include the kwargs log_transform=True and label_count_ranking=True.
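As a sketch of how those two kwargs might behave (this is an interpretation, not a spec): `log_transform` is assumed to apply `log1p` to the encoded value, and `label_count_ranking` is assumed to replace each count with the category's rank by frequency, with rank 1 for the least frequent category and ties broken by first appearance. The function name `count_encode` is hypothetical.

```python
import math
from collections import Counter


def count_encode(train, values, log_transform=False, label_count_ranking=False):
    """Hypothetical sketch: encode `values` using counts learned from `train`."""
    counts = Counter(train)
    if label_count_ranking:
        # Assumed semantics: rank 1 = least frequent category.
        # sorted() is stable, so ties keep their order of first appearance.
        ordered = sorted(counts, key=counts.get)
        mapping = {cat: rank for rank, cat in enumerate(ordered, start=1)}
    else:
        mapping = dict(counts)
    if log_transform:
        # log1p keeps the value for unseen categories (0) at 0.
        mapping = {cat: math.log1p(v) for cat, v in mapping.items()}
    # Unseen categories fall back to 0 in this sketch.
    return [mapping.get(v, 0) for v in values]
```

Ranking compresses the gap between very frequent and rare categories while keeping their order, and the log transform does something similar on the raw counts; whether the two should be combinable is left open here.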
A PR would be greatly appreciated.
@janmotl Very happy to take a look at this.
Just to follow up on this discussion to see if the issue is still relevant:
The count encoder is implemented now, right? (https://github.com/scikit-learn-contrib/category_encoders/blame/master/category_encoders/count.py) Should this issue be kept open as a reminder for log_transform and label_count_ranking?