category_encoders

Count Encoding

Open JoshuaC3 opened this issue 7 years ago • 4 comments

Although this is very simple to implement in Pandas or similar, it would be very nice to have it here as a scikit-learn-compatible transformer. Count encoding is a good benchmark for high-cardinality feature encodings, and a quick "plug and play" version would help users test it, particularly when running KFold cross-validation over a scikit-learn pipeline.
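For reference, a minimal sketch of what is being asked for, in plain Python with no dependencies. The class name `SimpleCountEncoder` and the `unknown_value` parameter are hypothetical and are not part of category_encoders; the sketch only mirrors the `fit`/`transform` convention, and a real scikit-learn transformer would presumably also inherit from `BaseEstimator` and `TransformerMixin` and handle multiple columns.

```python
from collections import Counter


class SimpleCountEncoder:
    """Hypothetical single-column count encoder (illustration only)."""

    def __init__(self, unknown_value=0):
        # Value emitted for labels never seen during fit (an assumption of this sketch).
        self.unknown_value = unknown_value

    def fit(self, X, y=None):
        # X is a sequence of hashable category labels for one column.
        self.counts_ = Counter(X)
        return self

    def transform(self, X):
        # Replace each label with how often it occurred in the data passed to fit.
        return [self.counts_.get(label, self.unknown_value) for label in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


train = ["a", "b", "a", "c", "a", "b"]
encoder = SimpleCountEncoder().fit(train)
encoded_train = encoder.transform(train)      # counts learned from train
encoded_new = encoder.transform(["a", "d"])   # "d" was not seen during fit
```

Learning the counts in `fit` and applying them in `transform` is what makes this usable inside cross-validation: each fold's counts come only from that fold's training split.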

Happy to PR for this.

JoshuaC3 avatar Oct 16 '18 12:10 JoshuaC3

Once the standard CountEncoding is implemented we can also include kwargs log_transform=True and label_count_ranking=True.
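The thread does not define these two kwargs, so the following is only one plausible reading, sketched in plain Python: `log_transform` replaces each count with its natural logarithm, and `label_count_ranking` replaces each count with the rank of its label by frequency (1 = most frequent). The function name `count_encode`, the tie-breaking rule, and the decision to reject both flags together are assumptions of this sketch.

```python
import math
from collections import Counter


def count_encode(values, log_transform=False, label_count_ranking=False):
    """Hypothetical helper: count-encode one column with optional variants."""
    if log_transform and label_count_ranking:
        # Assumption: the two options are treated as mutually exclusive here.
        raise ValueError("choose at most one of log_transform, label_count_ranking")

    counts = Counter(values)
    if label_count_ranking:
        # Rank labels by descending count; ties broken by label text for determinism.
        ordered = sorted(counts, key=lambda label: (-counts[label], str(label)))
        mapping = {label: rank for rank, label in enumerate(ordered, start=1)}
    elif log_transform:
        # Natural log of the raw count; a count of 1 maps to 0.0.
        mapping = {label: math.log(n) for label, n in counts.items()}
    else:
        mapping = dict(counts)
    return [mapping[v] for v in values]


values = ["a", "b", "a", "c", "a", "b"]
plain = count_encode(values)
logged = count_encode(values, log_transform=True)
ranked = count_encode(values, label_count_ranking=True)
```

The log variant compresses the long tail that raw counts tend to have on high-cardinality columns, while the ranking variant discards the magnitudes and keeps only the frequency ordering.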

JoshuaC3 avatar Oct 16 '18 12:10 JoshuaC3

A PR would be greatly appreciated.

janmotl avatar Oct 16 '18 14:10 janmotl

@janmotl Very happy to take a look at this.

JoshuaC3 avatar Oct 16 '18 14:10 JoshuaC3

Just following up on this discussion to see whether the issue is still relevant:
The count encoder is implemented now, right? (https://github.com/scikit-learn-contrib/category_encoders/blame/master/category_encoders/count.py) Should this issue be kept open as a reminder for log_transform and label_count_ranking?

PaulWestenthanner avatar Oct 13 '21 15:10 PaulWestenthanner