category_encoders
Support has_time parameter in CatBoostEncoder
Expected Behavior
In CatBoost, high-cardinality categorical features can be encoded in two ways, selected by the has_time parameter:
- with random permutation (has_time=False)
- without random permutation, using the dataset's own row order (has_time=True)
https://catboost.ai/en/docs/concepts/parameter-tuning#internal-dataset-order
https://github.com/catboost/catboost/issues/1076#issuecomment-905039708
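To make the difference concrete, here is a minimal sketch of an ordered target statistic in plain Python. It is not the code from category_encoders or CatBoost. The function name `ordered_target_stats`, the smoothing weight `a`, and the use of the global target mean as the prior are assumptions made for illustration. Each row is encoded only from the targets of rows that come before it in the chosen order, so the order (original versus permuted) changes the result.

```python
import random
from collections import defaultdict


def ordered_target_stats(categories, targets, a=1.0, permutation=None):
    """Hypothetical helper: encode each row from the targets of earlier rows only.

    permutation=None walks the rows in their original order (the has_time=True idea).
    Passing a permutation walks the rows in that order instead (the has_time=False idea).
    """
    n = len(categories)
    order = list(range(n)) if permutation is None else list(permutation)
    prior = sum(targets) / n  # assumption: global target mean used as the prior
    sums = defaultdict(float)  # running sum of targets seen so far, per category
    counts = defaultdict(int)  # running count of rows seen so far, per category
    encoded = [0.0] * n
    for i in order:
        c = categories[i]
        # Only rows visited earlier contribute, so row i never sees its own target.
        encoded[i] = (sums[c] + prior * a) / (counts[c] + a)
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


cats = ["a", "b", "a", "a", "b"]
y = [1, 0, 0, 1, 1]

# Original row order, analogous to has_time=True.
in_order = ordered_target_stats(cats, y)

# Randomly permuted order, analogous to has_time=False.
rng = random.Random(0)
perm = list(range(len(cats)))
rng.shuffle(perm)
shuffled = ordered_target_stats(cats, y, permutation=perm)

# A fixed reversed order, to show that the encoding depends on the order.
reversed_order = ordered_target_stats(cats, y, permutation=[4, 3, 2, 1, 0])
```

In this sketch the first row visited for each category always receives the prior, whichever order is used. That is why an encoder that never permutes is sensitive to how the input rows are sorted.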
Actual Behavior
The CatBoostEncoder only supports the first method (random permutation).
Proposal
Add the has_time parameter to CatBoostEncoder to support ordered datasets.
I think it's the opposite: the existing implementation of CBE uses no permutation, so it is a time-aware implementation. The old implementation used LOO once, so it amounted to one round of permutations.
Maybe the original authors of the CBE implementation could correct me if I'm wrong.
@GLevV, I wasn't notified when you made your comment.
Interestingly, the docstring indicates that a random permutation occurs, but I don't see it happening anywhere in the code itself.
@PaulWestenthanner, I wonder if the docstring is incorrect?
I'm not too familiar with our implementation of the CatBoost encoder. I'd need to have a look at it. I cannot guarantee the docstrings are correct.