category_encoders

Support has_time parameter in CatBoostEncoder

Open kylegilde opened this issue 3 years ago • 3 comments

Expected Behavior

In CatBoost, high-cardinality categorical features can be encoded in two ways, controlled by the has_time parameter:

  1. w/ random permutation (has_time=False)
  2. w/o random permutation (has_time=True)

https://catboost.ai/en/docs/concepts/parameter-tuning#internal-dataset-order
https://github.com/catboost/catboost/issues/1076#issuecomment-905039708
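To make the two modes concrete, here is a minimal pure-Python sketch of an ordered target statistic. It is an illustration only, not CatBoost's or category_encoders' actual code: the function name `ordered_target_encode`, the `prior_weight` and `seed` arguments, and the choice of the global target mean as the prior are all assumptions made for this example.

```python
import random
from collections import defaultdict


def ordered_target_encode(categories, targets, has_time=False,
                          prior_weight=1.0, seed=0):
    """Encode each row using only the targets of rows that come before it.

    has_time=True  -> "before" means earlier in the given row order.
    has_time=False -> "before" means earlier in a random permutation.
    (Hypothetical helper for illustration; not a library API.)
    """
    n = len(categories)
    order = list(range(n))
    if not has_time:
        # Assumption: a single seeded shuffle stands in for the permutation.
        random.Random(seed).shuffle(order)

    prior = sum(targets) / n      # global target mean used as the prior
    sums = defaultdict(float)     # running target sum per category
    counts = defaultdict(int)     # running row count per category
    encoded = [0.0] * n

    for i in order:
        c = categories[i]
        # Only rows already visited contribute, so the row's own target
        # never leaks into its encoding.
        encoded[i] = (sums[c] + prior * prior_weight) / (counts[c] + prior_weight)
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
print([round(v, 3) for v in ordered_target_encode(cats, ys, has_time=True)])
# → [0.6, 0.6, 0.8, 0.533, 0.3]
print([round(v, 3) for v in ordered_target_encode(cats, ys, has_time=False)])
```

With has_time=True the first occurrence of every category always receives the prior, and later rows see only earlier rows. With has_time=False the same rule applies along the shuffled order, so the values depend on the seed rather than on the row order.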

Actual Behavior

The CatBoostEncoder only supports the 1st method.

Proposal

Add a has_time parameter to CatBoostEncoder to support ordered datasets.

kylegilde avatar Feb 11 '22 18:02 kylegilde

I think it's the opposite: the existing implementation of CBE uses no permutation, so it is a time-aware implementation. The old implementation used LOO once, so it was 1 round of permutations.

Maybe the original CBE implementers could correct me if I'm wrong.

glevv avatar Feb 13 '22 09:02 glevv

@GLevV, I wasn't notified when you made your comment.

Interestingly, the docstring indicates that a random permutation occurs, but I don't see it happening anywhere in the code itself.
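One way to check this empirically rather than by reading the code is to probe the encoder's behavior. The sketch below is a hedged illustration: `probe_encoder` and the stand-in `running_mean_encode` are hypothetical helpers written for this example, and to test the real encoder one would pass a small wrapper around its `fit_transform` call as `encode`.

```python
from collections import defaultdict


def running_mean_encode(categories, targets, prior_weight=1.0):
    """Stand-in encoder: running per-category target mean in row order.

    Hypothetical; used only so the probe below has something to run on.
    """
    prior = sum(targets) / len(targets)
    sums, counts, out = defaultdict(float), defaultdict(int), []
    for c, y in zip(categories, targets):
        out.append((sums[c] + prior * prior_weight) / (counts[c] + prior_weight))
        sums[c] += y
        counts[c] += 1
    return out


def probe_encoder(encode, categories, targets):
    """Report two observable properties of a training-time encoder."""
    first = encode(categories, targets)
    second = encode(categories, targets)
    # Encode the rows in reverse order, then map results back to the
    # original row positions for a like-for-like comparison.
    reversed_back = encode(categories[::-1], targets[::-1])[::-1]
    return {
        "deterministic": first == second,
        "depends_on_row_order": first != reversed_back,
    }


cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
print(probe_encoder(running_mean_encode, cats, ys))
```

An encoder that is deterministic and sensitive to row order is consistent with using the dataset order as given. An unseeded internal shuffle would typically show up as non-deterministic output. A shuffle with a fixed seed would still look deterministic, so this probe is a heuristic rather than proof.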

@PaulWestenthanner, I wonder if the docstring is incorrect?

kylegilde avatar Apr 12 '22 15:04 kylegilde

I'm not too familiar with our implementation of the CatBoost encoder. I'd need to have a look at it. I cannot guarantee the docstrings are correct.

PaulWestenthanner avatar Apr 13 '22 13:04 PaulWestenthanner