category_encoders

A library of sklearn compatible categorical variable encoders

Results 89 category_encoders issues

CountEncoder() has a min_group_size parameter that sets the minimum number of observations a group needs in order not to be lumped together with other small groups....

enhancement
good first issue
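
To make the lumping behaviour described above concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper `count_encode` is hypothetical, and it assumes the lumped groups share the sum of their counts.

```python
from collections import Counter


def count_encode(values, min_group_size=1):
    """Replace each category with its count, lumping rare categories together."""
    counts = Counter(values)
    # Categories seen fewer than min_group_size times are treated as one group.
    small = {cat for cat, n in counts.items() if n < min_group_size}
    # Assumption: the combined group is encoded with the sum of its members' counts.
    combined_count = sum(counts[cat] for cat in small)
    return [combined_count if v in small else counts[v] for v in values]


encoded = count_encode(["a", "a", "a", "b", "c"], min_group_size=2)
print(encoded)
```

Here `b` and `c` each occur once, so with `min_group_size=2` they fall into the same combined group.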

Add Unary/Thermometer Encoding as an alternative for Ordinal Encoding. According to [Source](https://openreview.net/forum?id=S18Su--CW) it is also more robust to adversarial examples

enhancement
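
For readers unfamiliar with the term, unary (thermometer) encoding can be sketched as follows. The function name `thermometer_encode` and the choice of `n_levels - 1` output bits are assumptions made for illustration, not an API the library provides.

```python
def thermometer_encode(level, n_levels):
    """Encode an ordinal level in 0..n_levels-1 as a run of leading ones."""
    if not 0 <= level < n_levels:
        raise ValueError("level must be in range(n_levels)")
    # Level k sets the first k bits, so neighbouring levels differ in one bit
    # and the number of ones preserves the ordering.
    return [1 if i < level else 0 for i in range(n_levels - 1)]


for level in range(4):
    print(level, thermometer_encode(level, 4))
```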

In the documentation of the Gaussian noise regularization it says > adds normal (Gaussian) distribution noise into training data in order to decrease overfitting (testing data are untouched). Sigma gives...

documentation

I wanted the index to start at 0 instead of 1, hence a custom mapping built with enumerate: ``` oe_mapping = [{'col': c, 'mapping': {map_: map_idx for map_idx, map_ in enumerate(df[c].unique())}} for c...

bug
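
The zero-based mapping idea from the excerpt above can be shown without pandas. This is only a sketch of the enumerate pattern on a plain list; the helper `zero_based_mapping` is hypothetical, and it does not reproduce the reported bug, whose details are cut off in the excerpt.

```python
def zero_based_mapping(values):
    """Map each distinct value to an integer index starting at 0."""
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return {value: idx for idx, value in enumerate(dict.fromkeys(values))}


colors = ["red", "green", "red", "blue"]
mapping = zero_based_mapping(colors)
encoded = [mapping[c] for c in colors]
print(mapping, encoded)
```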

Is there an example of using `TargetEncoder` w/ a categorical target variable? The docstring suggests that it should be possible, but I don't see how the code is determining that...

bug
help wanted

I think I've found a subtle bug in PolynomialWrapper. The expected behaviour is that if there are n classes in the target column, then PolynomialWrapper will return n-1 new features...

bug
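
The "n classes give n-1 features" expectation can be illustrated with a one-vs-rest split of the target. This is a sketch of the general idea only, not of PolynomialWrapper's code; `one_vs_rest_targets` is a hypothetical helper, and dropping the first class in sorted order as the reference is an assumption.

```python
def one_vs_rest_targets(y):
    """Build one binary target per class, skipping a reference class."""
    classes = sorted(set(y))
    # Assumption: the first class in sorted order is the dropped reference.
    return {
        cls: [1 if label == cls else 0 for label in y]
        for cls in classes[1:]
    }


binary_targets = one_vs_rest_targets(["a", "b", "c", "a"])
print(binary_targets)
```

Each binary target would then feed one binary-target encoder, giving n-1 encoded columns per input feature.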

## Feature description > Gray Encoding. Our second approach is a variant of binary encoding, namely, Gray encoding [24]. It is a binary encoding system where two successive values differ...

enhancement
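
As a quick illustration of the Gray code the request refers to, the standard binary-reflected construction is `n ^ (n >> 1)`. The helper names below are hypothetical, and how category indices would be assigned is left open here.

```python
def to_gray(n):
    """Return the binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_bits(n, width):
    """Return the Gray code of n as a list of bits, most significant first."""
    g = to_gray(n)
    return [(g >> shift) & 1 for shift in range(width - 1, -1, -1)]


codes = [gray_bits(i, 3) for i in range(8)]
for i, bits in enumerate(codes):
    print(i, bits)
```

Consecutive rows differ in exactly one bit position, which is the property the quoted description relies on.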

I found that the WOEEncoder was not giving the right scores, and it was because the agg stats coming from pandas were not correct. The stats did work if...
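
When checking weight-of-evidence scores by hand, a pandas-free reference calculation can help. The sketch below uses the usual definition, the log of the ratio between a category's share of positives and its share of negatives. The `woe_scores` helper and the additive `regularization` term are assumptions for illustration and may not match the encoder's exact formula.

```python
import math
from collections import defaultdict


def woe_scores(categories, targets, regularization=1.0):
    """Compute a smoothed weight-of-evidence score per category."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    for cat, target in zip(categories, targets):
        if target == 1:
            pos[cat] += 1
        else:
            neg[cat] += 1
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    scores = {}
    for cat in set(categories):
        # Assumed smoothing: keeps the log finite when a category
        # has no positives or no negatives.
        pos_rate = (pos[cat] + regularization) / (total_pos + 2 * regularization)
        neg_rate = (neg[cat] + regularization) / (total_neg + 2 * regularization)
        scores[cat] = math.log(pos_rate / neg_rate)
    return scores


scores = woe_scores(["a", "a", "b", "b"], [1, 1, 0, 0])
print(scores)
```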

From section 4 of the [paper](https://kaggle2.blob.core.windows.net/forum-message-attachments/225952/7441/high%20cardinality%20categoricals.pdf) cited in TargetEncoding. > Instead of choosing the prior probability of the target as the _null hypothesis_, it is reasonable to replace it with...

enhancement
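
For context, the blend that this request would modify can be sketched as a sigmoid-weighted average of the category mean and the prior. The function `smoothed_target_mean` and the parameter names `k` and `f` are assumptions for illustration; the `prior` argument is the slot the request proposes to change, and what it would be replaced with is cut off in the excerpt.

```python
import math


def smoothed_target_mean(cat_mean, n, prior, k=1.0, f=1.0):
    """Blend a category's target mean with a prior, weighted by sample size n."""
    # The weight grows from 0 toward 1 as n exceeds k; f controls how fast.
    weight = 1.0 / (1.0 + math.exp(-(n - k) / f))
    return weight * cat_mean + (1.0 - weight) * prior


print(smoothed_target_mean(cat_mean=1.0, n=1, prior=0.0))
print(smoothed_target_mean(cat_mean=0.8, n=1000, prior=0.2))
```

With few observations the estimate stays near the prior, and with many it approaches the category's own mean.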

Although this is very simple to implement in Pandas or similar, it would be very nice to have this in here as a scikit compatible transformer. It is a good...

enhancement