category_encoders
A library of sklearn compatible categorical variable encoders
[sklearn.preprocessing.OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) has the option `sparse=True` to return the output as a `scipy.sparse` matrix. This can be really useful if you have categories with high cardinality. Would it be possible to...
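For context, here is a minimal sketch of the scikit-learn behaviour the request refers to. It relies on `OneHotEncoder` returning a SciPy sparse matrix by default, so it does not pass the `sparse` flag explicitly (as far as I know that parameter was renamed to `sparse_output` in scikit-learn 1.2). The toy data is made up, and nothing here is part of category_encoders.

```python
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Toy single-column input; the values are purely illustrative.
X = [["a"], ["b"], ["c"], ["a"]]

# Sparse output is the default, so no sparse-related argument is passed.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(X)

# Only the non-zero entries are stored: one per input row.
is_sparse = sparse.issparse(encoded)
shape = encoded.shape
stored = encoded.nnz
```

With high-cardinality columns the dense equivalent grows as rows times categories, while the sparse matrix stores one entry per row and encoded column.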
Are you planning to implement parallel encoding of features for WOE encoding?
Versions: sklearn 0.22.1, category_encoders 2.1.0. Issue: if I use a fitted BinaryEncoder instance in a custom classifier, it raises "ValueError: Must train encoder before it can be...
**Summary** `OrdinalEncoder.fit()` throws an exception when the input values are entirely numeric (i.e. `[1, 2, 3, 4, 5]`) or can be converted to numeric (i.e. `['001', '002', '003', '004',...
Hi! I came here looking for a way to encode categorical variables that have a circular distance relation (such as the days of the week, where the last day, Sunday,...
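One common way to capture a circular relation like this is to map each category to a point on the unit circle with sine and cosine. The sketch below only illustrates that idea with the standard library; `encode_cyclical` is a hypothetical helper, not a category_encoders API, and starting the week on Monday is an assumption.

```python
import math

# Assumed ordering of the cycle; any fixed ordering works.
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def encode_cyclical(index, period):
    """Map position `index` in a cycle of length `period` to (sin, cos)."""
    angle = 2.0 * math.pi * index / period
    return math.sin(angle), math.cos(angle)


encoded = {day: encode_cyclical(i, len(DAYS)) for i, day in enumerate(DAYS)}

# Neighbouring days are equally far apart, including across the week boundary.
sunday_to_monday = math.dist(encoded["sunday"], encoded["monday"])
monday_to_tuesday = math.dist(encoded["monday"], encoded["tuesday"])
monday_to_thursday = math.dist(encoded["monday"], encoded["thursday"])
```

The price is two output columns per feature, and the encoding only makes sense when the categories really do have a fixed cyclic order.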
Hi, I know that the library is focused on categorical encoding, but I think there is value in adding at least `StandardScaler` and `MinMaxScaler`, with the same nice interface we have...
I am packaging this Python package on nixpkgs. When running the tests, I ran into:
```
error: [Errno 2] File b'source_data/mushrooms/agaricus-lepiota.csv' does not exist: b'source_data/mushrooms/agaricus-lepiota.csv'
```
I think that the path...
I'm trying to see the output of using HashingEncoder, and I've used the original sample code from the documentation, and I don't see any differences between the transformed and non-transformed...
I know that I'm asking for a lot here, but it'd be great to have some idea of which encoding strategies are useful in some cases: classification vs regression...
Extend HashingEncoder to work with `util.hash_pandas_object` as the hashing function. **Reasoning**: Currently, HashingEncoder relies on hashlib. Hashlib is nice, however: 1. hashlib works only value by value -> no vectorization...
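To make the vectorization argument concrete, here is a rough sketch of hashing whole columns at once with `pandas.util.hash_pandas_object` and folding the hashes into a fixed number of buckets. It is not the `HashingEncoder` implementation; the function name `hash_encode`, the `n_components` parameter, and the `col_i` column naming are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from pandas.util import hash_pandas_object


def hash_encode(df, n_components=8):
    """Hash every column of `df` into `n_components` count columns."""
    counts = np.zeros((len(df), n_components), dtype=np.int64)
    rows = np.arange(len(df))
    for col in df.columns:
        # One vectorized call hashes the whole column (uint64 per row).
        hashed = hash_pandas_object(df[col].astype(str), index=False)
        buckets = (hashed.to_numpy() % np.uint64(n_components)).astype(np.int64)
        # Each row appears once in `rows`, so the in-place add is safe.
        counts[rows, buckets] += 1
    names = [f"col_{i}" for i in range(n_components)]
    return pd.DataFrame(counts, index=df.index, columns=names)


# Toy data, purely illustrative.
df = pd.DataFrame({"a": ["x", "y", "x"], "b": ["u", "u", "v"]})
out = hash_encode(df, n_components=8)
```

Note that in this sketch the same value appearing in two different columns lands in the same bucket; whether that is acceptable depends on the use case.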