category_encoders
A library of sklearn-compatible categorical variable encoders
Summary: Implement `fit` and `transform` functions for multi-hot encoding of ambiguous or dirty categorical features (#161). I hope you can check its usefulness.
It should be possible to programmatically differentiate between encoders that (1) do not require any target during fitting (like OneHotEncoder), (2) require some target during fitting (like TargetEncoder), (3) require...
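One way to make this distinction machine-readable is a class-level tag that tooling can inspect without fitting anything. The sketch below is only an illustration under stated assumptions: `TargetRequirement`, `target_requirement`, `requires_target` and both encoder classes are hypothetical names, not category_encoders API, and the third category is truncated in the summary, so the enum would need extending.

```python
from enum import Enum


class TargetRequirement(Enum):
    """Hypothetical categories of target dependence during fitting."""
    NONE = "none"          # e.g. one-hot style encoders
    REQUIRED = "required"  # e.g. target-statistics encoders
    # The proposal's third category is truncated; add it here once defined.


class HypotheticalOneHot:
    # Class-level tag; the attribute name is an assumption, not library API.
    target_requirement = TargetRequirement.NONE

    def fit(self, X, y=None):
        return self


class HypotheticalTargetEnc:
    target_requirement = TargetRequirement.REQUIRED

    def fit(self, X, y):
        if y is None:
            raise ValueError("this encoder needs a target to fit")
        return self


def requires_target(encoder_cls) -> bool:
    """Return True if the encoder class declares that fit() needs y."""
    # Classes without the tag are treated as target-free (an assumption).
    req = getattr(encoder_cls, "target_requirement", TargetRequirement.NONE)
    return req is TargetRequirement.REQUIRED


if __name__ == "__main__":
    for cls in (HypotheticalOneHot, HypotheticalTargetEnc):
        print(cls.__name__, requires_target(cls))
```

Because the tag lives on the class, a test suite or benchmark could group encoders before instantiating them.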
Recently I ran `TargetEncoder(handle_missing='indicator')` and it raised no error, which surprised me a bit. A possible solution:

```python
if handle_missing not in [None, 'error', 'return_nan', 'value']:
    raise ValueError('Unexpected handle_missing value: '...
```
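A self-contained sketch of the same check follows. It assumes the allowed values listed in the proposal; `check_handle_missing` and `SketchEncoder` are hypothetical names, not the library's code. Following scikit-learn's convention, the parameter is stored unchanged in `__init__` and validated in `fit`.

```python
# Allowed values taken from the proposal above (an assumption for other
# encoders, some of which may legitimately accept e.g. 'indicator').
ALLOWED_HANDLE_MISSING = (None, "error", "return_nan", "value")


def check_handle_missing(handle_missing, allowed=ALLOWED_HANDLE_MISSING):
    """Raise ValueError if handle_missing is not one of the allowed options."""
    if handle_missing not in allowed:
        raise ValueError(
            f"Unexpected handle_missing value: {handle_missing!r}; "
            f"expected one of {list(allowed)}"
        )
    return handle_missing


class SketchEncoder:
    """Hypothetical encoder showing where the check would run."""

    def __init__(self, handle_missing="value"):
        # Stored as given; no validation in __init__ (sklearn convention).
        self.handle_missing = handle_missing

    def fit(self, X, y=None):
        # Validate at fit time so set_params() changes are also checked.
        check_handle_missing(self.handle_missing)
        return self
```

With this in place, `SketchEncoder(handle_missing="indicator").fit(X)` raises a `ValueError` instead of silently accepting the unsupported option.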
I propose implementing a simple multi-hot encoding that accepts ambiguous input and outputs non-negative values. Let x_j be a realization of a student's department. Usually, we assume that x_j...
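The proposal's exact weighting is truncated, so the following is only a minimal sketch under assumptions: each observation is given as a collection of candidate categories, every candidate column gets a 1, and an optional (hypothetical) `normalize` flag spreads a total weight of 1 across the candidates instead. Either way the output stays non-negative. `MultiHotSketch` is an illustrative name, not the code from #161.

```python
from typing import Iterable


class MultiHotSketch:
    """Hypothetical multi-hot encoder for ambiguous categorical values."""

    def __init__(self, normalize: bool = False):
        # normalize=True gives each of k candidates weight 1/k (assumption).
        self.normalize = normalize

    def fit(self, X: Iterable[Iterable[str]]):
        # Collect every category that appears as a candidate in any row.
        cats = sorted({c for row in X for c in row})
        self.categories_ = cats
        self.index_ = {c: i for i, c in enumerate(cats)}
        return self

    def transform(self, X: Iterable[Iterable[str]]):
        out = []
        for row in X:
            vec = [0.0] * len(self.categories_)
            # Unseen categories are ignored (all-zero contribution).
            known = [c for c in set(row) if c in self.index_]
            weight = 1.0 / len(known) if (self.normalize and known) else 1.0
            for c in known:
                vec[self.index_[c]] = weight
            out.append(vec)
        return out


if __name__ == "__main__":
    # The second student's department is ambiguous: math or physics.
    data = [{"math"}, {"math", "physics"}, {"biology"}]
    enc = MultiHotSketch().fit(data)
    print(enc.categories_)
    print(enc.transform(data))
```

With `normalize=False`, an ambiguous row such as `{"math", "physics"}` becomes `[0.0, 1.0, 1.0]`; with `normalize=True` it becomes `[0.0, 0.5, 0.5]`.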
Historically, this project has made releases whenever I felt like it, or when someone explicitly opened an issue asking for one. That's not great. I'd like...
It would be great to use something like https://github.com/EpistasisLab/penn-ml-benchmarks to get a comprehensive view of memory usage, time-to-transform, and end-model accuracy for all encoders. It'd probably take a very...
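As a starting point for the memory and timing parts, a small harness could wrap any encoder that exposes `fit(X, y)` and `transform(X)`. This is a sketch, not part of penn-ml-benchmarks: `profile_encoder` and `ToyIdentityEncoder` are hypothetical, memory is the peak traced by Python's `tracemalloc` (only allocations it can see), and end-model accuracy is left out because it needs a downstream model.

```python
import time
import tracemalloc


def profile_encoder(make_encoder, X_train, X_test, y_train=None):
    """Time fit/transform and record peak memory traced by tracemalloc.

    make_encoder: zero-argument callable returning an object with
    fit(X, y) and transform(X) methods (assumed interface).
    """
    enc = make_encoder()
    tracemalloc.start()
    try:
        t0 = time.perf_counter()
        enc.fit(X_train, y_train)
        t1 = time.perf_counter()
        Xt = enc.transform(X_test)
        t2 = time.perf_counter()
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {
        "fit_seconds": t1 - t0,
        "transform_seconds": t2 - t1,
        "peak_bytes": peak,
        "output": Xt,
    }


class ToyIdentityEncoder:
    """Hypothetical stand-in encoder used only to exercise the harness."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [list(row) for row in X]


if __name__ == "__main__":
    stats = profile_encoder(ToyIdentityEncoder, [["a"], ["b"]], [["b"], ["c"]])
    print({k: v for k, v in stats.items() if k != "output"})
```

Running this over every encoder and every benchmark dataset would give the per-encoder table the issue asks for; accuracy could be added by fitting a fixed model on each encoder's output.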
Hello, the H2O ML framework supports an `enum` encoding scheme. It would be nice to have this for sklearn as well. As far as I know, there are no contributions made...
I found an interesting approach in the paper "The Synthetic Data Vault: Generative Modeling for Relational Databases". It seems there are no implementations in popular libraries. Steps: 1. Sort the categories...
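The summary is cut off after step 1, so the sketch below fills in the rest with loudly labeled assumptions: categories are sorted from most to least frequent, each one is assigned a slice of [0, 1] whose width is its relative frequency, and each value is replaced by the midpoint of its slice. The midpoint rule is a deterministic simplification; the paper's remaining steps may differ (for example by sampling within the interval). Both function names are hypothetical.

```python
from collections import Counter


def frequency_intervals(values):
    """Map each category to a sub-interval (a, b) of [0, 1].

    Assumption: categories are sorted by descending frequency, and each
    interval's width equals that category's relative frequency.
    """
    counts = Counter(values)
    n = len(values)
    # Most frequent first; ties broken by name so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    intervals, cum = {}, 0
    for cat, cnt in ordered:
        # Integer cumulative counts keep the last upper bound exactly 1.0.
        intervals[cat] = (cum / n, (cum + cnt) / n)
        cum += cnt
    return intervals


def encode_midpoint(values, intervals):
    """Replace each category by the midpoint of its interval (simplification)."""
    return [(intervals[v][0] + intervals[v][1]) / 2 for v in values]


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "a", "b"]
    iv = frequency_intervals(data)
    print(iv)
    print(encode_midpoint(data, iv))
```

Here `"a"` (3 of 6 rows) gets `(0.0, 0.5)` and encodes to `0.25`, while the rarest category `"c"` gets the last slice, ending at `1.0`.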
Proposed Changes: adds a Gray encoder, as suggested in #300.
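Gray encoding is binary encoding with the ordinal codes first converted to reflected binary Gray code (`n ^ (n >> 1)`), so that consecutive ordinals differ in exactly one output column. The sketch below is illustrative only, not the PR's implementation: it assumes categories are ordinal-encoded in sorted order starting at 0, and `gray_code` and `gray_encode` are hypothetical names.

```python
def gray_code(n: int) -> int:
    """Return the reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_encode(values):
    """Ordinal-encode categories (sorted, from 0), then emit Gray-code bits."""
    cats = sorted(set(values))
    ordinal = {c: i for i, c in enumerate(cats)}
    # Enough bits for the largest ordinal; Gray code keeps the same bit length.
    width = max(1, (len(cats) - 1).bit_length())
    rows = []
    for v in values:
        g = gray_code(ordinal[v])
        # Most significant bit first, one output column per bit.
        rows.append([(g >> b) & 1 for b in reversed(range(width))])
    return cats, rows


if __name__ == "__main__":
    cats, rows = gray_encode(["a", "b", "c", "d"])
    for c, r in zip(cats, rows):
        print(c, r)
```

For four categories the columns are `[0, 0]`, `[0, 1]`, `[1, 1]`, `[1, 0]`: each step flips one bit, whereas plain binary goes from `[0, 1]` to `[1, 0]`.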
Expected Behavior: The constant intercept column (all values 1) should not be added when applying contrast coding schemes (i.e. backward difference, sum, polynomial, and Helmert coding). I don't think...
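To illustrate the expected output shape, here is a sketch of sum (deviation) coding only. With k levels it produces k - 1 contrast columns and no all-ones column, leaving the intercept to the downstream model (scikit-learn's `LinearRegression`, for instance, fits one by default via `fit_intercept=True`). The function names are hypothetical and the level order (sorted) is an assumption; backward difference, polynomial, and Helmert coding use different k x (k - 1) matrices.

```python
def sum_contrast_matrix(k: int):
    """Sum (deviation) coding for k levels: k rows, k - 1 columns.

    Level i < k - 1 gets a 1 in column i; the last level gets -1 in
    every column. No constant intercept column is included.
    """
    if k < 2:
        raise ValueError("need at least two levels")
    m = [[0] * (k - 1) for _ in range(k)]
    for i in range(k - 1):
        m[i][i] = 1
    m[k - 1] = [-1] * (k - 1)
    return m


def sum_encode(values):
    """Encode values with sum coding, levels taken in sorted order."""
    levels = sorted(set(values))
    contrasts = sum_contrast_matrix(len(levels))
    row_of = {lvl: contrasts[i] for i, lvl in enumerate(levels)}
    return [list(row_of[v]) for v in values]


if __name__ == "__main__":
    print(sum_contrast_matrix(3))
    print(sum_encode(["x", "y", "z", "x"]))
```

Each contrast column sums to zero over the levels, so adding a constant column on top would duplicate what a model's own intercept already provides.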