category_encoders

A library of sklearn compatible categorical variable encoders

Results 89 category_encoders issues

https://github.com/scikit-learn-contrib/category_encoders/blob/6a13c14919d56fed8177a173d4b3b82c5ea2fef5/category_encoders/utils.py#L322-L323 For the function `_check_transform_inputs()`, I do not want it to raise an error when the number of dimensions in the test set does not match that of the training set. However, the default is it...

## Expected Behavior `HashingEncoder` should encode the categorical columns successfully ## Actual Behavior An EOF error is raised when calling `HashingEncoder` ## Steps to Reproduce the Problem 1. Packages installed...

non-reproducible

I would like to do target encoding on the composite of multiple columns, but the current functionality only allows a single column to be encoded. For example: I have a...
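The request above can be illustrated without the library. The following is a minimal sketch, assuming plain mean target encoding with no smoothing, where the key is the tuple of values from several columns. The function names, the column names, and the fallback to the global mean for unseen combinations are all hypothetical choices made for illustration; this is not how `TargetEncoder` is implemented.

```python
from collections import defaultdict


def fit_composite_target_means(rows, key_columns, target_column):
    """Compute the mean target for each combination of the key columns.

    Returns (mapping, global_mean). Hypothetical helper, not a library API.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    total = 0.0
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        sums[key] += row[target_column]
        counts[key] += 1
        total += row[target_column]
    global_mean = total / len(rows)
    mapping = {key: sums[key] / counts[key] for key in sums}
    return mapping, global_mean


def transform_composite(rows, key_columns, mapping, global_mean):
    """Replace each row's composite key with its fitted mean target.

    Combinations not seen during fitting fall back to the global mean
    (an assumption made for this sketch).
    """
    return [
        mapping.get(tuple(row[column] for column in key_columns), global_mean)
        for row in rows
    ]


rows = [
    {"city": "A", "device": "mobile", "y": 1},
    {"city": "A", "device": "mobile", "y": 0},
    {"city": "A", "device": "desktop", "y": 1},
    {"city": "B", "device": "mobile", "y": 0},
]
mapping, global_mean = fit_composite_target_means(rows, ["city", "device"], "y")
encoded = transform_composite(rows, ["city", "device"], mapping, global_mean)
```

Here the pair (`city`, `device`) is treated as one category, so `("A", "mobile")` and `("A", "desktop")` receive different encodings even though they share a city.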

## Expected Behavior Currently, `handle_missing=value` adds a new column although the documentation says `'value' will encode a new value as 0 in every dummy column.` Furthermore, we need a test...

bug
good first issue
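To make the documented behavior quoted in the issue above concrete, here is a small sketch of a one-hot encoding in which a missing value becomes 0 in every dummy column and no extra column is added. It is written in plain Python and represents missing values as `None`; the function name and that representation are assumptions for illustration, not the library's code.

```python
def one_hot_missing_as_zeros(values, categories):
    """One-hot encode `values` against the known `categories`.

    A missing value (None here) yields a row of zeros, so the output
    always has exactly len(categories) columns. Every non-missing value
    is expected to appear in `categories`.
    """
    index = {category: position for position, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        if value is not None:
            row[index[value]] = 1
        encoded.append(row)
    return encoded


rows = one_hot_missing_as_zeros(["red", None, "blue"], ["red", "blue"])
```

A test along the lines the issue asks for could assert that the number of output columns equals the number of known categories, whether or not the input contains missing values.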

I recently stumbled upon a categorical encoding idea dubbed "Distributed Robust Algorithm for Count-based Learning" (aka Dracula) described in [this Microsoft blog](https://learn.microsoft.com/en-us/archive/blogs/machinelearning/big-learning-made-easy-with-counts) as well as [this talk](https://www.youtube.com/watch?v=7sZeTxIrnxs). It seems like...

enhancement
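As a rough illustration of the count-based idea mentioned in the request above, the sketch below replaces a category with per-class counts plus a smoothed log-odds value for a binary label. The function names, the feature layout, and the additive `prior` smoothing are assumptions made for this example; the linked blog post and talk remain the reference for the actual method.

```python
import math
from collections import defaultdict


def fit_count_table(categories, labels):
    """Count, for each category, how often label 0 and label 1 occur."""
    table = defaultdict(lambda: [0, 0])
    for category, label in zip(categories, labels):
        table[category][label] += 1
    return dict(table)


def count_features(category, table, prior=1.0):
    """Return [negative count, positive count, smoothed log-odds].

    `prior` is a hypothetical additive smoothing term that keeps the
    log-odds finite for rare or unseen categories.
    """
    negatives, positives = table.get(category, (0, 0))
    log_odds = math.log((positives + prior) / (negatives + prior))
    return [negatives, positives, log_odds]


table = fit_count_table(["a", "a", "b", "a"], [1, 0, 1, 1])
features_a = count_features("a", table)
features_unseen = count_features("z", table)
```

Because the features are built from counts, tables computed on separate data partitions can be summed, which is what makes the approach attractive for distributed settings.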

**Current State** The class variable `mapping` of the BaseEncoder class is a dictionary with the following items: - 'col' -> column name (str) -...

## Expected Behavior Full compatibility with sklearn pipelines ## Actual Behavior We are only compatible with the pandas mode of sklearn. By default, a multi-step pipeline that has an encoder not as...

discussion

## Expected Behavior If I set n_components to 8, for example, I should be able to encode columns with up to 2^8 values. ## Actual Behavior Whatever I set n_components...

## Expected Behavior No ``FutureWarning`` is thrown. ## Actual Behavior Currently the following warning is thrown. ```text category_encoders/ordinal.py:198: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and...

We have a big data frame that we want to fit into a CountEncoder. We would like to somehow make use of the multiple cores of our machine. We would...

question
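One generic way to approach the question above is to count values per chunk and then merge the partial counts. The sketch below shows that split-and-merge pattern with the standard library only; it does not use `CountEncoder`, and the helper names and chunking scheme are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[start:start + size] for start in range(0, len(values), size)]


def chunked_counts(values, n_chunks=4, executor=None):
    """Count occurrences per chunk, then merge the partial counters.

    If `executor` is given, its `map` method distributes the chunks;
    otherwise the chunks are processed sequentially.
    """
    chunks = split_into_chunks(values, n_chunks)
    mapper = executor.map if executor is not None else map
    total = Counter()
    for partial in mapper(Counter, chunks):
        total.update(partial)
    return total


values = ["a", "b", "a", "c", "a", "b"]
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = chunked_counts(values, n_chunks=3, executor=pool)
```

A thread pool is used here only to keep the example short. In CPython, pure-Python counting is unlikely to speed up with threads, so a `concurrent.futures.ProcessPoolExecutor` would be the more plausible choice for using several cores, at the cost of copying each chunk to a worker process.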