Broken inverse_transform for OrdinalEncoder when custom mapping in use!

Open mdalvi opened this issue 6 years ago • 3 comments

I wanted the encoding to start at index 0 instead of 1, hence the custom mapping built with enumerate:

oe_mapping = [{'col': c, 'mapping': {map_: map_idx for map_idx, map_ in enumerate(df[c].unique())}} for c in categoricals]
oe = OrdinalEncoder(cols=categoricals, handle_unknown='ignore', mapping=oe_mapping, return_df=False)
df[categoricals] = oe.fit_transform(df[categoricals])

When calling inverse_transform I got the error below:

oe.inverse_transform(df[categoricals])

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-64-419b17b4852b> in <module>
----> 1 oe.inverse_transform(df[categoricals])

e:\Anaconda3\lib\site-packages\category_encoders\ordinal.py in inverse_transform(self, X_in)
    264         for switch in self.mapping:
    265             column_mapping = switch.get('mapping')
--> 266             inverse = pd.Series(data=column_mapping.index, index=column_mapping.get_values())
    267             X[switch.get('col')] = X[switch.get('col')].map(inverse).astype(switch.get('data_type'))
    268 

AttributeError: 'dict' object has no attribute 'index'

How do we handle this error?

mdalvi avatar Aug 01 '19 21:08 mdalvi

Yes, that's a bug. inverse_transform expects each column's mapping to be a pandas Series, but here it is a dict. Hence, the workaround is to pass the mapping as a Series, something like:

    def test_inverse_with_mapping(self):
        df = X.copy(deep=True)
        df = df.drop(['categorical', 'na_categorical'], axis=1) # Categorical data type does not currently pass the test
        categoricals = ['unique_int', 'unique_str', 'invariant', 'underscore', 'none', 'extra', 321]
        # oe_mapping = [{'col': c, 'mapping': {map_: map_idx for map_idx, map_ in enumerate(df[c].unique())}} for c in categoricals]
        oe_mapping = [{'col': c, 'mapping': pd.Series(data=range(len(df[c].unique())), index=df[c].unique()), 'data_type': X[c].dtype} for c in categoricals]
        oe = encoders.OrdinalEncoder(cols=categoricals, handle_unknown='ignore', mapping=oe_mapping, return_df=True)
        df[categoricals] = oe.fit_transform(df[categoricals])

        recovered = oe.inverse_transform(df[categoricals])

        pd.testing.assert_frame_equal(X[categoricals], recovered)
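
For readers without the test fixtures, here is a minimal pandas-only sketch of the same idea. It assumes a toy column (`colors`, made up for illustration) and mirrors the inversion line from the traceback, using `.values` in place of `get_values()`. It does not call category_encoders at all, so treat it as an illustration of why a Series mapping inverts cleanly, not as the library's code.

```python
import pandas as pd

# Toy column standing in for one entry of `categoricals` (hypothetical data).
colors = pd.Series(["red", "green", "red", "blue"])

# 0-based mapping as a pandas Series: index = category, data = ordinal code.
categories = colors.unique()
column_mapping = pd.Series(data=range(len(categories)), index=categories)

# Inversion in the style of the traceback line: swap index and values.
# This only works because a Series has both .index and .values; a dict does not.
inverse = pd.Series(data=column_mapping.index, index=column_mapping.values)

encoded = colors.map(column_mapping)   # categories -> codes starting at 0
recovered = encoded.map(inverse)       # codes -> original categories
```

Mapping `encoded` through `inverse` should give back the original categories, which is what inverse_transform is trying to do per column.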

janmotl avatar Aug 05 '19 09:08 janmotl

Hello, is this issue fixed?

atuliesbpl avatar Feb 17 '21 09:02 atuliesbpl

Hi all, I just solved this issue by changing the code at 'category_encoders\ordinal.py: 268' from inverse = pd.Series(data=column_mapping.index, index=column_mapping.values) to inverse = pd.Series(data=column_mapping.keys(), index=column_mapping.values())

This follows from the pd.Series documentation and the definition of dict in Python. Note that the changed line only works when the mapping is a dict. https://pandas.pydata.org/docs/reference/api/pandas.Series.html
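
A rough sketch of how the inversion could accept either form of mapping. `invert_mapping` is a hypothetical helper written for this comment, not a function in category_encoders, and the sample mapping is made up. The dict views are wrapped in `list()` as a precaution so the Series constructor gets plain sequences.

```python
import pandas as pd

def invert_mapping(column_mapping):
    """Return a Series mapping ordinal code -> original category.

    Hypothetical helper, not part of category_encoders.
    """
    if isinstance(column_mapping, dict):
        # dict: keys are categories, values are codes
        return pd.Series(data=list(column_mapping.keys()),
                         index=list(column_mapping.values()))
    # pandas Series: index holds categories, values hold codes
    return pd.Series(data=column_mapping.index, index=column_mapping.values)

dict_mapping = {"a": 0, "b": 1, "c": 2}
series_mapping = pd.Series(dict_mapping)

codes = pd.Series([2, 0, 1])
from_dict = codes.map(invert_mapping(dict_mapping))
from_series = codes.map(invert_mapping(series_mapping))
```

Both calls should decode the codes to the same categories, so a user-supplied dict and a Series mapping behave alike under this approach.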

EchizenG avatar Oct 18 '21 20:10 EchizenG

This issue is still not fixed... I submitted a PR based on your comment, @EchizenG

fredmontet avatar Nov 09 '22 14:11 fredmontet

closing this as I fixed it in #222

PaulWestenthanner avatar Jan 14 '23 20:01 PaulWestenthanner