category_encoders
Broken inverse_transform for OrdinalEncoder when custom mapping in use!
I wanted the encoding to start at index 0 instead of 1, so I built a custom mapping using enumerate:
oe_mapping = [{'col': c, 'mapping': {map_: map_idx for map_idx, map_ in enumerate(df[c].unique())}} for c in categoricals]
oe = OrdinalEncoder(cols=categoricals, handle_unknown='ignore', mapping=oe_mapping, return_df=False)
df[categoricals] = oe.fit_transform(df[categoricals])
When calling inverse_transform, I got the following error:
oe.inverse_transform(df[categoricals])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-64-419b17b4852b> in <module>
----> 1 oe.inverse_transform(df[categoricals])
e:\Anaconda3\lib\site-packages\category_encoders\ordinal.py in inverse_transform(self, X_in)
264 for switch in self.mapping:
265 column_mapping = switch.get('mapping')
--> 266 inverse = pd.Series(data=column_mapping.index, index=column_mapping.get_values())
267 X[switch.get('col')] = X[switch.get('col')].map(inverse).astype(switch.get('data_type'))
268
AttributeError: 'dict' object has no attribute 'index'
How do we handle this error?
Yes, that's a bug. inverse_transform expects each column mapping to be a pandas Series, but here it is a dict. As a workaround, pass the mapping as a Series, for example:
def test_inverse_with_mapping(self):
    df = X.copy(deep=True)
    df = df.drop(['categorical', 'na_categorical'], axis=1)  # Categorical data type does not currently pass the test
    categoricals = ['unique_int', 'unique_str', 'invariant', 'underscore', 'none', 'extra', 321]
    # oe_mapping = [{'col': c, 'mapping': {map_: map_idx for map_idx, map_ in enumerate(df[c].unique())}} for c in categoricals]
    oe_mapping = [{'col': c, 'mapping': pd.Series(data=range(len(df[c].unique())), index=df[c].unique()), 'data_type': X[c].dtype} for c in categoricals]
    oe = encoders.OrdinalEncoder(cols=categoricals, handle_unknown='ignore', mapping=oe_mapping, return_df=True)
    df[categoricals] = oe.fit_transform(df[categoricals])
    recovered = oe.inverse_transform(df[categoricals])
    pd.testing.assert_frame_equal(X[categoricals], recovered)
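To see why the Series form works where the dict does not, the inversion step from the traceback can be reproduced with pandas alone. This is only a minimal sketch, not the library's code; the `colors` data and all variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical column; any Series of category labels would do.
colors = pd.Series(["red", "green", "red", "blue"])

# Zero-based mapping stored as a Series: index = category, value = code.
categories = colors.unique()
mapping = pd.Series(data=range(len(categories)), index=categories)

# Encode by looking each label up in the mapping.
encoded = colors.map(mapping)

# Same idea as the line in the traceback: swap index and values.
inverse = pd.Series(data=mapping.index, index=mapping.values)
decoded = encoded.map(inverse)

# A plain dict has no `.index` attribute, which is what raises the AttributeError.
dict_mapping = {cat: idx for idx, cat in enumerate(categories)}
dict_has_index = hasattr(dict_mapping, "index")
```

The round trip only succeeds because `mapping` exposes `.index` and `.values`; a dict exposes `.keys()` and `.values()` instead.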
Hello, is this issue fixed?
Hi all, I just solved this issue by changing the code in 'category_encoders\ordinal.py: 268' from
inverse = pd.Series(data=column_mapping.index, index=column_mapping.values)
to
inverse = pd.Series(data=column_mapping.keys(), index=column_mapping.values())
following the pd.Series documentation and the definition of dict in Python: https://pandas.pydata.org/docs/reference/api/pandas.Series.html
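If both mapping styles need to be supported, the inversion could be written to accept either one. The helper below is a hypothetical sketch (`build_inverse` is not part of category_encoders, and this is not necessarily how the eventual fix was implemented); it converts the dict views to lists before handing them to pd.Series.

```python
import pandas as pd


def build_inverse(column_mapping):
    """Return a Series mapping code -> category (hypothetical helper)."""
    if isinstance(column_mapping, pd.Series):
        # Series mapping: index holds categories, values hold codes.
        keys, codes = column_mapping.index, column_mapping.values
    else:
        # Assumed to be a dict-like {category: code}; materialize the views.
        keys, codes = list(column_mapping.keys()), list(column_mapping.values())
    return pd.Series(data=keys, index=codes)


dict_inverse = build_inverse({"a": 0, "b": 1})
series_inverse = build_inverse(pd.Series([0, 1], index=["a", "b"]))
```

Both calls produce the same code-to-category lookup, so the rest of the inverse step would not need to know which form the user supplied.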
This issue is still not fixed. I submitted a PR based on your comment, @EchizenG.
Closing this as I fixed it in #222.