sklearn-pandas DataFrameMapper.inverse_transform() for simple transformations


I've added an inverse_transform() method to the DataFrameMapper that works for simple transformations. I've included tests using the LabelEncoder and LabelBinarizer, and they pass.

It still fails for more complicated transformations such as Pipelines, but I hope it's a useful start at least.

Nov 14 '17 08:11 erikjandevries

Not sure what's going wrong - when I tested the solution, all tests passed. Should I have tested differently?

$ python -m pytest -s -q tests/test_dataframe_mapper.py
/usr/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
...................................................
============================================================================================================= warnings summary ==============================================================================================================
tests/test_dataframe_mapper.py::test_list_transformers
  /usr/lib/python3.6/site-packages/sklearn/utils/validation.py:444: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.
    warnings.warn(msg, DataConversionWarning)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
51 passed, 1 warnings in 4.82 seconds

Nov 14 '17 08:11 erikjandevries

@erikjandevries Click the Details link next to the CircleCI message to see what is going wrong. Mostly PEP8 violations, as far as I can see.

Nov 14 '17 09:11 devforfu

@devforfu Thanks for the hints. They were indeed PEP8 violations, which I've now fixed. In my opinion some PEP8 rules make my code less readable, but I understand the need for standardisation when working in (larger) teams :)

Nov 14 '17 14:11 erikjandevries

@erikjandevries Do you think it is possible to address the issues pointed out by @dukebody? Then we can do a final review and merge into master.

Sep 05 '18 11:09 devforfu

After a failed PR and some fiddling around, I figured out why that new sub-field was necessary. For one-to-many transformers, the mapper has to keep a label list that preserves the grouping of the output columns (e.g. ['A_1', 'A_2', 'A_3'] in the case of the label encoder). The field that @dukebody suggested using only keeps these columns in a flat structure, so the grouping is lost. I would vote to merge this PR (@devforfu), as it looks good otherwise.
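
To illustrate the grouping point, here is a minimal pure-Python sketch. It is not the code from this PR and does not use sklearn-pandas: TinyBinarizer, fit_transform and inverse_transform are hypothetical names invented for the illustration. It only shows why the inverse step needs to know which output columns belong to which input column, which a flat list of names does not tell it.

```python
class TinyBinarizer:
    """Hypothetical one-to-many transformer: one input column, one output column per class."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        # One indicator column per known class.
        return [[int(v == c) for c in self.classes_] for v in values]

    def inverse_transform(self, rows):
        # Map each indicator row back to the class whose column is set.
        return [self.classes_[row.index(1)] for row in rows]


def fit_transform(columns, data):
    """Transform each named column and record the output columns it produced.

    Returns (rows, groups); groups is a list of
    (input_name, transformer, [output_names]) in output order.
    """
    n_rows = len(next(iter(data.values())))
    rows = [[] for _ in range(n_rows)]
    groups = []
    for name, transformer in columns:
        encoded = transformer.fit(data[name]).transform(data[name])
        out_names = [f"{name}_{c}" for c in transformer.classes_]
        groups.append((name, transformer, out_names))
        for row, extra in zip(rows, encoded):
            row.extend(extra)
    return rows, groups


def inverse_transform(rows, groups):
    """Slice the flat output back into per-input blocks and invert each block."""
    result = {}
    start = 0
    for name, transformer, out_names in groups:
        stop = start + len(out_names)
        block = [row[start:stop] for row in rows]
        result[name] = transformer.inverse_transform(block)
        start = stop
    return result


data = {"A": ["x", "y", "z", "x"], "B": ["u", "v", "u", "u"]}
rows, groups = fit_transform([("A", TinyBinarizer()), ("B", TinyBinarizer())], data)

# The flat view loses the information about where "A" ends and "B" begins.
flat_names = [n for _, _, names in groups for n in names]
restored = inverse_transform(rows, groups)
```

With only flat_names, the inverse step would have to guess the block boundaries; the grouped structure records them explicitly.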

Nov 09 '18 05:11 adithyabsk

mapper = sklearn_pandas.DataFrameMapper([
    ('index', None),
   ...

With None transforms I get the error:

'NoneType' object has no attribute 'inverse_transform'

This isn't critical at all; the feature is nice and useful and can be merged as is. I'm just pointing out a direction for further improvement.
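
One possible direction, sketched in plain Python under the assumption that a None transformer means "pass the column through unchanged". The helper invert_column and the Doubler class are hypothetical and are not part of sklearn-pandas or this PR.

```python
def invert_column(transformer, values):
    """Invert one column; a None transformer is treated as a pass-through."""
    if transformer is None:
        # Nothing was applied on the way in, so there is nothing to undo.
        return list(values)
    return transformer.inverse_transform(values)


class Doubler:
    """Hypothetical transformer, used only to exercise the non-None branch."""

    def transform(self, values):
        return [v * 2 for v in values]

    def inverse_transform(self, values):
        return [v // 2 for v in values]


passthrough = invert_column(None, [10, 11, 12])
undone = invert_column(Doubler(), [2, 4, 6])
```

The guard avoids calling inverse_transform on None, which is what raises the AttributeError quoted above.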

edit: Actually, I was unable to make it work for me. I get: TypeError: unhashable type: 'slice'

Nov 09 '18 09:11 anatol-grabowski

Hello, received, thank you.

Dec 14 '23 09:12 hu-minghao
