sklearn-pandas

DataFrameMapper.inverse_transform() for simple transformations

Open erikjandevries opened this issue 7 years ago • 6 comments

I've added an inverse_transform() method to the DataFrameMapper that works for simple transformations. I've included tests using the LabelEncoder and LabelBinarizer, and they pass.

This still fails for more complicated transformations such as Pipelines. I hope it's a useful start at least.
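The basic idea for simple, single-column transformers can be sketched as below. This is a minimal illustration under stated assumptions, not the code from this PR: SimpleMapper and its widths attribute are hypothetical names, each feature maps exactly one column to one fitted transformer that has an inverse_transform() method, and Pipelines and None transformers are not handled. The key step is remembering how many output columns each transformer produced, so that the combined array can be sliced back apart.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer, LabelEncoder


class SimpleMapper:
    """Hypothetical, stripped-down stand-in for DataFrameMapper.

    Each feature is a (column_name, transformer) pair applied to one column.
    """

    def __init__(self, features):
        self.features = features
        self.widths = []  # number of output columns produced per feature

    def fit_transform(self, df):
        blocks = []
        self.widths = []
        for column, transformer in self.features:
            out = np.asarray(transformer.fit_transform(df[column].to_numpy()))
            if out.ndim == 1:  # LabelEncoder returns a 1-D array
                out = out.reshape(-1, 1)
            self.widths.append(out.shape[1])
            blocks.append(out)
        return np.hstack(blocks)

    def inverse_transform(self, X):
        X = np.asarray(X)
        result = {}
        start = 0
        for (column, transformer), width in zip(self.features, self.widths):
            # Slice out the block of columns this transformer produced.
            block = X[:, start:start + width]
            start += width
            if width == 1:
                block = block.ravel()  # LabelEncoder expects 1-D input
            result[column] = transformer.inverse_transform(block)
        return pd.DataFrame(result)


# Usage: one one-to-many transformer and one one-to-one transformer.
df = pd.DataFrame({"pet": ["cat", "dog", "fish", "dog"],
                   "size": ["S", "L", "M", "S"]})
mapper = SimpleMapper([("pet", LabelBinarizer()), ("size", LabelEncoder())])
X = mapper.fit_transform(df)
restored = mapper.inverse_transform(X)
```

Recording the widths at transform time is what lets the inverse step work without knowing anything about each transformer's internals.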

erikjandevries, Nov 14 '17 08:11

I'm not sure what's going wrong: when I tested the solution, all tests passed. Should I have tested differently?

$ python -m pytest -s -q tests/test_dataframe_mapper.py
/usr/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
...................................................
============================================================================================================= warnings summary ==============================================================================================================
tests/test_dataframe_mapper.py::test_list_transformers
  /usr/lib/python3.6/site-packages/sklearn/utils/validation.py:444: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.
    warnings.warn(msg, DataConversionWarning)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
51 passed, 1 warnings in 4.82 seconds

erikjandevries, Nov 14 '17 08:11

@erikjandevries Click the Details link next to the CircleCI message to see what is going wrong. As far as I can see, it's mostly PEP8 violations.

devforfu, Nov 14 '17 09:11

@devforfu Thanks for the hints; they were indeed PEP8 violations, which I've now fixed. Personally, I find that some PEP8 rules make my code less readable, but I understand the need for standardisation when working in (larger) teams :)

erikjandevries, Nov 14 '17 14:11

@erikjandevries Do you think it is possible to address the issues pointed out by @dukebody? Then we can do a final review and merge into master.

devforfu, Sep 05 '18 11:09

After a failed PR and some fiddling around, I figured out why that new sub-field was necessary. For one-to-many transformers, we need to keep a label list that preserves the grouping of the output columns (e.g. ['A_1', 'A_2', 'A_3'] when column A is expanded by a one-to-many encoder such as the LabelBinarizer). The field that @dukebody suggested using only stores these columns as a flat list, so the grouping is lost. @devforfu, I would vote to merge this PR, as it looks good otherwise.
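A small illustration of the grouping problem, assuming the example labels from this comment. The helper column_slices is hypothetical and not part of sklearn-pandas. With the grouped structure, the slice for each source column follows directly from the group sizes; the flat list alone does not say where one group ends and the next begins (a source column literally named A_1 would make prefix-based guessing ambiguous).

```python
import numpy as np

# Hypothetical output labels for two source columns: "A" was expanded into
# three columns by a one-to-many transformer, "B" stayed a single column.
grouped_labels = [["A_1", "A_2", "A_3"], ["B"]]

# The same labels stored as one flat list; the group boundaries are gone.
flat_labels = [label for group in grouped_labels for label in group]


def column_slices(groups):
    """Return one slice per group, covering that group's output columns."""
    slices = []
    start = 0
    for group in groups:
        slices.append(slice(start, start + len(group)))
        start += len(group)
    return slices


X = np.arange(8).reshape(2, 4)  # 2 rows, 4 transformed columns
# Split the transformed array back into one block per source column.
blocks = [X[:, s] for s in column_slices(grouped_labels)]
```

Each block can then be handed to the inverse_transform() of the transformer that produced it.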

adithyabsk, Nov 09 '18 05:11

import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('index', None),
    ...
])

When a column's transformer is None, I get the error:

'NoneType' object has no attribute 'inverse_transform'

This isn't critical at all; the feature is nice and useful and can be merged as is. I'm just pointing out a direction for further improvement.
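One possible way to handle None entries, sketched under the assumption that the inverse step processes one column block at a time: since sklearn-pandas passes a column through unchanged when its transformer is None, the inverse can pass that block through unchanged as well instead of calling inverse_transform() on None. The helper inverse_block below is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder


def inverse_block(transformer, block):
    """Invert one column block; a None transformer is treated as identity."""
    if transformer is None:
        # The forward pass left this column untouched, so return it as-is.
        return block
    return transformer.inverse_transform(block)


# Usage: an untransformed 'index' column next to a label-encoded column.
index = np.array([10, 11, 12])
enc = LabelEncoder().fit(["x", "y", "z"])
codes = enc.transform(["z", "x", "y"])

restored_index = inverse_block(None, index)
restored_labels = inverse_block(enc, codes)
```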

Edit: Actually, I was unable to make it work for me at all: TypeError: unhashable type: 'slice'

anatol-grabowski, Nov 09 '18 09:11

Hello, your message has been received. Thank you.

hu-minghao, Dec 14 '23 09:12