sklearn-pandas
DataFrameMapper.inverse_transform() for simple transformations
I've added an inverse_transform() method to the DataFrameMapper that works for simple transformations. I've included tests using the LabelEncoder and LabelBinarizer, which pass.
This still fails for more complicated transformations such as Pipelines. I hope it's a useful start at least.
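The general idea of a per-column inverse can be sketched without sklearn. This is a minimal, hedged illustration, not the code in this PR: ToyLabelEncoder, ToyLabelBinarizer and ToyMapper are hypothetical stdlib-only stand-ins, not the real sklearn or sklearn-pandas classes. The mapper records how many output columns each transformer produced, and the inverse slices each transformed row back into those blocks before calling each transformer's own inverse_transform.

```python
# Hypothetical, stdlib-only stand-ins; NOT the sklearn or sklearn-pandas classes.

class ToyLabelEncoder:
    """One input column -> one output column of integer codes."""
    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        index = {c: i for i, c in enumerate(self.classes_)}
        return [[index[v]] for v in values]

    def inverse_transform(self, rows):
        return [self.classes_[row[0]] for row in rows]


class ToyLabelBinarizer:
    """One input column -> one 0/1 output column per class (one-to-many)."""
    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        return [[int(v == c) for c in self.classes_] for v in values]

    def inverse_transform(self, rows):
        return [self.classes_[row.index(1)] for row in rows]


class ToyMapper:
    """Applies one transformer per column and remembers each output width."""
    def __init__(self, features):
        self.features = features  # list of (column_name, transformer)

    def fit_transform(self, data):
        # data: dict mapping column name -> list of values
        blocks, self.widths_ = [], []
        for column, transformer in self.features:
            block = transformer.fit(data[column]).transform(data[column])
            self.widths_.append(len(block[0]))  # output columns for this feature
            blocks.append(block)
        # Concatenate the per-feature blocks row by row.
        return [sum(parts, []) for parts in zip(*blocks)]

    def inverse_transform(self, rows):
        result, start = {}, 0
        for (column, transformer), width in zip(self.features, self.widths_):
            # Slice out the output columns this transformer produced, then invert.
            block = [row[start:start + width] for row in rows]
            result[column] = transformer.inverse_transform(block)
            start += width
        return result


data = {"size": ["S", "M", "L", "M"], "color": ["red", "blue", "red", "green"]}
mapper = ToyMapper([("size", ToyLabelEncoder()), ("color", ToyLabelBinarizer())])
encoded = mapper.fit_transform(data)  # first row: [2, 0, 0, 1]
restored = mapper.inverse_transform(encoded)
```

Recording the widths at fit time is what makes the one-to-many case (the binarizer) invertible: without them, the inverse cannot tell which output columns belong to which input column.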
Not sure what's going wrong: when I ran the tests locally, all of them passed. Should I have tested differently?
$ python -m pytest -s -q tests/test_dataframe_mapper.py
/usr/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
...................................................
============================================================================================================= warnings summary ==============================================================================================================
tests/test_dataframe_mapper.py::test_list_transformers
/usr/lib/python3.6/site-packages/sklearn/utils/validation.py:444: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.
warnings.warn(msg, DataConversionWarning)
-- Docs: http://doc.pytest.org/en/latest/warnings.html
51 passed, 1 warnings in 4.82 seconds
@erikjandevries Click the Details link next to the CircleCI message to see what is failing. As far as I can see, they are mostly PEP8 violations.
@devforfu Thanks for the hints; they were indeed PEP8 violations, which I've now fixed. In my opinion, some PEP8 rules make my code less readable, but I understand the need for standardisation when working in (larger) teams :)
@erikjandevries Do you think it is possible to address the issues pointed out by @dukebody? Then we can do a final review and merge into master.
After a failed PR and some fiddling around, I figured out why that new sub-field was necessary. For one-to-many transformers, the mapper needs a label list that preserves the grouping of the output columns (e.g. ['A_1', 'A_2', 'A_3'] for a single input column A in the case of the LabelBinarizer). The field that @dukebody suggested only stores these columns as a flat list, so the grouping is lost. @devforfu, I would vote to merge this PR, as it otherwise looks good.
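A minimal sketch of why the grouped structure carries more information than the flat one. The column names and the column_slices helper are hypothetical and for illustration only; sklearn-pandas may name derived columns differently.

```python
# Hypothetical column names; one sub-list per input feature.
grouped = [["A_1", "A_2", "A_3"], ["B"]]

# The flat list is easy to derive from the grouped one...
flat = [name for group in grouped for name in group]

# ...but the grouping cannot be recovered from the flat list alone:
# ["A_1", "A_2", "A_3", "B"] could just as well have come from
# [["A_1"], ["A_2", "A_3"], ["B"]].


def column_slices(grouped_names):
    """Hypothetical helper: map each feature's group to its slice of output positions."""
    slices, start = [], 0
    for group in grouped_names:
        slices.append(slice(start, start + len(group)))
        start += len(group)
    return slices


slices = column_slices(grouped)  # [slice(0, 3), slice(3, 4)]
```

With the grouped list, an inverse transform knows that positions 0 to 2 must go back through the transformer for A together, which is exactly what the flat field cannot tell it.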
import sklearn_pandas

mapper = sklearn_pandas.DataFrameMapper([
    ('index', None),  # None: pass the column through without transforming it
    ...
])
With None transforms, I get the error:
'NoneType' object has no attribute 'inverse_transform'
This isn't critical, though: the feature is nice and useful and can be merged as is. I'm just pointing out a direction for further improvement.
Edit: Actually, I was unable to make it work for me; I get TypeError: unhashable type: 'slice'
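The 'NoneType' error reported in this comment presumably arises because the inverse loop calls inverse_transform on every feature's transformer, including None. One possible fix, sketched here under the assumption that a None feature produces exactly one output column, is to treat None as an identity mapping and skip the call. inverse_block and Doubler are hypothetical names, not code from this PR or from sklearn-pandas.

```python
# Hypothetical helper; not the code in this PR or in sklearn-pandas.
def inverse_block(transformer, block):
    """Invert one feature's block of output columns.

    A transformer of None means the column was passed through untouched,
    so there is nothing to invert: return the single output column as-is
    (assumes the pass-through produced exactly one column).
    """
    if transformer is None:
        return [row[0] for row in block]
    return transformer.inverse_transform(block)


class Doubler:
    """Hypothetical toy transformer: forward doubles values, inverse halves them."""
    def inverse_transform(self, block):
        return [row[0] / 2 for row in block]


passthrough = inverse_block(None, [[5], [7]])  # [5, 7]
halved = inverse_block(Doubler(), [[4], [6]])  # [2.0, 3.0]
```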
Hello, received, thank you.