sklearn-pandas
Unexpected Dropping of columns
In the following code, the printed output does not change when the line drop_cols=["salary"], is commented out:
import sklearn.preprocessing
import pandas as pd
import sklearn_pandas

data = pd.DataFrame(
    {
        "pet": ["cat", "dog", "dog", "fish", "cat", "dog", "cat", "fish"],
        "children": [4.0, 6, 3, 3, 2, 3, 5, 4],
        "salary": [90.0, 24, 44, 27, 32, 59, 36, 27],
    }
)

mapper = sklearn_pandas.DataFrameMapper(
    [
        ("pet", sklearn.preprocessing.LabelBinarizer()),
        (["children"], sklearn.preprocessing.StandardScaler()),
    ],
    input_df=True,
    df_out=True,
    drop_cols=["salary"],
)

print(data)
print()
print(mapper.fit_transform(data.copy()))
In both cases (with the line present or commented out) the transformed DataFrame has no salary column. I would have expected columns that are not mentioned in the mapper to be left untouched, especially since the drop_cols option exists.
Is this just me having arbitrary expectations, or is something strange going on?
I have modified the _build(self, X=None) function inside the DataFrameMapper class and added code that filters the columns based on the self.drop_cols variable.
Previous build function:
def _build(self, X=None):
    """
    Build attributes built_features and built_default.
    """
    if isinstance(self.features, list):
        self.built_features = [
            _build_feature(*f, X=X) for f in self.features
        ]
    else:
        self.built_features = _build_feature(*self.features, X=X)
    self.built_default = _build_transformer(self.default)
Modified code:
def _build(self, X=None):
    """
    Build attributes built_features and built_default.
    """
    if isinstance(self.features, list):
        filtered_list = []
        for obj in self.features:
            if isinstance(obj[0], list):
                new_cols = [col for col in obj[0] if col not in self.drop_cols]
                new_tuple = tuple([new_cols] + list(obj[1:]))
                filtered_list.append(new_tuple)
            else:
                if obj[0] not in self.drop_cols:
                    filtered_list.append(obj)
        self.features = filtered_list
        self.built_features = [_build_feature(*f, X=X) for f in self.features]
    else:
        self.built_features = _build_feature(*self.features, X=X)
    self.built_default = _build_transformer(self.default)
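The filtering step in the modified function can also be looked at on its own, outside the class. The following is only a rough sketch of that idea, not code from sklearn-pandas: filter_feature_defs is a hypothetical helper name, and the transformers are replaced by placeholder strings so that the snippet runs without any third-party package. Compared with the modified _build above, it makes three assumptions explicit: drop_cols may be None, a feature definition whose column list becomes empty is skipped, and the input list is left unmodified rather than being written back to self.features.

```python
from typing import Any, List, Optional, Sequence, Tuple

# A feature definition is (columns, transformer, ...) as passed to the mapper.
FeatureDef = Tuple[Any, ...]


def filter_feature_defs(
    features: Sequence[FeatureDef],
    drop_cols: Optional[Sequence[str]] = None,
) -> List[FeatureDef]:
    """Return a new list of feature definitions without the dropped columns.

    Hypothetical helper for illustration only; it does not exist in
    sklearn-pandas.
    """
    dropped = set(drop_cols or [])  # assumption: None means "drop nothing"
    kept: List[FeatureDef] = []
    for feature_def in features:
        columns, rest = feature_def[0], tuple(feature_def[1:])
        if isinstance(columns, list):
            remaining = [col for col in columns if col not in dropped]
            # Assumption: a definition with no columns left is skipped
            # instead of being passed on with an empty column list.
            if remaining:
                kept.append((remaining,) + rest)
        elif columns not in dropped:
            kept.append(feature_def)
    return kept


# Placeholder strings stand in for real transformer objects.
example_features = [
    ("pet", "binarizer"),
    (["children", "salary"], "scaler"),
    (["salary"], "scaler"),
]
print(filter_feature_defs(example_features, drop_cols=["salary"]))
```

Returning a new list instead of assigning to self.features keeps the constructor argument unchanged, which may matter if the mapper is cloned or refitted later.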
Any feedback or suggestions on my code changes would be greatly appreciated. Thank you!
Hello, your message has been received. Thank you.