sklearn-pandas
Bug fix: Unexpected Dropping of columns
The bug was reported in the "Unexpected Dropping of columns" issue. The user found that passing the drop_cols argument to DataFrameMapper had no effect, and that the output columns were also wrong.
I have modified the _build(self, X=None) method of the DataFrameMapper class and added code that filters the columns based on the self.drop_cols attribute.
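The filtering step on its own can be sketched as a small pure function, which makes it easy to unit-test independently of DataFrameMapper. This is a minimal sketch under stated assumptions: `filter_features` is a hypothetical name, not part of sklearn-pandas, and each feature is assumed to be a tuple whose first element is a column name or a list of column names, followed by the transformer and optional options. It returns a new list and leaves the input untouched.

```python
from typing import Any, Hashable, Iterable, List, Tuple


def filter_features(
    features: List[Tuple[Any, ...]], drop_cols: Iterable[Hashable]
) -> List[Tuple[Any, ...]]:
    """Return a copy of `features` with every column in `drop_cols` removed.

    Hypothetical helper for illustration only; not part of sklearn-pandas.
    """
    dropped = set(drop_cols)  # column labels are assumed to be hashable
    filtered = []
    for feature in features:
        columns, rest = feature[0], feature[1:]
        if isinstance(columns, list):
            # Multi-column selector: keep only the columns that are not dropped.
            kept = [col for col in columns if col not in dropped]
            if kept:  # a selector whose columns were all dropped is skipped
                filtered.append((kept, *rest))
        elif columns not in dropped:
            # Single-column selector: keep the original tuple unchanged.
            filtered.append(feature)
    return filtered


features = [(["age", "income"], None), ("city", None, {"alias": "town"})]
print(filter_features(features, drop_cols=["income", "city"]))
# [(['age'], None)]
```

Skipping selectors that end up empty is a design choice of this sketch: building a transformer on zero columns would most likely fail, so the whole feature is dropped instead.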
Previous build function:
def _build(self, X=None):
    """
    Build attributes built_features and built_default.
    """
    if isinstance(self.features, list):
        self.built_features = [
            _build_feature(*f, X=X) for f in self.features
        ]
    else:
        self.built_features = _build_feature(*self.features, X=X)
    self.built_default = _build_transformer(self.default)
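To see why the previous version has no visible effect, here is a simplified pure-Python model of how I understand the original column selection. It is my reading of the library, not a copy of its code. Columns named in `features` are always built, and `drop_cols` appears to be checked only when collecting the leftover columns handled by `default`. `output_columns_before_fix` and `use_default` are hypothetical names, and the model tracks column names only (no transformers).

```python
def output_columns_before_fix(df_columns, features, drop_cols, use_default):
    """Simplified model of which columns the old mapper emits.

    Hypothetical helper for illustration only; it is an assumption about the
    library's behavior, not sklearn-pandas code.
    """
    # Columns named explicitly in the feature list are always used:
    # drop_cols is never consulted for them.
    selected = []
    for columns, *_ in features:
        selected.extend(columns if isinstance(columns, list) else [columns])
    # drop_cols only filters the leftover columns handled by `default`.
    leftover = []
    if use_default:
        leftover = [
            col for col in df_columns
            if col not in selected and col not in drop_cols
        ]
    return selected + leftover


cols = ["age", "income", "city", "zip"]
features = [(["age", "income"], None), ("city", None)]
print(output_columns_before_fix(cols, features, drop_cols=["income"], use_default=False))
# ['age', 'income', 'city']
```

In this model "income" survives even though it is listed in drop_cols, which matches the behavior reported in the issue.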
The modified function:
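def _build(self, X=None):
    """
    Build attributes built_features and built_default.

    Columns listed in self.drop_cols are removed from the feature
    definitions before the features are built.
    """
    if isinstance(self.features, list):
        # Treat a missing drop_cols (None) as "drop nothing".
        drop_cols = self.drop_cols or []
        filtered_features = []
        for feature in self.features:
            columns = feature[0]
            if isinstance(columns, list):
                # Multi-column selector: keep only the columns not dropped.
                kept = [col for col in columns if col not in drop_cols]
                # Skip the selector entirely if every column was dropped.
                if kept:
                    filtered_features.append((kept, *feature[1:]))
            elif columns not in drop_cols:
                # Single-column selector: keep it unless it is dropped.
                filtered_features.append(feature)
        # Build from the filtered copy rather than reassigning
        # self.features, so the constructor parameter is not mutated
        # (scikit-learn estimators should not change their parameters).
        self.built_features = [
            _build_feature(*f, X=X) for f in filtered_features
        ]
    else:
        self.built_features = _build_feature(*self.features, X=X)
    self.built_default = _build_transformer(self.default)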
With this change, every column listed in self.drop_cols is removed from the feature definitions before they are built, so dropped columns no longer appear in the output. I am new to open-source contribution and this is my first pull request, so any suggestions are welcome.