
DataFrameMapper - pass custom callable function and preserve column names

Open roei-simplex opened this issue 3 years ago • 0 comments

As mentioned in the Dynamic Columns section of the documentation, DataFrameMapper supports selecting columns dynamically during fit, either by passing a custom callable or by using sklearn.compose.make_column_selector. I tried this, and the column names in the output are replaced with the alias plus a column index (a number), which is also the expected behavior according to the documentation:

    >>> import pandas as pd
    >>> import sklearn.compose
    >>> import sklearn.preprocessing
    >>> from sklearn_pandas import DataFrameMapper
    >>> class GetColumnsStartingWith:
    ...     def __init__(self, start_str):
    ...         self.pattern = start_str
    ...
    ...     def __call__(self, X: pd.DataFrame = None):
    ...         # Return the names of all columns that start with the pattern.
    ...         return [c for c in X.columns if c.startswith(self.pattern)]
    ...
    >>> df = pd.DataFrame({
    ...     'sepal length (cm)': [1.0, 2.0, 3.0],
    ...     'sepal width (cm)': [1.0, 2.0, 3.0],
    ...     'petal length (cm)': [1.0, 2.0, 3.0],
    ...     'petal width (cm)': [1.0, 2.0, 3.0]
    ... })
    >>> t = DataFrameMapper([
    ...     (
    ...         sklearn.compose.make_column_selector(dtype_include=float),
    ...         sklearn.preprocessing.StandardScaler(),
    ...         {'alias': 'x'}
    ...     ),
    ...     (
    ...         GetColumnsStartingWith('petal'),
    ...         None,
    ...         {'alias': 'petal'}
    ...     )], df_out=True, default=False)
    >>> t.fit(df).transform(df).shape
    (3, 6)
    >>> t.transformed_names_
    ['x_0', 'x_1', 'x_2', 'x_3', 'petal_0', 'petal_1']
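For reference, here is a minimal sketch of what the two selectors return when called directly on the same frame, outside of DataFrameMapper. It uses only pandas and scikit-learn; the variable names `float_cols` and `petal_cols` are just for illustration. It suggests that the original column names are available at selection time, so the numbering seems to come from how the mapper names its outputs rather than from the selectors themselves.

```python
import pandas as pd
from sklearn.compose import make_column_selector


class GetColumnsStartingWith:
    """Callable selector: returns the column names that start with a prefix."""

    def __init__(self, start_str):
        self.pattern = start_str

    def __call__(self, X: pd.DataFrame):
        return [c for c in X.columns if c.startswith(self.pattern)]


df = pd.DataFrame({
    'sepal length (cm)': [1.0, 2.0, 3.0],
    'sepal width (cm)': [1.0, 2.0, 3.0],
    'petal length (cm)': [1.0, 2.0, 3.0],
    'petal width (cm)': [1.0, 2.0, 3.0],
})

# Both selectors are plain callables that take a DataFrame and
# return a list of column names (not positions).
float_cols = make_column_selector(dtype_include=float)(df)
petal_cols = GetColumnsStartingWith('petal')(df)

print(float_cols)
print(petal_cols)
```

Resolving the names up front like this and passing them as static column entries might be one way around the numbering, but I have not verified that, and it gives up selecting at fit time.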

I would like to know how I can select columns dynamically (e.g. by dtype) while preserving their names.

roei-simplex, Feb 08 '22 16:02