sklearn-pandas DataFrameMapper changes columns types when default=None.

Open leonardommarques opened this issue 6 years ago • 4 comments

When I use DataFrameMapper with default=None to transform one column, the dtypes of all the other columns are changed to object. This does not happen when the DataFrame has only float and/or int columns.

import pandas as pd
import numpy as np
from sklearn_pandas import DataFrameMapper
from sklearn.impute import SimpleImputer


# with only numeric columns, the dtypes are preserved
da = pd.DataFrame({
    'a':[1,3,np.nan],
    'b': [1.2,2,3]})
print(da.dtypes)

aux_imp = DataFrameMapper([
    (['a'], SimpleImputer(strategy='mean'))], 
    df_out=True, default=None)

da = aux_imp.fit_transform(da)
print(da.dtypes)

# if one column is of type str, the untransformed columns come back as object
da = pd.DataFrame({
    'a':[1,3,np.nan],
    'b': [1.2,2,3],
    'c':['c', 'c', 'a']
})
print(da.dtypes)

aux_imp = DataFrameMapper(
    [(['a'], SimpleImputer(strategy='mean'))], 
    df_out=True, default=None)

da = aux_imp.fit_transform(da)
print(da.dtypes)

Sep 11 '18 02:09 leonardommarques

I believe this is because the DataFrameMapper applies a single "empty transformer" to all columns that are not explicitly selected, extracting them together as one numpy array. If their types are mixed, the only dtype that can hold all of them (strings, ints, floats, etc.) is "object".
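To illustrate the dtype behaviour described here, the following is a minimal sketch using only pandas and numpy. It does not call sklearn-pandas at all; it assumes the default columns are extracted together as a single array, as suggested above, and shows what that extraction does to the dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'b': [1.2, 2, 3],
    'c': ['c', 'c', 'a'],
})

# Extracting only a float column keeps a numeric dtype.
numeric_only = df[['b']].to_numpy()

# Extracting float and str columns together forces one common dtype: object.
mixed = df[['b', 'c']].to_numpy()

print(numeric_only.dtype)
print(mixed.dtype)

# A DataFrame rebuilt from the object array no longer has a float column 'b'.
rebuilt = pd.DataFrame(mixed, columns=['b', 'c'])
print(rebuilt.dtypes)
```

This would explain why the all-numeric example keeps its dtypes while the example with the str column does not.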

I don't know if this can be worked around by "copying" the default columns one by one, keeping the dtype.
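As a possible stopgap on the user side, the original dtypes could be cast back after the transform. The sketch below is only an assumption about how that might look: restore_dtypes is a hypothetical helper, not part of sklearn-pandas, and the mapper output is simulated with astype(object) rather than produced by a real DataFrameMapper.

```python
import pandas as pd


def restore_dtypes(transformed, original):
    """Cast columns shared with `original` back to their original dtypes.

    Hypothetical helper, not part of sklearn-pandas.
    """
    shared = [col for col in transformed.columns if col in original.columns]
    return transformed.astype({col: original[col].dtype for col in shared})


original = pd.DataFrame({
    'b': [1.2, 2, 3],
    'c': ['c', 'c', 'a'],
})

# Simulate the mapper output: the default columns come back as object.
transformed = original.astype(object)

restored = restore_dtypes(transformed, original)
print(restored.dtypes)
```

This only makes sense for columns passed through unchanged by default=None. A column whose values were changed by a transformer may no longer be castable to its original dtype.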

Oct 17 '18 18:10 dukebody

Hi, I'm new to open source contribution. Is it okay for me to work on this issue?

Apr 02 '19 11:04 monda00

Hello, I would like to work on it. Can you please assign it to me?

Jan 12 '21 04:01 pradumna123

Is this issue resolved? I am facing the same issue. I have a DataFrame containing columns of float and str dtypes. Using default=None converts the dtype of all the columns to object, which is causing my Pipelines to fail.

Sep 08 '21 13:09 ajayverma90
