evalml
Imputer modifies user data when user passes in a DataTable
import woodwork as ww
import pandas as pd
import numpy as np
from evalml.pipelines.components import Imputer
df = ww.DataTable(pd.DataFrame({
    "all nan": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "all nan cat": pd.Series([np.nan, np.nan, np.nan, np.nan, np.nan], dtype='category')
}))
X = Imputer().fit_transform(df)
# This assertion passes, which shows the bug: the user's DataTable lost its all-NaN columns.
assert df.to_dataframe().empty
This came up during #2018. The imputer is expected to drop all-null columns but, as a user, I wouldn't expect the Imputer
to modify the data passed in.
The underlying issue is that infer_feature_types does not copy the data when users pass in a data table.
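To illustrate the difference a copy makes, here is a minimal pandas-only sketch. It does not use evalml or woodwork; `drop_all_null_columns` and its `copy` flag are hypothetical names standing in for a transform that drops all-null columns, not the actual `infer_feature_types` or `Imputer` code.

```python
import numpy as np
import pandas as pd


def drop_all_null_columns(X: pd.DataFrame, copy: bool = True) -> pd.DataFrame:
    """Hypothetical stand-in for a transform that drops all-null columns.

    With copy=True the caller's frame is left untouched; with copy=False
    the caller's frame is modified in place (the behavior reported here).
    """
    if copy:
        X = X.copy()
    all_null = [col for col in X.columns if X[col].isna().all()]
    X.drop(columns=all_null, inplace=True)
    return X


user_df = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],
    "values": [1.0, 2.0, 3.0],
})

# Copying first: the result loses the all-null column, the input does not.
safe = drop_all_null_columns(user_df, copy=True)
shape_after_copy = user_df.shape

# No copy: the same object comes back and the user's frame has been changed.
mutated = drop_all_null_columns(user_df, copy=False)
shape_after_inplace = user_df.shape
```

The trade-off is the one raised in this thread: the copy keeps the caller's data intact at the cost of duplicating it in memory.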
I would caution against copying the user's data. If the user has a large data set, the copy might be expensive.
For Featuretools, we modify the user's dataframe when it is passed to an Entity.
Let's do the copy for now. I agree this has performance implications, but it's important to keep our API contract clear.
This only happens when one or more cols is fully-nan, so let's treat it as low priority.
Since https://github.com/alteryx/evalml/issues/2751 was merged in, can we close out this issue?