`PerColumnImputer` can raise `woodwork.exceptions.TypeConversionError` if float values are imputed into `Int64` data
The `PerColumnImputer` can impute floating point values into integer data with the `mean` or `median` numeric impute strategies. When this happens, we cannot simply reinitialize the original data's woodwork schema via `X_t.ww.init(schema=original_schema.get_subset_schema(X_t.columns))` like we currently do, since it would try to use `Int64` on floating point data, which results in an error.
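The underlying dtype problem can be reproduced with pandas alone. The sketch below is illustrative and does not use evalml or woodwork; the helper name `can_restore_int64` is hypothetical. It assumes the failure comes from casting non-integral floats back to the nullable `Int64` dtype, which is what reapplying the original schema would attempt.

```python
import pandas as pd


def can_restore_int64(series: pd.Series) -> bool:
    """Return True if the series can be cast back to nullable Int64."""
    try:
        series.astype("Int64")
        return True
    except (TypeError, ValueError):
        # pandas refuses to cast floats with a fractional part to Int64
        return False


# An integer column with a missing value, stored as nullable Int64
original = pd.Series([1, 2, None, 4], dtype="Int64")

# Mean imputation: the mean of 1, 2, 4 is not a whole number
mean_value = float(original.mean())
imputed = original.astype("float64").fillna(mean_value)

print(imputed.dtype)                # the imputed column is float64
print(can_restore_int64(imputed))   # casting back to Int64 is rejected
```

If every imputed value happens to be a whole number, the cast succeeds, so the error depends on the data as well as on the strategy.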
We'll need to use `_get_new_logical_types_for_imputed_data`, as the other imputers do, to assign the correct logical types to the imputed data. Note that because the per-column imputer can use a different strategy for each column, we'll need to either change `_get_new_logical_types_for_imputed_data` to accept per-column strategies or call it individually for every column.
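A rough sketch of the per-column option follows. It is not evalml's `_get_new_logical_types_for_imputed_data`; the function `logical_types_after_imputation`, its `default_strategy` parameter, and the use of plain strings in place of woodwork logical type objects are all assumptions made for illustration. It also assumes that promoting integer columns to `Double` is acceptable whenever `mean` or `median` is used.

```python
# Hypothetical helper: strings stand in for woodwork logical types.
FLOAT_PRODUCING_STRATEGIES = {"mean", "median"}
INTEGER_TYPES = {"Integer", "IntegerNullable"}


def logical_types_after_imputation(
    original_types, impute_strategies, default_strategy="most_frequent"
):
    """Pick a post-imputation logical type for each column independently."""
    new_types = {}
    for column, original_type in original_types.items():
        # Fall back to the default when a column has no explicit strategy
        strategy = impute_strategies.get(column, {}).get(
            "impute_strategy", default_strategy
        )
        if original_type in INTEGER_TYPES and strategy in FLOAT_PRODUCING_STRATEGIES:
            # Mean/median may produce fractional values, so widen the type
            new_types[column] = "Double"
        else:
            new_types[column] = original_type
    return new_types


original_types = {"int with nan": "IntegerNullable", "other int": "IntegerNullable"}
strategies = {"int with nan": {"impute_strategy": "mean"}}
new_types = logical_types_after_imputation(original_types, strategies)
print(new_types)
```

The resulting mapping could then be applied when reinitializing woodwork on the transformed data, in place of the original schema's types for the affected columns.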
Below is a test that reproduces the type conversion error:
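from evalml.pipelines.components import PerColumnImputer


def test_per_column_imputer_float_imputed_into_int(imputer_test_data):
    # imputer_test_data is a pytest fixture supplied by the test suite
    X = imputer_test_data.ww[["int with nan"]]
    strategies = {
        "int with nan": {"impute_strategy": "mean"},
    }
    transformer = PerColumnImputer(impute_strategies=strategies)
    transformer.fit(X)
    # Currently raises woodwork.exceptions.TypeConversionError
    transformer.transform(X)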