
`PerColumnImputer` can raise `woodwork.exceptions.TypeConversionError` if float values are imputed into `Int64` data

Open tamargrey opened this issue 1 year ago • 0 comments

The `PerColumnImputer` can impute floating point values into integer data when a column uses the `mean` or `median` numeric impute strategy. When this happens, we cannot simply reinitialize the original data's woodwork schema via `X_t.ww.init(schema=original_schema.get_subset_schema(X_t.columns))` as we currently do: the schema would try to use `Int64` on floating point data, which raises an error.
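The underlying dtype problem can be reproduced with plain pandas. This is only an analogous sketch, not the woodwork code path itself: the sample values are made up, and it assumes that reapplying the original schema amounts to casting the imputed column back to `Int64`.

```python
import pandas as pd

# Hypothetical nullable-integer column with one missing value.
ints = pd.Series([1, 2, 4, None], dtype="Int64")

# The mean is 7 / 3, which is not a whole number.
mean_value = ints.mean()

# Impute on a float copy so the fractional mean can be stored.
imputed = ints.astype("float64").fillna(mean_value)

# Casting the imputed data back to Int64 cannot succeed without losing
# information, so pandas refuses the conversion.
try:
    imputed.astype("Int64")
    cast_failed = False
except (TypeError, ValueError) as err:
    cast_failed = True
    print(f"cast back to Int64 failed: {err}")
```
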

We'll need to use `_get_new_logical_types_for_imputed_data`, as the other imputers do, so that the imputed data gets the correct logical types. Note that because the per-column imputer can have a different strategy for each column, we'll need to either change `_get_new_logical_types_for_imputed_data` to accept per-column strategies, or call it individually for every column.
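A minimal sketch of the per-column option is shown here. It does not call `_get_new_logical_types_for_imputed_data`; `new_logical_types` is a hypothetical stand-in that works on logical type names as strings, and the rule it applies (integer columns imputed with `mean` or `median` become `Double`) is an assumption about the desired behavior.

```python
# Strategies that can produce non-integer fill values.
FLOAT_PRODUCING_STRATEGIES = {"mean", "median"}
# Woodwork logical type names for integer data.
INTEGER_TYPES = {"Integer", "IntegerNullable"}


def new_logical_types(original_types, impute_strategies):
    """Pick a logical type name per column after imputation (hypothetical helper).

    original_types: dict of column name -> logical type name
    impute_strategies: dict of column name -> {"impute_strategy": ...}
    """
    updated = {}
    for column, logical_type in original_types.items():
        strategy = impute_strategies.get(column, {}).get("impute_strategy")
        if logical_type in INTEGER_TYPES and strategy in FLOAT_PRODUCING_STRATEGIES:
            # The fill value may be fractional, so store the column as floats.
            updated[column] = "Double"
        else:
            # Columns with other strategies, or no entry, keep their type here.
            updated[column] = logical_type
    return updated


result = new_logical_types(
    {"int with nan": "IntegerNullable", "category": "Categorical"},
    {
        "int with nan": {"impute_strategy": "mean"},
        "category": {"impute_strategy": "most_frequent"},
    },
)
print(result)
```

The resulting mapping could then be used in place of the original schema's logical types for the affected columns when reinitializing woodwork on the transformed data.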

Below is a test that produces the type conversion error:

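from evalml.pipelines.components import PerColumnImputer


def test_per_column_imputer_float_imputed_into_int(imputer_test_data):
    # Select a single integer column that contains missing values.
    X = imputer_test_data.ww[["int with nan"]]
    strategies = {
        "int with nan": {"impute_strategy": "mean"},
    }
    transformer = PerColumnImputer(impute_strategies=strategies)
    transformer.fit(X)
    # Currently raises woodwork.exceptions.TypeConversionError.
    transformer.transform(X)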

tamargrey avatar Apr 14 '23 15:04 tamargrey