
TimeSeriesImputer should not allow interpolate as strategy for boolean or categorical targets

Open tamargrey opened this issue 1 year ago • 0 comments

Currently the target_impute_strategy is applied to any kind of target data, independent of whether or not the strategy makes sense for that kind of data. This is only problematic for the interpolate strategy, as the other two can be used with any data.

Interpolate, however, should only be used with numeric values. Data with the category dtype will raise an error from pandas, and boolean data, with the nullable type handling, will become Double with floating point values imputed, which doesn't make sense (this was actually happening prior to the nullable type handling as well).
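As a rough illustration of the boolean case, here is a minimal pandas-only sketch. It does not go through TimeSeriesImputer or the nullable type handling; it simply casts a boolean target to float by hand (an assumption made for the demo) and shows the kind of value linear interpolation then fills in.

```python
import numpy as np
import pandas as pd


def interpolate_boolean_as_float(values):
    """Cast a boolean target with gaps to float and linearly interpolate it.

    This mimics, by hand, what happens once a boolean target ends up as a
    floating point column; it is not the evalml code path itself.
    """
    y = pd.Series(values, dtype="object")
    y_float = y.astype("float64")  # True -> 1.0, False -> 0.0, NaN stays NaN
    return y_float.interpolate()   # default method is linear


imputed = interpolate_boolean_as_float([True, np.nan, False])
print(imputed.tolist())  # [1.0, 0.5, 0.0]
```

The imputed 0.5 is neither True nor False, which is why the result is not meaningful for a boolean target.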

We should consider either disallowing interpolate for non-numeric data (in which case we could remove y from the _integer_nullable_incompatibilities) or using one of the other interpolate methods listed at https://pandas.pydata.org/docs/reference/api/pandas.Series.interpolate.html.
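One way the first option could look is sketched below. The helper name check_interpolate_allowed and the error message are hypothetical and not part of evalml; the sketch only uses the pandas dtype checks is_bool_dtype and is_numeric_dtype. Note that pandas treats bool as numeric, so booleans have to be excluded explicitly.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def check_interpolate_allowed(y: pd.Series, strategy: str) -> None:
    """Hypothetical guard: reject 'interpolate' for non-numeric targets."""
    if strategy != "interpolate":
        return  # the other strategies work with any kind of data
    # is_numeric_dtype returns True for bool, so check booleans separately.
    if is_bool_dtype(y.dtype) or not is_numeric_dtype(y.dtype):
        raise ValueError(
            f"Cannot use 'interpolate' with a target of dtype {y.dtype}; "
            "choose a different target_impute_strategy."
        )


# Numeric target: passes silently.
check_interpolate_allowed(pd.Series([1.0, None, 3.0]), "interpolate")

# Boolean and categorical targets: rejected.
for bad in (
    pd.Series([True, None, False], dtype="boolean"),
    pd.Series(["a", None, "b"], dtype="category"),
):
    try:
        check_interpolate_allowed(bad, "interpolate")
    except ValueError as err:
        print(err)
```

Where such a check would live (component init, fit, or transform) is left open here, since the target dtype is only known once y is passed in.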

This will not be seen in AutoML search, because we use the default impute strategies in _make_component_list_from_actions, so interpolate will not be the target_strategy.

tamargrey avatar Mar 06 '23 22:03 tamargrey