evalml
evalml copied to clipboard
ARIMARegressor should fill boolean nans with the mode then convert to double to fully support boolean values
- As a user, I wish I could pass any boolean column into the ARIMARegressor component. Currently, if you pass in a column with the
Boolean
logical type, we convert it toDouble
prior to passing into the estimator. But we do nothing withBooleanNullable
columns, which will cause them to raise an error that AutoARIMA can't handle boolean values.
We should add support for all boolean values. Note - this is NOT a nullable type incompatibility. Sktime's AutoARIMA regressor is not meant to take in any boolean value -- nullable type or not--and we do not expect them to add support for that. It is our choice in evalml to convert booleans to doubles to pass into ARIMA, so we should support all boolean types.
We'll need to handle potential null values in the same way we handle null values in numeric columns but use the mode instead of the mean. This means we should fillna
with the mode of the boolean nullable columns, then convert to Double.
I would argue this is a nullable type incompatibility. Any null values that enter pipelines with an ARIMARegressor should be imputed before reaching the estimator, and we don't expect our estimators to be able to handle truly null values intrinsically. Therefore, if a BooleanNullable column reached the ARIMARegressor we would expect it to be nullable but have no actual nan values, meaning we should handle it the same way we handle Boolean.
@eccabay I want to make sure we're separating out the actual handling of null values from the concept of nullable type incompatibilities. Many of our components can't handle nans but can take in Int65
or boolean
pandas dtypes, and we don't consider them to have any nullable type incompatibilities. I wouldn't hate the chance to get to standardize how individual components handle nans, but that would've widened the scope of the nullable type work beyond what we need.
Therefore, if a BooleanNullable column reached the ARIMARegressor we would expect it to be nullable but have no actual nan values, meaning we should handle it the same way we handle Boolean.
I 100% agree with this and, to me, this is why it's not a nullable type incompatibility, because, if we ignore the fact that nans might be present (which we generally do for estimators; this one just happens to have some nan imputing in place for numeric types), we can treat them exactly the same by converting to Double
. AutoARIMA has a compatibility issue with any boolean type, and we're not expecting that to change.
Compare that to IntegerNullable, where we cannot handle Integer
and IntegerNullable
columns the same at the moment because of the incompatibility with the IntegerNullable
type. Once sktime adds support, though, we'll treat Integer and IntegerNullable exactly the same. The fact that we impute nans in ARIMARegressor
is entirely separate from whether the nullable integer type was incompatible.
When this gets implemented, we should add BooleanNullable
to the parametrization in test_arima_regressor_with_nullable_types