TargetDistributionDataCheck: Add support for nullable logical types - scipy nullable type incompatibilities

Open tamargrey opened this issue 1 year ago • 0 comments

Currently, the TargetDistributionDataCheck does not allow nullable logical types. This doesn't match the behavior of InvalidTargetDataCheck, which does allow nullable types. With the new nullable type support across automl search, we should update TargetDistributionDataCheck to allow the numeric nullable types AgeNullable and IntegerNullable.

We are currently blocked from doing so because the scipy.stats utils jarque_bera and shapiro are incompatible with nullable types that contain null values.

    import pandas as pd
    import pytest
    from scipy.stats import jarque_bera, shapiro


    for dtype in ["Int64", "boolean"]:
        for scipy_method in [jarque_bera, shapiro]:
            # Works if no null value is present
            y = pd.Series([1, 0] * 50, dtype=dtype)
            scipy_method(y)

            # Breaks once a null value is present
            y.iloc[-1] = pd.NA
            with pytest.raises(TypeError, match="value of NA is ambiguous"):
                scipy_method(y)

This is not reachable from the AutoMLSearch class directly, but it is reachable if you call the search or search_iterative utilities, since they use DefaultDataChecks, which includes TargetDistributionDataCheck for regression problems.

    import pandas as pd
    import woodwork as ww
    from evalml.automl import search


    # X_y_regression is a pytest fixture that supplies regression data
    X, y = X_y_regression
    y = ww.init_series(pd.Series(range(len(y))), logical_type="IntegerNullable")
    _, data_check_results = search(
        X_train=X,
        y_train=y,
        problem_type="regression",
        max_time=42,
        patience=3,
        tolerance=0.5,
        mode="fast",
    )
    # The data check rejects the nullable target type
    assert data_check_results[0]["message"] == (
        "Target is unsupported integer_nullable type. "
        "Valid Woodwork logical types include: integer, double, age, age_fractional"
    )

We should handle this incompatibility and then allow the nullable numeric types in this data check.
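
One possible way to handle the incompatibility, sketched here as an assumption rather than as the fix evalml should adopt, is to drop the null values and convert the target to a plain float64 NumPy array before it reaches scipy. The helper name `to_scipy_input` is hypothetical and not part of evalml. Whether dropping nulls is the right behavior for this data check is a separate design decision.

```python
import numpy as np
import pandas as pd
from scipy.stats import jarque_bera, shapiro


def to_scipy_input(y: pd.Series) -> np.ndarray:
    """Hypothetical helper (not part of evalml): make a nullable target scipy-safe.

    Drops missing values, then converts the remaining values to a plain
    float64 NumPy array so that no pd.NA reaches scipy.
    """
    return y.dropna().to_numpy(dtype="float64")


# Same target as in the reproduction above, including one null value.
y = pd.Series([1, 0] * 50, dtype="Int64")
y.iloc[-1] = pd.NA

values = to_scipy_input(y)
for scipy_method in [jarque_bera, shapiro]:
    # Neither call sees pd.NA, so the ambiguous-NA TypeError is avoided.
    statistic, p_value = scipy_method(values)
```

The conversion happens once, outside the scipy calls, so the same array can be reused for both normality tests.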

tamargrey avatar Mar 20 '23 16:03 tamargrey