evalml

precision_recall_curve can raise `ValueError: unknown format is not supported` when nullable types are used

Open · tamargrey opened this issue

The following code works with `pos_label_idx=1` but raises `ValueError: unknown format is not supported` with `pos_label_idx=0`:

    import numpy as np
    import pandas as pd
    import woodwork as ww
    from evalml.model_understanding.metrics import precision_recall_curve

    # Binary target stored with woodwork's nullable integer logical type
    y_true = pd.Series(np.array([0, 0, 1, 1]))
    y_true = ww.init_series(y_true, logical_type="IntegerNullable")
    y_predict_proba = pd.DataFrame(
        np.array([[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]),
    )

    # Works
    precision_recall_curve_data = precision_recall_curve(
        y_true,
        y_predict_proba,
        pos_label_idx=1,
    )

    # Broken: raises ValueError: unknown format is not supported
    precision_recall_curve_data = precision_recall_curve(
        y_true,
        y_predict_proba,
        pos_label_idx=0,
    )

We should investigate why this happens for `pos_label_idx=0` but not for `pos_label_idx=1`, and add support for nullable types. We should consider using `_convert_ww_series_to_np_array`, as we do in `confusion_matrix`, to convert nullable types to NumPy.
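
The idea behind the proposed fix is to hand sklearn a plain NumPy array instead of a nullable Series. The exact behavior of `_convert_ww_series_to_np_array` isn't shown in this issue, so the following is only a minimal sketch of the conversion using plain pandas. `to_numpy_target` is a hypothetical name, and both the `int64` target dtype and the fallback to `float64` with `NaN` when values are missing are assumptions.

```python
import numpy as np
import pandas as pd


def to_numpy_target(y: pd.Series) -> np.ndarray:
    """Convert a target Series to a plain NumPy array before calling sklearn.

    Hypothetical helper for illustration only; it is not evalml's
    _convert_ww_series_to_np_array.
    """
    dtype = y.dtype
    # NumPy-backed dtypes need no special handling.
    if not isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return y.to_numpy()
    is_bool = pd.api.types.is_bool_dtype(dtype)
    if is_bool or pd.api.types.is_integer_dtype(dtype):
        # Assumption: NumPy int/bool arrays cannot hold missing values,
        # so targets with <NA> fall back to float64 with NaN.
        if y.isna().any():
            return y.to_numpy(dtype="float64", na_value=np.nan)
        # Assumption: nullable ints map to int64, nullable bools to bool.
        return y.to_numpy(dtype="bool" if is_bool else "int64")
    # Any other extension dtype (e.g. category): let pandas decide.
    return y.to_numpy()


y_true = pd.Series([0, 0, 1, 1], dtype="Int64")
y_np = to_numpy_target(y_true)
print(y_np.dtype, y_np)  # int64 [0 0 1 1]
```

The converted array would then be passed to sklearn in place of the nullable Series. Handling missing values this way is a design choice for the sketch; a real fix might instead raise, or drop those rows, before computing the curve.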

Note: this seems to be related to #3910, since both involve sklearn's `type_of_target` utility returning `"unknown"` for targets that use nullable pandas dtypes.
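
To make the `"unknown"` failure mode concrete, the following heavily simplified sketch mirrors one branch of sklearn's `type_of_target`: an object-dtype array whose elements are not strings is classified as `"unknown"`, and a binary-only check then fails with the error seen above. `simplified_target_type` and `check_binary_target` are hypothetical stand-ins, not sklearn's code. Whether a nullable Series actually becomes an object array when converted depends on the pandas version, which is an assumption here.

```python
import numpy as np


def simplified_target_type(y) -> str:
    """Hypothetical, heavily simplified stand-in for sklearn's type_of_target.

    Only the object-dtype branch relevant to this issue is reproduced,
    plus a crude binary/multiclass split on the number of distinct labels.
    """
    y = np.asarray(y)
    # An object array whose first element is not a string is "unknown".
    if y.dtype == object and not isinstance(y.flat[0], str):
        return "unknown"
    return "binary" if np.unique(y).size <= 2 else "multiclass"


def check_binary_target(y) -> None:
    """Hypothetical check modeled on the error message in this issue."""
    y_type = simplified_target_type(y)
    if y_type != "binary":
        raise ValueError(f"{y_type} format is not supported")


# A plain integer array is accepted.
check_binary_target(np.array([0, 0, 1, 1]))

# The same labels boxed as Python objects are rejected.
try:
    check_binary_target(np.array([0, 0, 1, 1], dtype=object))
except ValueError as err:
    print(err)  # unknown format is not supported
```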

tamargrey, Jan 12 '23 18:01