evalml
precision_recall_curve can raise `ValueError: unknown format is not supported` when nullable types are used
The following block of code works when `pos_label_idx` is 1 but raises `ValueError: unknown format is not supported` when `pos_label_idx` is 0:
import numpy as np
import pandas as pd
import woodwork as ww
from evalml.model_understanding.metrics import precision_recall_curve

# Binary target stored with a nullable integer logical type
y_true = pd.Series(np.array([0, 0, 1, 1]))
y_true = ww.init_series(y_true, logical_type="IntegerNullable")

# One column of predicted probabilities per class
y_predict_proba = pd.DataFrame(
    np.array([[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]),
)

# Works
precision_recall_curve_data = precision_recall_curve(
    y_true,
    y_predict_proba,
    pos_label_idx=1,
)

# Broken: raises ValueError: unknown format is not supported
precision_recall_curve_data = precision_recall_curve(
    y_true,
    y_predict_proba,
    pos_label_idx=0,
)
We should look into why this happens for `pos_label_idx=0` but not for `pos_label_idx=1`, and add support for nullable types. We should consider using `_convert_ww_series_to_np_array`, as we do in `confusion_matrix`, to handle the conversion from the nullable type to numpy.
Note: this seems to be related to #3910, as both involve sklearn's `type_of_target` util returning `unknown` when it checks the type of a target that uses nullable pandas dtypes.
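As a rough illustration of the conversion idea, here is a minimal sketch of turning a nullable pandas series into a plain numpy array before passing it to sklearn. The helper name `nullable_series_to_numpy` is hypothetical and is not evalml's `_convert_ww_series_to_np_array`; it assumes the target has no missing values and only uses pandas and numpy calls.

```python
import numpy as np
import pandas as pd


def nullable_series_to_numpy(series: pd.Series) -> np.ndarray:
    """Hypothetical helper: convert a nullable pandas series to a plain numpy array.

    Assumes the series has no missing values, since a target with missing
    values cannot be scored anyway.
    """
    if series.isna().any():
        raise ValueError("Cannot convert a series with missing values.")

    dtype = series.dtype
    # Check bool first so the "boolean" nullable dtype maps to numpy bool.
    if pd.api.types.is_bool_dtype(dtype):
        return series.to_numpy(dtype="bool")
    if pd.api.types.is_integer_dtype(dtype):
        return series.to_numpy(dtype="int64")
    if pd.api.types.is_float_dtype(dtype):
        return series.to_numpy(dtype="float64")
    # Anything else is returned as-is.
    return series.to_numpy()


# Same target values as in the reproducer, stored as pandas' nullable Int64.
y_true_nullable = pd.Series([0, 0, 1, 1], dtype="Int64")
y_true_np = nullable_series_to_numpy(y_true_nullable)
print(y_true_np.dtype)
```

Whether this is enough to fix the `pos_label_idx=0` case still needs to be confirmed against the actual code path in `precision_recall_curve`.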