amazon-sagemaker-clarify

Multi-categorical confusion matrix calculation for labels not presented in predicted_labels

Open • xiaoyi-cheng opened this issue on Mar 04 '23 • 3 comments

Feedback from Bilal in a PR review: https://github.com/aws/amazon-sagemaker-clarify/pull/136#discussion_r1124552393

How should we handle cases where a predicted label (in this case 2) is not present in the observed labels (in this case [1])? Some options are:

1. Limit the confusion matrix CM to labels that are present in both the observed labels (label_series) and the predicted labels (predicted_label_series).
2. Index CM by the union of observed and predicted labels. This is what sklearn's confusion_matrix does by default, when no labels argument is given.
3. Index CM by the observed labels only. If a predicted label is not found among the observed labels, raise an error saying something like "Unknown label 2".
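The three options differ only in which label set indexes the matrix. The pure-Python sketch below makes that concrete; the function name confusion_matrix_labels and the strategy strings are hypothetical, chosen for illustration, and are not part of the Clarify API. It assumes labels are hashable and mutually comparable so they can be sorted.

```python
from typing import Hashable, List, Sequence


def confusion_matrix_labels(
    observed: Sequence[Hashable],
    predicted: Sequence[Hashable],
    strategy: str,
) -> List[Hashable]:
    """Return the sorted labels that would index the confusion matrix.

    strategy (hypothetical names for the three options):
      "intersection" -> option 1: labels present in both observed and predicted
      "union"        -> option 2: labels present in either observed or predicted
      "observed"     -> option 3: observed labels only; unknown predictions raise
    """
    obs, pred = set(observed), set(predicted)
    if strategy == "intersection":
        return sorted(obs & pred)
    if strategy == "union":
        return sorted(obs | pred)
    if strategy == "observed":
        unknown = sorted(pred - obs)
        if unknown:
            # Option 3 treats any predicted label outside the observed set as an error.
            raise ValueError(f"Unknown label {', '.join(map(str, unknown))}")
        return sorted(obs)
    raise ValueError(f"Unsupported strategy: {strategy!r}")


observed = [1, 3, 3]
predicted = [1, 2, 2]
print(confusion_matrix_labels(observed, predicted, "intersection"))  # [1]: observed label 3 is dropped
print(confusion_matrix_labels(observed, predicted, "union"))  # [1, 2, 3]
try:
    confusion_matrix_labels(observed, predicted, "observed")
except ValueError as err:
    print(err)  # Unknown label 2
```

The intersection result shows the concern with option 1: label 3 is a valid observed label, yet it disappears from the matrix because the model never predicted it.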
I think we should pick option 3, since it treats the observed labels as the complete list of possible labels. Option 1 could be problematic because it drops valid observed labels that never appear among the predicted labels.

If we opt for option 3, we should raise an error at this line.
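To show where the option 3 error would fire, here is a minimal pure-Python sketch of a confusion matrix indexed by the observed labels only. The function name confusion_matrix_from_observed is hypothetical, and although the parameters borrow the names label_series and predicted_label_series from the discussion, they take plain sequences; this is an illustration under those assumptions, not Clarify's implementation. Rows are observed labels and columns are predicted labels, the same orientation sklearn uses.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def confusion_matrix_from_observed(
    label_series: Sequence[Hashable],
    predicted_label_series: Sequence[Hashable],
) -> Tuple[List[Hashable], List[List[int]]]:
    """Option 3: index the matrix by the observed labels only.

    Rows are observed labels, columns are predicted labels.
    Raises ValueError if any predicted label never appears among the observed labels.
    """
    if len(label_series) != len(predicted_label_series):
        raise ValueError("label_series and predicted_label_series differ in length")
    labels = sorted(set(label_series))
    index: Dict[Hashable, int] = {label: i for i, label in enumerate(labels)}
    # Validate before counting, so no partial matrix is ever built.
    unknown = sorted(set(predicted_label_series) - set(labels))
    if unknown:
        raise ValueError(f"Unknown label {', '.join(map(str, unknown))}")
    matrix = [[0] * len(labels) for _ in labels]
    for obs, pred in zip(label_series, predicted_label_series):
        matrix[index[obs]][index[pred]] += 1
    return labels, matrix


labels, matrix = confusion_matrix_from_observed([0, 1, 1, 2], [0, 1, 2, 2])
print(labels)  # [0, 1, 2]
print(matrix)  # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
```

With the issue's example, confusion_matrix_from_observed([1], [2]) raises ValueError("Unknown label 2") instead of silently producing a matrix.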

We need to decide whether to handle this on the analyzer side or in the library.

Mar 04 '23 01:03 xiaoyi-cheng

CC @bilalaws

Mar 06 '23 21:03 goswamig

We need to dive deep into the container and see whether there are better options than those mentioned above.

Mar 06 '23 22:03 goswamig

@bilalaws, do we have data on how often customers hit this case, i.e., when a predicted label (in this case 2) is not present in the observed labels (in this case [1])?

Mar 06 '23 22:03 goswamig
