amazon-sagemaker-clarify
Multi-categorical confusion matrix calculation for labels not present in predicted_labels
Feedback from Bilal from a PR review: https://github.com/aws/amazon-sagemaker-clarify/pull/136#discussion_r1124552393
How are we supposed to handle cases when a predicted label (in this case 2) is not present in the observed labels (in this case [1])? Some options are:
1. We limit the confusion matrix CM to labels that are present in both the observed labels (label_series) and the predicted labels (predicted_label_series).
2. CM contains labels from the union of observed and predicted labels. With the default labels=None, this is what sklearn's confusion_matrix does.
3. CM contains labels from observed labels only. If a predicted label is not found in observed labels, we raise an error saying something like "Unknown label 2".
I think we should pick option 3 since it assumes that observed labels provide us a complete list of all the possible labels. Option 1 could be problematic because it will drop some valid observed labels in case they are not found in predicted labels.
If we opt for 3, we should raise an error in this line.
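For illustration, here is a minimal sketch of the three options in plain Python. The function names `candidate_label_sets` and `check_predicted_labels` are hypothetical and not part of the library, and plain lists stand in for label_series and predicted_label_series. It only shows which labels each option would keep, and how the option 3 check could look.

```python
def candidate_label_sets(observed_labels, predicted_labels):
    """Return the label set each option would use for the confusion matrix.

    Hypothetical helper for discussion only, not library code.
    """
    observed = set(observed_labels)
    predicted = set(predicted_labels)
    return {
        "option_1_intersection": sorted(observed & predicted),
        "option_2_union": sorted(observed | predicted),
        "option_3_observed_only": sorted(observed),
    }


def check_predicted_labels(observed_labels, predicted_labels):
    """Option 3: raise if a predicted label never occurs in the observed labels."""
    unknown = set(predicted_labels) - set(observed_labels)
    if unknown:
        raise ValueError(f"Unknown label(s) in predicted labels: {sorted(unknown)}")


# The case from the review: observed labels are [1], a predicted label is 2.
observed_example = [1, 1, 1]
predicted_example = [1, 2, 1]
label_sets = candidate_label_sets(observed_example, predicted_example)

try:
    check_predicted_labels(observed_example, predicted_example)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

In this example, options 1 and 3 both keep only label 1, option 2 keeps labels 1 and 2, and the option 3 check raises because label 2 never appears in the observed labels.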
Need to figure out whether we want to handle this on the analyzer side or in the library.
CC @bilalaws
We need to dive deep into the container and see if there are better options than those mentioned above.
@bilalaws do we have data on how often this use case will be hit by customers?
when a predicted label (in this case 2) is not present in the observed labels (in this case [1])?