
Correct confusion matrix calculation in function evaluate_detection_batch

Open panagiotamoraiti opened this issue 5 months ago • 5 comments

Description

This fixes the issue where predicted bounding boxes were matched to ground truth boxes solely based on IoU, without considering class agreement during matching. Currently, if a predicted box has a higher IoU but the wrong class, it is matched first, and the correct prediction with the right class but lower IoU is discarded. This leads to miscounting true positives and false positives, resulting in an inaccurate confusion matrix.

The change modifies the matching logic (method evaluate_detection_batch) to incorporate IoU and class agreement simultaneously, ensuring that only predictions that both exceed the IoU threshold and agree on class are matched to ground truths. This results in a correct confusion matrix.
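To make the matching idea concrete, here is a minimal, hypothetical sketch of the corrected logic (not the actual PR code): candidate pairs above the IoU threshold are sorted so that class-agreeing pairs come first and higher IoU wins within each group, and each ground truth and each prediction is used at most once.

import numpy as np

def greedy_match(iou, pred_classes, gt_classes, iou_threshold):
    # Hypothetical sketch, not the PR implementation.
    # iou: (num_gt, num_pred) matrix of pairwise IoU values.
    gt_idx, pred_idx = np.nonzero(iou > iou_threshold)
    class_match = pred_classes[pred_idx] == gt_classes[gt_idx]
    # Primary key: class agreement (matching classes first);
    # secondary key: descending IoU within each group.
    order = np.lexsort((-iou[gt_idx, pred_idx], ~class_match))
    matches, used_gt, used_pred = [], set(), set()
    for k in order:
        g, p = gt_idx[k], pred_idx[k]
        if g in used_gt or p in used_pred:
            continue  # each box participates in at most one match
        used_gt.add(g)
        used_pred.add(p)
        matches.append((g, p))
    return matches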

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

I had an image with 2 TP and 1 FP detections, but the confusion matrix reported 1 TP, 2 FP and 1 FN. The FP bbox with the wrong class had higher overlap, so the TP was discarded; in the end that bbox was itself discarded too, due to the wrong class id. Now my confusion matrix correctly reports 2 TP and 1 FP detections.

I also ran this on a big dataset. Another script, which I developed and used extensively in a previous project, gives the following results; they now match the confusion matrix, whereas before the correction they didn't.

Test Set: Ground Truth Objects: 481, True Positives: 469, False Positives: 11, False Negatives: 12

Validation Set: Ground Truth Objects: 1073, True Positives: 1037, False Positives: 23, False Negatives: 36

Train Set: Ground Truth Objects: 3716, True Positives: 3674, False Positives: 52, False Negatives: 42
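As a quick consistency check on these numbers (an illustration added here, not part of the original report): TP + FN equals the ground-truth count in every split, and precision/recall follow directly.

# Sanity check: TP + FN must equal the number of ground-truth objects.
splits = {
    "test": dict(gt=481, tp=469, fp=11, fn=12),
    "val": dict(gt=1073, tp=1037, fp=23, fn=36),
    "train": dict(gt=3716, tp=3674, fp=52, fn=42),
}
for name, s in splits.items():
    assert s["tp"] + s["fn"] == s["gt"]
    precision = s["tp"] / (s["tp"] + s["fp"])
    recall = s["tp"] / s["gt"]
    print(f"{name}: precision={precision:.3f}, recall={recall:.3f}")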

panagiotamoraiti avatar May 27 '25 15:05 panagiotamoraiti

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 27 '25 15:05 CLAassistant

To evaluate the confusion matrix, you can use the following code:

import numpy as np
import supervision as sv

# Define class names
class_names = ['cat', 'dog', 'rabbit']

# Ground truth detections (5 objects)
gt = sv.Detections(
    xyxy=np.array([
        [0, 0, 2, 2],   # cat
        [3, 3, 5, 5],   # dog
        [6, 6, 8, 8],   # rabbit
        [6, 15, 9, 16], # rabbit
        [2, 2, 3, 3],   # rabbit
    ]),
    class_id=np.array([0, 1, 2, 2, 2])
)

# Predicted detections (6 predictions)
preds = sv.Detections(
    xyxy=np.array([
        [0, 0, 2, 2],
        [3, 3, 5, 5], 
        [6, 6, 8, 8], 
        [9, 9, 11, 11], # FP 
        [10, 10, 12, 12], # FP
        [2, 2, 3, 3],   # rabbit confused as dog
    ]),
    class_id=np.array([0, 1, 2, 0, 1, 1]),  # note: the last rabbit GT is predicted as dog (class confusion)
    confidence=np.array([0.9, 0.7, 0.8, 0.6, 0.7, 0.7])
)

# Generate confusion matrix
cm = sv.ConfusionMatrix.from_detections(
    predictions=[preds],
    targets=[gt],
    classes=class_names,
    conf_threshold=0.5,
    iou_threshold=0.5
)

print("Confusion Matrix:\n", cm.matrix)

I've confirmed that it works with many examples.

panagiotamoraiti avatar Jun 06 '25 15:06 panagiotamoraiti

Excellent work on fixing this critical confusion matrix bug! The issue you've identified, where IoU-based matching without class agreement leads to incorrect metrics, goes to the heart of proper evaluation. Your solution with class-prioritized global matching is elegant and mathematically sound.

Technical Assessment:

  1. Algorithmic Correctness: The global sorting approach (class_match, IoU) is optimal. It ensures class-correct matches are prioritized while maintaining IoU-based ranking within each class group.

  2. Order Independence: The move from sequential GT processing to global match collection eliminates the order dependency issue correctly identified by @soumik12345.

  3. Hungarian Assignment Alternative: Your greedy approach is computationally efficient and appropriate for detection evaluation. While Hungarian assignment would be optimal for bipartite matching, the greedy method with proper sorting achieves the same practical results for most real-world scenarios (a minimal sketch of that alternative follows this list).
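For illustration only (not part of the PR), that Hungarian alternative could be sketched with SciPy's linear_sum_assignment, maximizing total IoU over one-to-one pairs and keeping only pairs above the threshold:

from scipy.optimize import linear_sum_assignment

def hungarian_match(iou, iou_threshold):
    # Globally optimal one-to-one assignment maximizing total IoU;
    # iou is a (num_gt, num_pred) NumPy matrix.
    gt_rows, pred_cols = linear_sum_assignment(iou, maximize=True)
    keep = iou[gt_rows, pred_cols] > iou_threshold
    return list(zip(gt_rows[keep], pred_cols[keep]))

To also respect class agreement, the cost would additionally need to reward class-correct pairs, for example by adding a constant bonus to entries where the classes agree.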

Implementation Strengths:

  • Comprehensive test cases covering edge cases and boundary conditions
  • Clear separation of TP/FP/FN logic
  • Maintains COCO evaluation protocol compliance
  • Performance-efficient with O(n*m) complexity

Minor Suggestions:

  • Consider adding validation for empty detection/GT cases in the main function
  • The extensive test suite should definitely be integrated into the main test file as suggested

This fix will significantly improve evaluation accuracy across the computer vision community. The mathematical rigor and extensive testing demonstrate excellent software engineering practices.

Best regards, Gabriel

galafis avatar Sep 27 '25 14:09 galafis

@galafis Thank you for the detailed review and the recognition of my contribution. I will definitely integrate your valuable suggestions soon.

@SkalskiP @onuralpszr @soumik12345 I would be glad if you could take a look and assess my fix. I think this can help many people in the computer vision community.

panagiotamoraiti avatar Sep 27 '25 16:09 panagiotamoraiti

@soumik12345 I added the testcases to the main test file. Sorry for the delay; for some reason I had completely forgotten about this suggestion. Thanks to @galafis for reminding me.

@galafis Regarding your second suggestion (validation for empty detection/GT cases): I don't think this is essential, because if the predictions are empty the confusion matrix will contain FNs, if the ground truths are empty it will contain FPs, and if both are empty we'll get a confusion matrix of zeros. I think no error should occur. I added 3 testcases addressing these 3 cases.
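For concreteness, a minimal sketch of those three cases, assuming sv.Detections.empty() for the empty inputs and default thresholds:

import numpy as np
import supervision as sv

class_names = ['cat', 'dog', 'rabbit']
gt = sv.Detections(xyxy=np.array([[0, 0, 2, 2]]), class_id=np.array([0]))
preds = sv.Detections(
    xyxy=np.array([[0, 0, 2, 2]]),
    class_id=np.array([0]),
    confidence=np.array([0.9]),
)
empty = sv.Detections.empty()

# Empty predictions: the GT box is unmatched and counted as an FN.
print(sv.ConfusionMatrix.from_detections(
    predictions=[empty], targets=[gt], classes=class_names).matrix)

# Empty ground truth: the prediction is unmatched and counted as an FP.
print(sv.ConfusionMatrix.from_detections(
    predictions=[preds], targets=[empty], classes=class_names).matrix)

# Both empty: an all-zeros matrix.
print(sv.ConfusionMatrix.from_detections(
    predictions=[empty], targets=[empty], classes=class_names).matrix)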

panagiotamoraiti avatar Oct 03 '25 10:10 panagiotamoraiti