Add BenchmarkEvaluator with basic precision/recall computation
Summary
This PR introduces a utility class BenchmarkEvaluator in supervision/metrics/benchmark.py to support benchmarking object detection results across different datasets or models.
Features
- Computes basic precision and recall
- Accepts Detections objects for ground truth and predictions
- Optional support for class mapping and IoU thresholding (future extensions)
- Includes a unit test at tests/metrics/test_benchmark.py
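For orientation, here is a minimal sketch of how an evaluator like this could be driven with supervision Detections objects. The BenchmarkEvaluator constructor and evaluate method in the commented lines are assumed names for illustration, not necessarily the API introduced by this PR:

```python
import numpy as np
import supervision as sv

# Toy ground truth and predictions as supervision Detections objects.
ground_truth = sv.Detections(
    xyxy=np.array([[10, 10, 50, 50], [60, 60, 100, 100]], dtype=float),
    class_id=np.array([0, 1]),
)
predictions = sv.Detections(
    xyxy=np.array([[12, 11, 49, 52], [200, 200, 240, 240]], dtype=float),
    class_id=np.array([0, 1]),
    confidence=np.array([0.90, 0.40]),
)

# Hypothetical evaluator API (names assumed for illustration only):
# evaluator = BenchmarkEvaluator(iou_threshold=0.5)
# result = evaluator.evaluate(predictions=predictions, ground_truth=ground_truth)
# print(result.precision, result.recall)
```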
Motivation
Addresses Issue #1778: Improving object detection benchmarking process for unrelated datasets.
Let me know if you'd like me to extend this in future PRs with:
- mAP, F1, or per-class metrics
- Confusion matrix visualization
- Colab notebook example
Thanks for the opportunity to contribute!
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
Muhammed Swalihu does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.
Hi @SkalskiP @onuralpszr — I've submitted this PR for the BenchmarkEvaluator (Issue #1778). Let me know if you'd like me to fix the pre-commit error or extend this further. Thanks for reviewing!
Hi @Muhammedswalihu, this seems like a really valuable feature! Could you please replace the placeholder logic with a working implementation and provide a working example and test cases? Then we can review the PR.
Hi @soumik12345 , thanks for the review!
I’ll go ahead and:
- Replace the placeholder logic in BenchmarkEvaluator with full precision/recall/mAP computation
- Add a working demo example (maybe in a Colab notebook for clarity)
- Improve the test coverage with more edge cases and per-class evaluation
Let me know if there’s anything specific you’d like to see included. Appreciate the opportunity — excited to take this further!
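For reference, one common way to compute AP (and from it mAP) is the area under an interpolated precision-recall curve. The sketch below is a generic VOC-style implementation, not necessarily the logic this PR will adopt, and the helper name is illustrative:

```python
import numpy as np

def average_precision(tp_flags: np.ndarray, confidences: np.ndarray, num_gt: int) -> float:
    """AP as the area under an all-point interpolated precision-recall curve."""
    order = np.argsort(-confidences)                 # rank detections by descending confidence
    tp = tp_flags[order].astype(float)
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)
    # Pad the curve, then make precision monotonically non-increasing.
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Integrate precision over the recall steps.
    steps = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[steps + 1] - recall[steps]) * precision[steps + 1]))

# mAP would then be the mean of average_precision over classes (and, for COCO-style
# mAP, additionally averaged over several IoU thresholds).
```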
Check out this pull request on ReviewNB.
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Hi @soumik12345, I've added a Colab-style demo notebook BenchmarkEvaluator_Demo.ipynb!
It includes:
- How to import and use the BenchmarkEvaluator
- Per-class precision and recall visualization
- A visual example comparing predicted and ground truth bounding boxes
This should help users understand and adopt the module more easily.
Let me know if you'd like me to polish or extend this notebook further!
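As an illustration of the per-class visualization idea, here is a minimal matplotlib sketch; the class names and metric values are made up for the example, not results from the notebook:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-class results; in the notebook these would come from the evaluator.
per_class = {"person": (0.91, 0.84), "car": (0.78, 0.80), "dog": (0.65, 0.58)}  # (precision, recall)

labels = list(per_class.keys())
precision = [per_class[c][0] for c in labels]
recall = [per_class[c][1] for c in labels]
x = np.arange(len(labels))

plt.bar(x - 0.2, precision, width=0.4, label="precision")
plt.bar(x + 0.2, recall, width=0.4, label="recall")
plt.xticks(x, labels)
plt.ylim(0, 1)
plt.legend()
plt.title("Per-class precision / recall")
plt.show()
```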
Great initiative on the BenchmarkEvaluator! This addresses a crucial need for standardized evaluation metrics. I'd like to offer some technical guidance to help you complete the implementation effectively.
Key Implementation Recommendations:
- IoU-based Matching Algorithm: For proper TP/FP/FN computation, you'll need Hungarian assignment or greedy matching based on IoU thresholds (a fuller sketch follows this list):

```python
def compute_matches(pred_boxes, gt_boxes, iou_threshold=0.5):
    # Compute IoU matrix
    # Apply optimal assignment (e.g., scipy.optimize.linear_sum_assignment)
    # Return matched pairs, unmatched predictions (FP), unmatched ground truth (FN)
    ...
```

- Multi-class Support: Consider class-aware matching for per-class metrics:
  - Group detections by class_id
  - Compute metrics separately for each class
  - Aggregate for overall performance
- Confidence Thresholding: Implement confidence-based filtering for realistic evaluation scenarios
- Standard Metrics: Beyond precision/recall, consider adding:
  - F1-score
  - Average Precision (AP) at different IoU thresholds
  - Mean Average Precision (mAP)
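To make the matching recommendation concrete, here is a self-contained sketch of how compute_matches could be fleshed out with numpy and scipy's linear_sum_assignment. The helper names and the Hungarian (rather than greedy) strategy are illustrative choices, not the PR's implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def box_iou_matrix(pred_boxes: np.ndarray, gt_boxes: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) predicted and (M, 4) ground-truth xyxy boxes."""
    lt = np.maximum(pred_boxes[:, None, :2], gt_boxes[None, :, :2])   # top-left of intersection
    rb = np.minimum(pred_boxes[:, None, 2:], gt_boxes[None, :, 2:])   # bottom-right of intersection
    inter = np.clip(rb - lt, 0, None).prod(axis=2)
    area_pred = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    area_gt = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_pred[:, None] + area_gt[None, :] - inter
    return inter / np.maximum(union, 1e-9)

def compute_matches(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Hungarian matching on IoU; returns matched pairs, unmatched preds (FP), unmatched GT (FN)."""
    pred_boxes = np.asarray(pred_boxes, dtype=float)
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    if len(pred_boxes) == 0 or len(gt_boxes) == 0:
        return [], list(range(len(pred_boxes))), list(range(len(gt_boxes)))
    iou = box_iou_matrix(pred_boxes, gt_boxes)
    pred_idx, gt_idx = linear_sum_assignment(-iou)          # maximize total IoU
    matches = [(p, g) for p, g in zip(pred_idx, gt_idx) if iou[p, g] >= iou_threshold]
    matched_pred = {p for p, _ in matches}
    matched_gt = {g for _, g in matches}
    false_positives = [p for p in range(len(pred_boxes)) if p not in matched_pred]
    false_negatives = [g for g in range(len(gt_boxes)) if g not in matched_gt]
    return matches, false_positives, false_negatives
```

Per-class metrics would then just call compute_matches once per class_id group and aggregate the resulting TP/FP/FN counts.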
Performance Considerations:
- Vectorized IoU computation using numpy/supervision utilities
- Batch processing for large evaluation sets
- Memory-efficient handling of detection arrays
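On the vectorized-IoU and memory points: supervision ships a pairwise IoU helper, box_iou_batch (in supervision.detection.utils in current releases), and large evaluation sets can be processed in chunks so the full IoU matrix is never materialized at once. A rough sketch, assuming that helper and its (num_gt, num_pred) output shape:

```python
import numpy as np
from supervision.detection.utils import box_iou_batch  # assumed import path for current supervision versions

def best_gt_iou_chunked(pred_boxes: np.ndarray, gt_boxes: np.ndarray, chunk_size: int = 1024) -> np.ndarray:
    """Best IoU against any ground-truth box for each prediction, computed in chunks
    so that the full (num_gt, num_pred) IoU matrix is never held in memory at once."""
    if len(gt_boxes) == 0:
        return np.zeros(len(pred_boxes), dtype=float)
    best = np.zeros(len(pred_boxes), dtype=float)
    for start in range(0, len(pred_boxes), chunk_size):
        chunk = pred_boxes[start:start + chunk_size]
        block = box_iou_batch(gt_boxes, chunk)       # assumed shape: (num_gt, len(chunk))
        best[start:start + chunk_size] = block.max(axis=0)
    return best
```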
This evaluator will be invaluable for the community's benchmarking needs. Happy to provide more specific implementation details if needed!
Best regards, Gabriel