Metrics API
This issue aggregates the discussion and near-future plans for introducing metrics to supervision.
The first steps will be taken by the core Roboflow team; afterwards we'll open up contributions of specific metrics to the community.
I propose the following:
- Aim for ease of usage, compact API, sacrificing completeness if required.
- Provide public classes with aggregation by default (`metrics.py`); keep the internal implementation in `impl.py` or equivalent.
- Expose the metrics not in the global scope, but in `supervision.metrics`.
- I don't think we need to split into metrics.detection, metrics.segmentation, metrics.classification, but I'm on the fence.
- Focus only on what we can apply to Detections object.
- This means only implementing metrics that use some of: `class_id`, `confidence`, `xyxy`, `mask`, `xyxyxyxy` (in `Detections.data`).
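To illustrate the constraint above, here's a metric that touches only fields a `Detections` object already carries (here `confidence`). The `SimpleDetections` stand-in and `confidence_stats` helper are hypothetical; they just mimic the attribute names listed above.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class SimpleDetections:
    """Hypothetical stand-in exposing the same fields as supervision's Detections."""
    xyxy: list
    class_id: list
    confidence: list


def confidence_stats(detections: SimpleDetections) -> dict:
    """Compute confidence statistics using only the `confidence` field."""
    c = detections.confidence
    return {"mean": mean(c), "median": median(c), "min": min(c), "max": max(c)}


d = SimpleDetections(
    xyxy=[(0, 0, 10, 10), (5, 5, 20, 20)],
    class_id=[0, 1],
    confidence=[0.5, 0.75],
)
print(confidence_stats(d))  # -> {'mean': 0.625, 'median': 0.625, 'min': 0.5, 'max': 0.75}
```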
:warning: I don't know:
- How metrics should be computed when targets and predictions have different numbers of detections, or when they are mismatched.
- I don't think metrics should fail in that case, but perhaps there's a standard way of addressing this.
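One common approach (used by mAP-style evaluators) is to match predictions to targets by IoU: unmatched predictions count as false positives and unmatched targets as false negatives, so nothing fails on mismatched counts. A greedy sketch of that idea, with all helper names mine rather than from supervision:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predictions, targets, iou_threshold=0.5):
    """Match each prediction to the best unused target above the threshold.

    Returns (matches, unmatched_prediction_indices, unmatched_target_indices);
    the unmatched lists become false positives / false negatives downstream.
    """
    matches = []
    used_targets = set()
    for p_idx, pred in enumerate(predictions):
        best_iou, best_t = 0.0, None
        for t_idx, tgt in enumerate(targets):
            if t_idx in used_targets:
                continue
            iou = box_iou(pred, tgt)
            if iou > best_iou:
                best_iou, best_t = iou, t_idx
        if best_t is not None and best_iou >= iou_threshold:
            matches.append((p_idx, best_t))
            used_targets.add(best_t)
    matched_preds = {p for p, _ in matches}
    unmatched_preds = [i for i in range(len(predictions)) if i not in matched_preds]
    unmatched_targets = [i for i in range(len(targets)) if i not in used_targets]
    return matches, unmatched_preds, unmatched_targets


preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
targets = [(1, 1, 10, 10)]
print(greedy_match(preds, targets))  # -> ([(0, 0)], [1], [])
```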
I believe we could start with:
- Importing current metrics into the new system:
    - IoU
    - mAP
    - Confusion Matrix
- Detections:
    - Accuracy
    - Precision
    - Recall
- General:
    - Mean confidence
    - Median confidence
    - Min confidence
    - Max confidence
    - (not typical, but I'd find useful) the number of unique classes detected, plus an aggregate count of detections per class (e.g. N defects / hour)
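The per-class count idea from the last bullet could be as small as the sketch below; `class_id` mirrors the `Detections` attribute of the same name, but the `ClassCounter` class itself is hypothetical.

```python
from collections import Counter


class ClassCounter:
    """Toy metric: unique classes seen plus a running count per class."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, class_ids) -> None:
        # Accepts any iterable of class ids, e.g. detections.class_id.
        self.counts.update(class_ids)

    def compute(self) -> dict:
        return {
            "unique_classes": len(self.counts),
            "per_class": dict(self.counts),
        }


counter = ClassCounter()
counter.update([0, 0, 3])  # e.g. two scratches, one dent
counter.update([3])
print(counter.compute())  # -> {'unique_classes': 2, 'per_class': {0: 2, 3: 2}}
```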
I believe the one parameter a `Metric` needs to accept at construction is `queue_size`:
- `queue_size=1` - don't keep history; only ever report metrics for the current batch
- `queue_size=N` - keep up to N metric results in history for computation
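The `queue_size` semantics above map naturally onto a bounded deque; this is a minimal sketch of that behaviour (the `RollingMetric` name and mean aggregation are assumptions, not part of the proposal).

```python
from collections import deque


class RollingMetric:
    """Keeps up to `queue_size` batch results; compute() aggregates over them."""

    def __init__(self, queue_size: int = 1) -> None:
        # queue_size=1: only the latest batch result survives;
        # queue_size=N: the N most recent results are retained.
        self.history: deque = deque(maxlen=queue_size)

    def update(self, batch_result: float) -> None:
        self.history.append(batch_result)

    def compute(self) -> float:
        # Aggregate over whatever the queue currently holds (here: the mean).
        return sum(self.history) / len(self.history)


m = RollingMetric(queue_size=3)
for value in (0.5, 0.7, 0.9, 0.8):
    m.update(value)
print(m.compute())  # mean of the last 3 values: (0.7 + 0.9 + 0.8) / 3
```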
Other thoughts:
- I don't think metrics should know about datasets. Instead of `benchmark` as it is in the current API, let's have `def benchmark_dataset(dataset, metric)` in `metrics/utils.py`.
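A sketch of how that helper could look. I've added a `predict` callable as a third parameter, since something has to produce predictions per image; that parameter, the toy metric, and the toy dataset below are all assumptions for illustration.

```python
def benchmark_dataset(dataset, metric, predict):
    """Run predictions over the dataset, feed pairs into the metric, return the result."""
    for image, targets in dataset:
        predictions = predict(image)
        metric.update(predictions, targets)
    return metric.compute()


class CountMatchRatio:
    """Toy metric: fraction of images where prediction and target counts agree."""

    def __init__(self) -> None:
        self.hits = 0
        self.total = 0

    def update(self, predictions, targets) -> None:
        self.total += 1
        if len(predictions) == len(targets):
            self.hits += 1

    def compute(self) -> float:
        return self.hits / self.total


dataset = [("img_a", [1, 2]), ("img_b", [1])]
predict = lambda image: [0, 0] if image == "img_a" else []
print(benchmark_dataset(dataset, CountMatchRatio(), predict))  # -> 0.5
```

Keeping dataset iteration in a free function like this means no `Metric` subclass ever needs to know where its batches come from.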
API:

```python
class Accuracy(Metric):
    def __init__(self, queue_size: int = 1) -> None: ...

    @override
    def update(self, predictions: Detections, targets: Detections) -> None: ...

    @override
    def compute(self) -> NotSureYet: ...

# Metric also provides `def detect_and_compute(*args, **kwargs)`.

accuracy_metric = Accuracy()
accuracy_metric.update(detections, detections_ground_truth)
accuracy = accuracy_metric.compute()
```
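Given those signatures, the `Metric` base class could look something like the sketch below; this is one possible shape, not the supervision implementation, with `detect_and_compute` as the convenience wrapper mentioned above.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Possible base class matching the proposed API."""

    def __init__(self, queue_size: int = 1) -> None:
        self.queue_size = queue_size

    @abstractmethod
    def update(self, predictions, targets) -> None:
        """Record one batch of predictions against targets."""

    @abstractmethod
    def compute(self):
        """Return the aggregated metric value."""

    def detect_and_compute(self, *args, **kwargs):
        # Convenience wrapper: record a batch, then immediately return the result.
        self.update(*args, **kwargs)
        return self.compute()


class DetectionCount(Metric):
    """Trivial subclass for demonstration: counts predictions seen so far."""

    def __init__(self, queue_size: int = 1) -> None:
        super().__init__(queue_size)
        self.n = 0

    def update(self, predictions, targets) -> None:
        self.n += len(predictions)

    def compute(self) -> int:
        return self.n


metric = DetectionCount()
print(metric.detect_and_compute([1, 2, 3], [1]))  # -> 3
```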
Related features:
- https://github.com/roboflow/supervision/issues/140
- https://github.com/roboflow/supervision/pull/177
- https://github.com/roboflow/supervision/issues/232
- https://github.com/roboflow/supervision/pull/236
- https://github.com/roboflow/supervision/issues/292
- https://github.com/roboflow/supervision/issues/480
- https://github.com/roboflow/supervision/issues/632