Metrics API
This issue aggregates the discussion and near-future plans for introducing metrics to supervision.
The first steps will be taken by the core Roboflow team; afterwards we'll open up contributions of specific metrics to the community.
I propose the following:
- Aim for ease of usage, compact API, sacrificing completeness if required.
- Provide public classes with aggregation by default (`metrics.py`); keep the internal implementation in `impl.py` or equivalent.
- Expose the metrics not in the global scope, but in `supervision.metrics`.
- I don't think we need to split into metrics.detection, metrics.segmentation, metrics.classification, but I'm on the fence.
- Focus only on what we can apply to Detections object.
- This means only implementing metrics that use some of: `class_id`, `confidence`, `xyxy`, `mask`, `xyxyxyxy` (in `Detections.data`).
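To illustrate the constraint above, here's a metric that touches only fields a `Detections` object already carries (here `confidence`). The `SimpleDetections` stand-in and `confidence_stats` helper are hypothetical; they just mimic the attribute names listed above.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class SimpleDetections:
    """Hypothetical stand-in exposing the same fields as supervision's Detections."""
    xyxy: list
    class_id: list
    confidence: list


def confidence_stats(detections: SimpleDetections) -> dict:
    """Compute confidence statistics using only the `confidence` field."""
    c = detections.confidence
    return {"mean": mean(c), "median": median(c), "min": min(c), "max": max(c)}


d = SimpleDetections(
    xyxy=[(0, 0, 10, 10), (5, 5, 20, 20)],
    class_id=[0, 1],
    confidence=[0.5, 0.75],
)
print(confidence_stats(d))  # -> {'mean': 0.625, 'median': 0.625, 'min': 0.5, 'max': 0.75}
```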
:warning: I don't know:
- How metrics should be computed when targets and predictions have different numbers of detections, or when they are mismatched.
- I don't think metrics should fail in that case, but perhaps there's a standard way of addressing this.
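One common approach (used by mAP-style evaluators) is to match predictions to targets by IoU: unmatched predictions count as false positives and unmatched targets as false negatives, so nothing fails on mismatched counts. A greedy sketch of that idea, with all helper names mine rather than from supervision:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predictions, targets, iou_threshold=0.5):
    """Match each prediction to the best unused target above the threshold.

    Returns (matches, unmatched_prediction_indices, unmatched_target_indices);
    the unmatched lists become false positives / false negatives downstream.
    """
    matches = []
    used_targets = set()
    for p_idx, pred in enumerate(predictions):
        best_iou, best_t = 0.0, None
        for t_idx, tgt in enumerate(targets):
            if t_idx in used_targets:
                continue
            iou = box_iou(pred, tgt)
            if iou > best_iou:
                best_iou, best_t = iou, t_idx
        if best_t is not None and best_iou >= iou_threshold:
            matches.append((p_idx, best_t))
            used_targets.add(best_t)
    matched_preds = {p for p, _ in matches}
    unmatched_preds = [i for i in range(len(predictions)) if i not in matched_preds]
    unmatched_targets = [i for i in range(len(targets)) if i not in used_targets]
    return matches, unmatched_preds, unmatched_targets


preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
targets = [(1, 1, 10, 10)]
print(greedy_match(preds, targets))  # -> ([(0, 0)], [1], [])
```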
I believe we could start with:
- Importing current metrics into the new system:
    - IoU
    - mAP
    - Confusion Matrix
- Detections:
    - Accuracy
    - Precision
    - Recall
- General:
    - Mean confidence
    - Median confidence
    - Min confidence
    - Max confidence
    - (not typical, but I'd find useful) the number of unique classes detected, plus an aggregate count of detections per class (e.g. N defects / hour)
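The per-class count idea from the last bullet could be as small as the sketch below; `class_id` mirrors the `Detections` attribute of the same name, but the `ClassCounter` class itself is hypothetical.

```python
from collections import Counter


class ClassCounter:
    """Toy metric: unique classes seen plus a running count per class."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, class_ids) -> None:
        # Accepts any iterable of class ids, e.g. detections.class_id.
        self.counts.update(class_ids)

    def compute(self) -> dict:
        return {
            "unique_classes": len(self.counts),
            "per_class": dict(self.counts),
        }


counter = ClassCounter()
counter.update([0, 0, 3])  # e.g. two scratches, one dent
counter.update([3])
print(counter.compute())  # -> {'unique_classes': 2, 'per_class': {0: 2, 3: 2}}
```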
I believe the one parameter a `Metric` needs to accept at construction is `queue_size`:
- `queue_size=1` - don't keep history; only ever report metrics for the current batch
- `queue_size=N` - keep up to N metric results in history for computation
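The `queue_size` semantics above map naturally onto a bounded deque; this is a minimal sketch of that behaviour (the `RollingMetric` name and mean aggregation are assumptions, not part of the proposal).

```python
from collections import deque


class RollingMetric:
    """Keeps up to `queue_size` batch results; compute() aggregates over them."""

    def __init__(self, queue_size: int = 1) -> None:
        # queue_size=1: only the latest batch result survives;
        # queue_size=N: the N most recent results are retained.
        self.history: deque = deque(maxlen=queue_size)

    def update(self, batch_result: float) -> None:
        self.history.append(batch_result)

    def compute(self) -> float:
        # Aggregate over whatever the queue currently holds (here: the mean).
        return sum(self.history) / len(self.history)


m = RollingMetric(queue_size=3)
for value in (0.5, 0.7, 0.9, 0.8):
    m.update(value)
print(m.compute())  # mean of the last 3 values: (0.7 + 0.9 + 0.8) / 3
```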
Other thoughts:
- I don't think metrics should know about datasets. Instead of `benchmark` as it is in the current API, let's have `def benchmark_dataset(dataset, metric)` in `metrics/utils.py`.
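A sketch of how that helper could look. I've added a `predict` callable as a third parameter, since something has to produce predictions per image; that parameter, the toy metric, and the toy dataset below are all assumptions for illustration.

```python
def benchmark_dataset(dataset, metric, predict):
    """Run predictions over the dataset, feed pairs into the metric, return the result."""
    for image, targets in dataset:
        predictions = predict(image)
        metric.update(predictions, targets)
    return metric.compute()


class CountMatchRatio:
    """Toy metric: fraction of images where prediction and target counts agree."""

    def __init__(self) -> None:
        self.hits = 0
        self.total = 0

    def update(self, predictions, targets) -> None:
        self.total += 1
        if len(predictions) == len(targets):
            self.hits += 1

    def compute(self) -> float:
        return self.hits / self.total


dataset = [("img_a", [1, 2]), ("img_b", [1])]
predict = lambda image: [0, 0] if image == "img_a" else []
print(benchmark_dataset(dataset, CountMatchRatio(), predict))  # -> 0.5
```

Keeping dataset iteration in a free function like this means no `Metric` subclass ever needs to know where its batches come from.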
API:

```python
class Accuracy(Metric):
    def __init__(self, queue_size: int = 1) -> None: ...

    @override
    def update(self, predictions: Detections, targets: Detections) -> None: ...

    @override
    def compute(self) -> NotSureYet: ...

# Metric also provides `def detect_and_compute(*args, **kwargs)`.

accuracy_metric = Accuracy()
accuracy_metric.update(detections, detections_ground_truth)
accuracy = accuracy_metric.compute()
```
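Given those signatures, the `Metric` base class could look something like the sketch below; this is one possible shape, not the supervision implementation, with `detect_and_compute` as the convenience wrapper mentioned above.

```python
from abc import ABC, abstractmethod


class Metric(ABC):
    """Possible base class matching the proposed API."""

    def __init__(self, queue_size: int = 1) -> None:
        self.queue_size = queue_size

    @abstractmethod
    def update(self, predictions, targets) -> None:
        """Record one batch of predictions against targets."""

    @abstractmethod
    def compute(self):
        """Return the aggregated metric value."""

    def detect_and_compute(self, *args, **kwargs):
        # Convenience wrapper: record a batch, then immediately return the result.
        self.update(*args, **kwargs)
        return self.compute()


class DetectionCount(Metric):
    """Trivial subclass for demonstration: counts predictions seen so far."""

    def __init__(self, queue_size: int = 1) -> None:
        super().__init__(queue_size)
        self.n = 0

    def update(self, predictions, targets) -> None:
        self.n += len(predictions)

    def compute(self) -> int:
        return self.n


metric = DetectionCount()
print(metric.detect_and_compute([1, 2, 3], [1]))  # -> 3
```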
Related features:
- https://github.com/roboflow/supervision/issues/140
- https://github.com/roboflow/supervision/pull/177
- https://github.com/roboflow/supervision/issues/232
- https://github.com/roboflow/supervision/pull/236
- https://github.com/roboflow/supervision/issues/292
- https://github.com/roboflow/supervision/issues/480
- https://github.com/roboflow/supervision/issues/632