Add COCO evaluation metrics
I'm currently working on adding Facebook AI's DETR model (end-to-end object detection with Transformers) to HuggingFace Transformers. The model is working fine, but for evaluation I'm currently relying on the external CocoEvaluator and PanopticEvaluator objects, which are defined in the original repository (here and here respectively).
Running these in a notebook gives you nice summaries of the various metrics (e.g. the standard COCO AP/AR table).
It would be great if we could import these metrics from the Datasets library, something like this:
import datasets
metric = datasets.load_metric('coco')
for model_input, gold_references in evaluation_dataset:
    model_predictions = model(model_input)
    metric.add_batch(predictions=model_predictions, references=gold_references)
final_score = metric.compute()
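For concreteness, here is a rough sketch of what the per-image payloads could look like, following the COCO API conventions. The exact schema such a metric would accept is still open, and all field values below are made up:
# Hypothetical payloads in COCO style; image_id/category_id/bbox values are illustrative only.
model_predictions = [
    {"image_id": 42, "category_id": 1, "bbox": [10.0, 20.0, 100.0, 50.0], "score": 0.98},  # bbox is [x, y, width, height]
]
gold_references = [
    {"image_id": 42, "category_id": 1, "bbox": [12.0, 18.0, 98.0, 55.0], "area": 5390.0, "iscrowd": 0},
]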
I think this would be great for object detection and semantic/panoptic segmentation in general, not just for DETR. Reproducing results of object detection papers would be way easier.
However, object detection and panoptic segmentation evaluation is a bit more complex than accuracy (it's more a summary of metrics at different thresholds than a single number). I'm not sure how to proceed here, but I'm happy to help make this possible.
Hi @NielsRogge,
I'd like to contribute these metrics to datasets. Shall we start with CocoEvaluator first? Currently, how are you sending the ground truths and predictions to coco_evaluator?
Great!
Here's a notebook that illustrates how I'm using CocoEvaluator: https://drive.google.com/file/d/1VV92IlaUiuPOORXULIuAdtNbBWCTCnaj/view?usp=sharing
The evaluation is near the end of the notebook.
I went through the code you've mentioned and I think there are 2 options on how we can go ahead:
- Implement it the way the DETR authors did (they rely heavily on the official implementation and they're focusing on a torch dataset here). I feel ours should be something generic instead of PyTorch-specific.
- Do an implementation where the user converts their output and ground-truth annotations to a pre-defined format and then feeds them into our function to calculate the metrics (this looks very similar to what you wanted above).
In my opinion, the 2nd option looks very clean, but I'm still figuring out how it transforms the box coordinates of coco_gt, which you've passed to CocoEvaluator (the ground truth for evaluation). Since your model output was already converted to the COCO API format, I faced few problems there.
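For reference, here's a minimal sketch of what option 2 could boil down to under the hood, using pycocotools directly; the annotation file path and the predictions list are placeholders:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth in the standard COCO annotation format (placeholder path).
coco_gt = COCO("annotations.json")
# Model output converted to the COCO "results" format (placeholder values).
predictions = [
    {"image_id": 42, "category_id": 1, "bbox": [10.0, 20.0, 100.0, 50.0], "score": 0.98},
]
coco_dt = coco_gt.loadRes(predictions)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the AP/AR summary at several IoU thresholds and object sizes
A framework-agnostic metric could wrap something like this and accept plain Python lists/dicts rather than torch tensors.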
Ok, thanks for the update.
Indeed, the metrics API of Datasets is framework agnostic, so we can't rely on a PyTorch-only implementation.
This file is probably what we need to implement.
Hi @lvwerra
Do you plan to add a 3rd-party integration for the COCO mAP metric?
Is there any update on this? What would be the recommended way of doing COCO eval with Huggingface?
Yes, there's an update on this. @rafaelpadilla has been working on adding native support for COCO metrics in the evaluate library; check the Space here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics. For now you have to load the metric as follows:
import evaluate
evaluator = evaluate.load("rafaelpadilla/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")
but this one is going to be integrated into the main evaluate library.
This is then leveraged to create the open object detection leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard.
Yep, we intend to integrate it into the evaluate library.
Meanwhile you can use it from here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics
Update: the code for the evaluate AP metric and its variations was transferred to https://huggingface.co/spaces/hf-vision/detection_metrics
Hi, running
import evaluate
evaluator = evaluate.load("hf-vision/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")
results in the following error:
ImportError: To be able to use hf-vision/detection_metrics, you need to install the following dependencies['detection_metrics'] using 'pip install detection_metrics' for instance'
How do I load the metric from the hub? Do I need to download the content of that repository manually first?
I'm running evaluate==0.4.1.
I ran into the same issue that @maltelorbach posted on 12/14/2023.
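In case it helps, one workaround that may be worth trying (not verified against this particular Space) is to clone the Space and point evaluate.load at the local copy, since it also accepts local paths to metric scripts:
# Unverified workaround sketch: load the metric from a local clone of the Space.
# First run: git clone https://huggingface.co/spaces/hf-vision/detection_metrics
import evaluate

evaluator = evaluate.load(
    "./detection_metrics",  # local path to the cloned Space
    json_gt=ground_truth_annotations,  # same arguments as for the hub version
    iou_type="bbox",
)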