
Add COCO evaluation metrics

Open • NielsRogge opened this issue 3 years ago • 12 comments

I'm currently working on adding Facebook AI's DETR model (end-to-end object detection with Transformers) to HuggingFace Transformers. The model is working fine, but regarding evaluation, I'm currently relying on external CocoEvaluator and PanopticEvaluator objects which are defined in the original repository (here and here respectively).

Running these in a notebook gives you nice summaries like this: [screenshot of the printed AP/AR summary table]

It would be great if we could import these metrics from the Datasets library, something like this:

import datasets

metric = datasets.load_metric('coco')

for model_inputs, gold_references in evaluation_dataset:
    model_predictions = model(model_inputs)
    metric.add_batch(predictions=model_predictions, references=gold_references)

final_score = metric.compute()

I think this would be great for object detection and semantic/panoptic segmentation in general, not just for DETR. Reproducing results of object detection papers would be way easier.

However, object detection and panoptic segmentation evaluation is a bit more complex than accuracy (it's more like a summary of metrics at different IoU thresholds rather than a single number). I'm not sure how to proceed here, but I'm happy to help make this possible.
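
To make that concrete: the standard COCO summary consists of twelve AP/AR numbers, so compute() would presumably return something along these lines (key names are purely illustrative, not a proposed API):

# Purely illustrative: the kind of dict metric.compute() could return for COCO-style
# detection, mirroring the twelve numbers printed by pycocotools' summarize()
final_score = {
    "AP":        ...,  # AP @ IoU=0.50:0.95, area=all, maxDets=100
    "AP50":      ...,  # AP @ IoU=0.50
    "AP75":      ...,  # AP @ IoU=0.75
    "AP_small":  ...,  # AP for small objects
    "AP_medium": ...,  # AP for medium objects
    "AP_large":  ...,  # AP for large objects
    "AR@1":      ...,  # AR @ maxDets=1
    "AR@10":     ...,  # AR @ maxDets=10
    "AR@100":    ...,  # AR @ maxDets=100
    "AR_small":  ...,  # AR for small objects
    "AR_medium": ...,  # AR for medium objects
    "AR_large":  ...,  # AR for large objects
}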

NielsRogge avatar May 03 '21 13:05 NielsRogge

Hi @NielsRogge, I'd like to contribute these metrics to datasets. Let's start with CocoEvaluator first. Currently, how are you sending the ground truths and predictions to coco_evaluator?

bhavitvyamalik avatar Jun 02 '21 09:06 bhavitvyamalik

Great!

Here's a notebook that illustrates how I'm using CocoEvaluator: https://drive.google.com/file/d/1VV92IlaUiuPOORXULIuAdtNbBWCTCnaj/view?usp=sharing

The evaluation is near the end of the notebook.
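
For anyone who doesn't want to open the notebook, the evaluation there boils down to roughly the following (a paraphrased sketch, not copied verbatim; coco_gt, postprocess, and the batch keys are placeholders for what the notebook actually uses):

import torch
from coco_eval import CocoEvaluator  # coco_eval.py copied from the original DETR repo

# coco_gt is a pycocotools COCO object built from the ground-truth annotation file
evaluator = CocoEvaluator(coco_gt, iou_types=["bbox"])

model.eval()
with torch.no_grad():
    for batch in val_dataloader:
        outputs = model(batch["pixel_values"])
        # post-process raw logits/boxes into per-image dicts with "scores", "labels", "boxes"
        results = postprocess(outputs, batch["orig_sizes"])
        predictions = {img_id: res for img_id, res in zip(batch["image_ids"], results)}
        evaluator.update(predictions)

evaluator.synchronize_between_processes()
evaluator.accumulate()
evaluator.summarize()  # prints the AP/AR table shown in the screenshot above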

NielsRogge avatar Jun 02 '21 11:06 NielsRogge

I went through the code you mentioned and I think there are 2 options for how we can go ahead:

  1. Implement it the way the DETR people have done it (they rely very heavily on the official implementation and focus on a torch dataset here). I feel ours should be something generic instead of PyTorch-specific.
  2. Do an implementation where the user converts their outputs and ground-truth annotations to a pre-defined format and then feeds them into our function to calculate the metrics (looks very similar to what you wanted above; see the sketch below).

In my opinion, the 2nd option looks very clean, but I'm still figuring out how it transforms the box coordinates of coco_gt, which you passed to CocoEvaluator (the ground truth for evaluation). Since your model output was already converted to the COCO API format, I had few problems there.
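
For reference, the plain pycocotools flow that option 2 would wrap looks like this (assuming the pre-defined format is simply the standard COCO json format for annotations and results):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ground_truth_annotations.json")     # ground truth in COCO format
coco_dt = coco_gt.loadRes("predictions.json")       # predictions in COCO results format
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()      # prints the 12-number AP/AR summary
stats = coco_eval.stats    # the same 12 numbers as a numpy array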

bhavitvyamalik avatar Jun 03 '21 14:06 bhavitvyamalik

Ok, thanks for the update.

Indeed, the metrics API of Datasets is framework agnostic, so we can't rely on a PyTorch-only implementation.

This file is probably what we need to implement.
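
For context, a metric in Datasets is defined by a small loading script subclassing datasets.Metric, so the skeleton would look roughly like this (just a sketch of the structure; the features spec for nested boxes/labels/scores still needs to be worked out):

import datasets
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

class Coco(datasets.Metric):
    def _info(self):
        return datasets.MetricInfo(
            description="COCO-style object detection metrics (AP/AR at several IoU thresholds).",
            citation="",
            # NOTE: placeholder features; the real spec needs nested boxes/labels/scores
            features=datasets.Features(
                {
                    "predictions": datasets.Value("string"),  # e.g. a COCO-results json string
                    "references": datasets.Value("string"),   # e.g. a COCO-annotations json string
                }
            ),
        )

    def _compute(self, predictions, references):
        # Convert whatever input format we settle on into pycocotools objects,
        # then run the standard evaluate/accumulate/summarize pipeline and
        # return the 12 summary numbers as a dict.
        ...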

NielsRogge avatar Jun 04 '21 07:06 NielsRogge

Hi @lvwerra

Do you plan to add 3rd-party support for the COCO mAP metric?

kadirnar avatar Aug 08 '22 21:08 kadirnar

Is there any update on this? What would be the recommended way of doing COCO eval with Huggingface?

roboserg avatar Aug 07 '23 00:08 roboserg

Yes there's an update on this. @rafaelpadilla has been working on adding native support for COCO metrics in the evaluate library, check the Space here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics. For now you have to load the metric as follows:

import evaluate

evaluator = evaluate.load("rafaelpadilla/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")

but this one is going to be integrated into the main evaluate library.

This is then leveraged to create the open object detection leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard.

NielsRogge avatar Aug 07 '23 07:08 NielsRogge

Yep, we intend to integrate it into the evaluate library.

Meanwhile, you can use it from here: https://huggingface.co/spaces/rafaelpadilla/detection_metrics

Update: the code for the AP metric and its variations was transferred to https://huggingface.co/spaces/hf-vision/detection_metrics

rafaelpadilla avatar Aug 08 '23 13:08 rafaelpadilla

Hi, running

import evaluate
evaluator = evaluate.load("hf-vision/detection_metrics", json_gt=ground_truth_annotations, iou_type="bbox")

results in the following error:

ImportError: To be able to use hf-vision/detection_metrics, you need to install the following dependencies['detection_metrics'] using 'pip install detection_metrics' for instance'

How do I load the metric from the hub? Do I need to download the content of that repository manually first?

I'm running evaluate==0.4.1.

maltelorbach avatar Dec 14 '23 13:12 maltelorbach

Ran into the same issue @maltelorbach posted on 12/14/2023

sushil-bharati avatar Feb 03 '24 21:02 sushil-bharati