[Detections] extend `from_transformers` with segmentation models support

Open SkalskiP opened this issue 1 year ago • 4 comments

Description

Currently, Supervision only supports Transformers object detection models. Let's expand from_transformers by adding support for segmentation models.

API

The code below should enable the annotation of an image with segmentation results.

import torch
import supervision as sv
from PIL import Image
from transformers import DetrImageProcessor, DetrForSegmentation

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForSegmentation.from_pretrained("facebook/detr-resnet-50")

image = Image.open(<PATH TO IMAGE>)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

width, height = image.size
target_size = torch.tensor([[height, width]])
results = processor.post_process_segmentation(
    outputs=outputs, target_sizes=target_size)[0]
detections = sv.Detections.from_transformers(results)

mask_annotator = sv.MaskAnnotator()

annotated_image = mask_annotator.annotate(scene=image, detections=detections)

Additional

  • Transformers DETR Docs
  • Note: Please share a Google Colab with minimal code to test the new feature. We know it's additional work, but it will speed up the review process. The reviewer must test each change. Setting up a local environment to do this is time-consuming. Please ensure that Google Colab can be accessed without any issues (make it public). Thank you! 🙏🏻

SkalskiP avatar Mar 25 '24 12:03 SkalskiP

I'd like to take this one.

Griffin-Sullivan avatar Mar 25 '24 20:03 Griffin-Sullivan

@Griffin-Sullivan the task is yours, good luck :)

onuralpszr avatar Mar 25 '24 21:03 onuralpszr

So the post_process_segmentation method only returns ['scores', 'labels', 'masks']. I found a utility function to get the xyxy from the mask: https://github.com/roboflow/supervision/blob/42f9d4b69b9d2c97756d69184359bdc484e896a8/supervision/detection/utils.py#L307. Is this the right approach here? When I run everything, it works and I get detections, but the annotations don't come out right. You can see in my Colab that there's only one colored line at the top of the annotated picture.

Here's the colab: https://colab.research.google.com/drive/1j1O95pxlHDPZ1XtmLJ_VcmkYAk_KvrUg?usp=sharing

And my draft PR: https://github.com/roboflow/supervision/pull/1054

Griffin-Sullivan avatar Mar 26 '24 18:03 Griffin-Sullivan

Hi @Griffin-Sullivan 👋🏻 Yes, mask_to_xyxy is what you need. Looks like your problem was mostly related to the np.array dtype. I shared more details in the comment under your PR.
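For readers following along, here is a minimal sketch of the two ideas above: deriving xyxy boxes from masks, and normalizing the mask dtype first. It is not the library's implementation. The helper name masks_to_xyxy_sketch is hypothetical, and the assumption that masks should be boolean is ours, since the thread does not say which dtype was expected.

```python
import numpy as np


def masks_to_xyxy_sketch(masks: np.ndarray) -> np.ndarray:
    """Hypothetical helper (not supervision's code): derive boxes from masks.

    masks: array of shape (N, H, W); any non-zero value counts as foreground.
    Returns an (N, 4) integer array of [x_min, y_min, x_max, y_max].
    """
    masks = masks.astype(bool)  # assumption: boolean masks are the safe format
    boxes = np.zeros((masks.shape[0], 4), dtype=int)
    for i, mask in enumerate(masks):
        rows, cols = np.nonzero(mask)
        if rows.size == 0:
            continue  # empty mask: leave the box as zeros
        boxes[i] = [cols.min(), rows.min(), cols.max(), rows.max()]
    return boxes


# Model outputs often arrive as float arrays, so threshold them to booleans
# before building boxes or passing them on for annotation.
float_masks = np.zeros((2, 6, 8), dtype=np.float32)
float_masks[0, 1:4, 2:5] = 1.0  # a 3x3 blob
float_masks[1, 5, 7] = 1.0      # a single pixel in the bottom-right corner

bool_masks = float_masks > 0.5
boxes = masks_to_xyxy_sketch(bool_masks)
print(bool_masks.dtype, boxes.tolist())
```

The box coordinates here are inclusive pixel indices, which is a choice made for this sketch only.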

SkalskiP avatar Mar 26 '24 20:03 SkalskiP

This issue was solved via https://github.com/roboflow/supervision/pull/1054. I'm closing this issue.

SkalskiP avatar Mar 28 '24 13:03 SkalskiP

Hi @SkalskiP,

I've reviewed the HF Transformers code: #1054 didn't add support for instance and panoptic segmentation. The result dicts from HF are quite different in those cases.

I could open an issue. Is that something we want?

LinasKo avatar Apr 02 '24 11:04 LinasKo

@LinasKo are those potential changes reflected in comments under this PR: https://github.com/roboflow/supervision/pull/1046, or is it more than this?

SkalskiP avatar Apr 02 '24 12:04 SkalskiP

@SkalskiP The missing part is where I suggested raising NotImplementedError, which passing instance or panoptic segmentation results would trigger.
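A rough sketch of what such a guard could look like, using plain dicts in place of real model outputs. The function name check_transformers_result is hypothetical, and so is the dispatch logic. The key names assume that instance and panoptic post-processing return "segmentation" and "segments_info", while the results discussed earlier in this thread carry "scores", "labels" and "masks" (or "boxes" for object detection).

```python
def check_transformers_result(result: dict) -> str:
    """Hypothetical guard: classify a post-processed result dict by its keys.

    Returns "segmentation" or "detection" for shapes assumed to be supported,
    and raises NotImplementedError for instance/panoptic-style results.
    """
    keys = set(result)
    if {"segmentation", "segments_info"} <= keys:
        raise NotImplementedError(
            "Instance and panoptic segmentation results are not supported yet."
        )
    if {"scores", "labels", "masks"} <= keys:
        return "segmentation"
    if {"scores", "labels", "boxes"} <= keys:
        return "detection"
    raise ValueError(f"Unrecognized result keys: {sorted(keys)}")


# Placeholder dicts; real values would be tensors from the image processor.
kind = check_transformers_result({"scores": [], "labels": [], "masks": []})
print(kind)

try:
    check_transformers_result({"segmentation": None, "segments_info": []})
except NotImplementedError as error:
    print(f"Rejected: {error}")
```

Failing loudly on the unsupported shapes keeps users from getting silently wrong detections until support is added.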

LinasKo avatar Apr 02 '24 12:04 LinasKo