Use COCO Mask Parsing from pycocotools

Open • david-csnmedia opened this issue 1 year ago • 1 comment

🚀 The feature

The CocoDetection v2 transform wrapper decodes segmentation masks itself, even though pycocotools already provides a high-performance implementation. We had to copy _dataset_wrapper.py from master into our own code because a bug in its mask handling has been fixed on master, but the fix is not yet available in a pip-installable release.

https://github.com/pytorch/vision/blob/main/torchvision/tv_tensors/_dataset_wrapper.py#L402

Since torchvision.datasets.CocoDetection already stores a COCO() object as self.coco, the wrapper could use it directly:

    coco_ann = dataset.coco.imgToAnns[image_id]

    if "masks" in target_keys:
        target["masks"] = tv_tensors.Mask(
            torch.stack([
                torch.from_numpy(dataset.coco.annToMask(ann))
                for ann in coco_ann
            ])
        )
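
One edge case the snippet above may need to cover is an image with no annotations: stacking an empty list raises an error, so an explicit empty result is needed. The following is only a rough sketch of that idea, not the wrapper's actual code. It uses NumPy so it runs without torch or a dataset on disk; `stack_instance_masks` and `StubCoco` are hypothetical names introduced for illustration, and the stub's `h`/`w` annotation fields are not part of the COCO format.

```python
import numpy as np


def stack_instance_masks(coco, anns, height, width):
    """Return a (num_instances, height, width) uint8 array of binary masks.

    `coco` is assumed to behave like pycocotools.coco.COCO, i.e. to expose
    annToMask(ann) returning a (height, width) uint8 array.
    """
    if not anns:
        # np.stack (like torch.stack) rejects an empty list, so images
        # without annotations get an explicit empty result.
        return np.zeros((0, height, width), dtype=np.uint8)
    return np.stack([coco.annToMask(ann) for ann in anns])


class StubCoco:
    """Hypothetical stand-in for COCO so the sketch runs without pycocotools."""

    def annToMask(self, ann):
        # Fill the bounding box region; the real annToMask decodes
        # polygons or RLE instead.
        mask = np.zeros((ann["h"], ann["w"]), dtype=np.uint8)
        x, y, w, h = ann["bbox"]
        mask[y:y + h, x:x + w] = 1
        return mask
```

With a real dataset, `coco` would be `dataset.coco`, `anns` would be `dataset.coco.imgToAnns[image_id]`, and the height and width could be read from `dataset.coco.imgs[image_id]`. The result could then be passed through `torch.from_numpy` and wrapped in `tv_tensors.Mask`.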

Motivation, pitch

There have already been bugs related to this, and there's no need to reinvent the wheel. Instead, let's use the existing implementation.

Alternatives

No response

Additional context

No response

david-csnmedia • Sep 03 '24 20:09

Thanks for opening the issue, @david-csnmedia. I'm happy for you to open a PR so we can see whether the tests pass.

NicolasHug • Sep 04 '24 08:09