ValueError when loading COCO dataset with multiple segmentation masks for one class
Search before asking
- [X] I have searched the Supervision issues and found no similar bug report.
Bug
My current COCO dataset includes annotations with more than one segmentation mask for the same class. A rough analogy is as follows: one eye of a cat is segmented as a whole, but when exported from Fiftyone, two polygons are produced (turned into segmentation masks):
As a result, when the COCO dataset is loaded into my program using supervision, the program crashes with the following error:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (7,) + inhomogeneous part.
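For reference, the crash can be reproduced without supervision at all: NumPy refuses to build a rectangular integer array from a ragged list of polygons, which is exactly what a multi-part segmentation entry looks like. A minimal sketch (the polygon values are made up):

```python
import numpy as np

# Two polygons of different lengths, as found under one "segmentation" key.
# np.asarray cannot pack them into a rectangular int array, so it raises
# the same "inhomogeneous shape" / "sequence" ValueError reported above.
segmentation = [
    [694, 560.5, 764.5, 407, 759.5, 400],               # first polygon (3 points)
    [713, 534.5, 713, 531.5, 713, 534.5, 713, 531.5],   # second polygon (4 points)
]
try:
    np.asarray(segmentation, dtype=np.int32)
except ValueError as e:
    print(f"ValueError: {e}")
```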
After some research, I discovered Ultralytics' JSON2YOLO repository on GitHub and adapted the library's merge_multi_segment() function (seen here) in supervision's coco.py file, which then allows the COCO dataset to be loaded.
Environment
- Supervision: 0.20.0
- OS: OpenSUSE Tumbleweed 20240423
- Python: 3.12.2
Minimal Reproducible Example
The following code is used to load the COCO dataset with the annotations_path being the path to a .json file containing the paths and annotations for all images in the dataset:
ds = sv.DetectionDataset.from_coco(
    images_directory_path=images_directory_path,
    annotations_path=annotations_path,
    force_masks=True,
)
The following is an example of an annotation containing multiple segmentation masks for one object:
{
    ...
    {
        "id": 41,
        "image_id": 6,
        "category_id": 0,
        "bbox": [
            694.801517364719,
            278.90263698033465,
            161.52883212628387,
            282.881946369456
        ],
        "segmentation": [
            [
                694, 560.5, 764.5, 407, 759.5, 400, 765.5, 397, 765.5, 393,
                760.5, 391, 759.5, 384, 754.5, 381, 763, 376.5, 767.5, 370,
                764.5, 363, 768.5, 354, 735, 278.5, 741.5, 284, 776, 356.5,
                782, 359.5, 794.5, 348, 806.5, 321, 809.5, 321, 799.5, 346,
                800, 348.5, 806.5, 349, 798.5, 364, 800.5, 387, 808.5, 400,
                818.5, 408, 811, 413.5, 802, 405.5, 802.5, 416, 855.5, 530,
                851.5, 529, 787.5, 395, 779, 394.5, 714.5, 531, 715.5, 522,
                729, 490.5, 694, 560.5
            ],
            [
                713, 534.5, 713, 531.5, 713, 534.5
            ]
        ],
        "area": 45693.59042666829,
        "iscrowd": 0,
        "ignore": 0
    }
}
Additional
No response
Are you willing to submit a PR?
- [X] Yes I'd like to help by submitting a PR!
Hi @DancinParrot 👋🏻 Sorry for the late response, but I traveled a lot at the end of last week, and my access to GitHub was limited.
You're correct, we currently do not support loading multi-segment masks. I assume the change you want to make would be in the coco_annotations_to_masks function?
Hi @SkalskiP! Thanks for your response, I wasn't expecting a response this quick actually so no worries!
I see, that would explain the error. However, the error was actually raised from this line, which is within the coco_annotations_to_detections function. I could not find any reference to the coco_annotations_to_masks function within this repo; did I miss anything?
I modified the coco_annotations_to_detections function based on JSON2YOLO's implementation:
def coco_annotations_to_detections(
    image_annotations: List[dict], resolution_wh: Tuple[int, int], with_masks: bool
) -> Detections:
    # ...
    if with_masks:
        polygons = []
        for image_annotation in image_annotations:
            segmentations = image_annotation["segmentation"]
            if len(segmentations) > 1:
                s = merge_multi_segment(segmentations)
                s = (
                    (np.concatenate(s, axis=0) / np.array(resolution_wh))
                    .reshape(-1)
                    .tolist()
                )
                reshaped = np.reshape(np.asarray(s, dtype=np.int32), (-1, 2))
            else:
                reshaped = np.reshape(
                    np.asarray(segmentations, dtype=np.int32), (-1, 2)
                )
            polygons.append(reshaped)
    # ...
    return Detections(xyxy=xyxy, class_id=np.asarray(class_ids, dtype=int))
The aforementioned modification allows the dataset to be loaded. However, I'm not sure the outcome really fits my use case, since the merge_multi_segment() function included in JSON2YOLO seems to connect all segmentation masks into one with a thin line, which I presume would form one whole mask as opposed to the intended separate masks. Any thoughts on this?
However, the error was actually raised from this line [...]
I'm very sorry. You're right, of course. I mentioned coco_annotations_to_masks because there is ongoing work on COCO loading and saving in #1163, and this method will appear in that PR. Don't worry about it.
As for merge_multi_segment, we can't use that implementation since JSON2YOLO is under an AGPL license, which would conflict with our MIT license. Therefore, we need to implement our own version of that function.
I'm not sure if the outcome really fits my use case since the merge_multi_segment() function included in JSON2YOLO seems to connect all segmentation masks into one with a thin line, which I presume would form one whole mask as opposed to the intended separate masks.
If you want to load this as two separate masks, your COCO JSON is incorrectly constructed. You should not have multiple lists under the segmentation key. These should be separate annotations. If you have multiple lists there, they should be loaded as a single mask.
I'm very sorry. You're right, of course. I mentioned coco_annotations_to_masks because there is ongoing work on COCO loading and saving in #1163, and this method will appear in that PR. Don't worry about it.
I see, no worries!
As for merge_multi_segment, we can't use that implementation since JSON2YOLO is under an AGPL license, which would conflict with our MIT license. Therefore, we need to implement our own version of that function.
Understood, I'm trying out different implementations currently to fix the issue.
If you want to load this as two separate masks, your COCO JSON is incorrectly constructed. You should not have multiple lists under the segmentation key. These should be separate annotations. If you have multiple lists there, they should be loaded as a single mask.
My apologies for the confusion, I might have misunderstood my dataset. In my current workflow, I export the annotated data from Label Studio as a COCO dataset. Next, I import the dataset into Fiftyone for augmentation with Albumentations, and then export it as a COCODetectionDataset to preserve the segmentation masks along with the bboxes. It seems that during this step, Fiftyone's native COCODetectionDataset exporter may have modified the structure of my dataset such that a mask is split into multiple parts (perhaps due to overlapping masks), resulting in supervision's inability to parse the dataset. However, from my observation, the dataset is still structured properly, since I was able to import it into Fiftyone and Label Studio again and the annotations remained unchanged. Thus, I doubt it's an issue with Fiftyone's exporter, but rather supervision's inability to merge the list of segmentation masks.
Lastly, I tested JSON2YOLO's merge_multi_segment() function, which appeared to modify the masks to the point where the contour is indecipherable by OpenCV, as findContours() was unable to produce an output. Hence, there is a need for another implementation, which I believe may be found and adapted from Fiftyone's source code, particularly the _coco_segmentation_to_mask() function (here). Would love to hear your thoughts on this approach. Thanks!
Just an update: after messing around with a few implementations, I finally came up with functional code by combining _coco_segmentation_to_mask() from Fiftyone with mask2polygon() from ultralytics/JSON2YOLO#38. The flow is as follows:
- If more than one segmentation mask exists for an annotation, normalize, merge, and encode the list of segmentation masks obtained from COCO's labels.json file into RLE.
- Decode the RLE and pass it as an argument to the mask2polygon() function.
- The mask2polygon() function finds the contours within a mask and obtains the hierarchy of contours in the form of [[Next, Previous, First_Child, Parent], ...] (see the cv2.findContours() docs).
- It then loops over the list of contours and merges those with a valid parent (a non -1 value in the hierarchy list).
- Contours without parents, however, will not be merged. In this case, my method involves either recursively merging all contours until only one is left in the list, or choosing the biggest contour by area as the resulting polygon (I chose the latter).
Although the resulting product is not perfect, it works well enough (so far) for my use case. I'm definitely open to feedback and suggestions on ways to improve, as well as possible alternatives to this approach.
Should I open a PR for this? @SkalskiP
Code is roughly as follows:
# From https://github.com/voxel51/fiftyone/blob/8205caf7646e5e7cb38041a94efb97f6524c1db6/fiftyone/utils/coco.py
def normalize_coco_segmentation(segmentation):
    # Filter out empty segmentations
    # For polygons of 4 points (1 pixel), duplicate to convert to valid polygon
    _segmentation = []
    for seg in segmentation:
        if len(seg) == 0:
            continue
        if len(seg) == 4:
            seg *= 4
        _segmentation.append(seg)
    return _segmentation


# From https://github.com/ultralytics/JSON2YOLO/issues/38
def is_clockwise(contour):
    value = 0
    num = len(contour)
    for i, point in enumerate(contour):
        p1 = contour[i]
        if i < num - 1:
            p2 = contour[i + 1]
        else:
            p2 = contour[0]
        value += (p2[0][0] - p1[0][0]) * (p2[0][1] + p1[0][1])
    return value < 0


def get_merge_point_idx(contour1, contour2):
    idx1 = 0
    idx2 = 0
    distance_min = -1
    for i, p1 in enumerate(contour1):
        for j, p2 in enumerate(contour2):
            distance = pow(p2[0][0] - p1[0][0], 2) + pow(p2[0][1] - p1[0][1], 2)
            if distance_min < 0:
                distance_min = distance
                idx1 = i
                idx2 = j
            elif distance < distance_min:
                distance_min = distance
                idx1 = i
                idx2 = j
    return idx1, idx2


def merge_contours(contour1, contour2, idx1, idx2):
    contour = []
    for i in list(range(0, idx1 + 1)):
        contour.append(contour1[i])
    for i in list(range(idx2, len(contour2))):
        contour.append(contour2[i])
    for i in list(range(0, idx2 + 1)):
        contour.append(contour2[i])
    for i in list(range(idx1, len(contour1))):
        contour.append(contour1[i])
    contour = np.array(contour)
    return contour


def merge_with_parent(contour_parent, contour):
    if not is_clockwise(contour_parent):
        contour_parent = contour_parent[::-1]
    if is_clockwise(contour):
        contour = contour[::-1]
    idx1, idx2 = get_merge_point_idx(contour_parent, contour)
    return merge_contours(contour_parent, contour, idx1, idx2)


def mask2polygon(image):
    contours, hierarchies = cv2.findContours(
        image, cv2.RETR_TREE, cv2.CHAIN_APPROX_TC89_KCOS
    )
    contours_approx = []
    for contour in contours:
        epsilon = 0.001 * cv2.arcLength(contour, True)
        contour_approx = cv2.approxPolyDP(contour, epsilon, True)
        contours_approx.append(contour_approx)
    contours_parent = []
    for i, contour in enumerate(contours_approx):
        parent_idx = hierarchies[0][i][3]
        if parent_idx < 0 and len(contour) >= 3:
            contours_parent.append(contour)
        else:
            contours_parent.append([])
    for i, contour in enumerate(contours_approx):
        parent_idx = hierarchies[0][i][3]
        if parent_idx >= 0 and len(contour) >= 3:
            contour_parent = contours_parent[parent_idx]
            if len(contour_parent) == 0:
                continue
            contours_parent[parent_idx] = merge_with_parent(contour_parent, contour)
    contours_parent_tmp = []
    for contour in contours_parent:
        if len(contour) == 0:
            continue
        contours_parent_tmp.append(contour)
    max_area = 0
    max_contour = None
    # Get the largest contour based on area
    for contour in contours_parent_tmp:
        area = cv2.contourArea(contour)
        if area > max_area:
            max_area = area
            max_contour = contour
    if max_contour is not None:
        polygon = max_contour.flatten().tolist()
        return polygon


def coco_segmentation_to_mask(segmentation, bbox, frame_size):
    x, y, w, h = bbox
    width, height = frame_size
    if isinstance(segmentation, list):
        # Polygon -- a single object might consist of multiple parts, so merge
        # all parts into one mask RLE code
        segmentation = normalize_coco_segmentation(segmentation)
        if len(segmentation) == 0:
            return None
        rle = mask_utils.merge(mask_utils.frPyObjects(segmentation, height, width))
    elif isinstance(segmentation["counts"], list):
        # Uncompressed RLE
        rle = mask_utils.frPyObjects(segmentation, height, width)
    else:
        # RLE
        rle = segmentation
    mask = mask_utils.decode(rle)
    polygon = mask2polygon(mask)
    return polygon


def coco_annotations_to_detections(
    image_annotations: List[dict], resolution_wh: Tuple[int, int], with_masks: bool
) -> Detections:
    if not image_annotations:
        return Detections.empty()
    class_ids = [
        image_annotation["category_id"] for image_annotation in image_annotations
    ]
    xyxy = [image_annotation["bbox"] for image_annotation in image_annotations]
    xyxy = np.asarray(xyxy)
    xyxy[:, 2:4] += xyxy[:, 0:2]
    if with_masks:
        polygons = []
        for image_annotation in image_annotations:
            segmentation = image_annotation["segmentation"]
            if len(segmentation) > 1:
                s = coco_segmentation_to_mask(
                    segmentation, image_annotation["bbox"], resolution_wh
                )
                reshaped = np.reshape(np.asarray(s, dtype=np.int32), (-1, 2))
            else:
                reshaped = np.reshape(
                    np.asarray(image_annotation["segmentation"], dtype=np.int32),
                    (-1, 2),
                )
            polygons.append(reshaped)
        mask = _polygons_to_masks(polygons=polygons, resolution_wh=resolution_wh)
        return Detections(
            class_id=np.asarray(class_ids, dtype=int), xyxy=xyxy, mask=mask
        )
    return Detections(xyxy=xyxy, class_id=np.asarray(class_ids, dtype=int))
Hi @DancinParrot 👋🏻
I must admit, I am very confused. Initially, I thought it was only about loading COCO annotations consisting of multiple segments. Is that still the case?
I don't quite understand why we need all these extra steps like RLE conversion and polygon conversion.
I must admit, I am very confused. Initially, I thought it was only about loading COCO annotations consisting of multiple segments. Is that still the case?
Yup, still that. It was my mistake; I misunderstood the issue. It turns out Fiftyone split the mask for one annotation into multiple parts during export. The dataset is still structured properly, though, as it can be read by Fiftyone and Label Studio; it's only supervision that is unable to load it.
I don't quite understand why we need all these extra steps like RLE conversion and polygon conversion.
The conversion to RLE, as I understand it, merges all masks within an annotation into one array, which is later used as input for the mask2polygon() function. That function is responsible for merging the contours together to form one polygon, which then allows supervision to properly load the dataset.
EDIT: I have updated the original issue to include my understanding of the issue.
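For anyone following along, uncompressed COCO RLE is just a list of alternating background/foreground run lengths stored in column-major (Fortran) order. A minimal, dependency-free sketch of the decoding step; decode_uncompressed_rle is a name made up for illustration, and real code should use pycocotools.mask.decode:

```python
import numpy as np

# Minimal decoder for *uncompressed* COCO RLE, i.e. the
# {"counts": [...], "size": [h, w]} form. Illustration only.
def decode_uncompressed_rle(counts, size):
    h, w = size
    flat = np.zeros(h * w, dtype=bool)
    idx, value = 0, False  # runs alternate, starting with background
    for run in counts:
        if value:
            flat[idx:idx + run] = True
        idx += run
        value = not value
    return flat.reshape((h, w), order="F")  # COCO RLE is column-major

# 2x2 mask: 1 background pixel, 2 foreground pixels, 1 background pixel
mask = decode_uncompressed_rle([1, 2, 1], [2, 2])
print(mask.sum())  # 2 foreground pixels
```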
I think we should just be able to update the section of the code here:
def coco_annotations_to_masks(
    image_annotations: List[dict], resolution_wh: Tuple[int, int]
) -> npt.NDArray[np.bool_]:
    return np.array(
        [
            rle_to_mask(
                rle=np.array(image_annotation["segmentation"]["counts"]),
                resolution_wh=resolution_wh,
            )
            if image_annotation["iscrowd"]
            else polygon_to_mask(
                polygon=np.reshape(
                    np.asarray(image_annotation["segmentation"], dtype=np.int32),
                    (-1, 2),
                ),
                resolution_wh=resolution_wh,
            )
            for image_annotation in image_annotations
        ],
        dtype=bool,
    )
It is unhappy because it does not expect multiple lists under image_annotation["segmentation"].
The easiest way (though not the most efficient), and still a lot more efficient than converting through all of the representations above, is to loop through the lists in image_annotation["segmentation"], create a separate mask for each of them, and then run np.logical_or(mask_1, mask_2).
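To illustrate, here is a toy version of that loop; rect_mask is a hypothetical stand-in for supervision's polygon_to_mask that just paints axis-aligned rectangles, so the example stays dependency-free:

```python
import numpy as np

# Hypothetical stand-in for supervision's polygon_to_mask: paints an
# axis-aligned rectangle instead of rasterizing an arbitrary polygon.
def rect_mask(x0, y0, x1, y1, resolution_wh):
    w, h = resolution_wh
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

# Two disjoint parts of one annotation, merged exactly as suggested:
# build a mask per part, then OR it into the running result.
parts = [(0, 0, 4, 4), (6, 6, 10, 10)]
merged = np.zeros((10, 10), dtype=bool)
for part in parts:
    merged = np.logical_or(merged, rect_mask(*part, resolution_wh=(10, 10)))

print(merged.sum())  # 16 + 16 = 32 True pixels
```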
The easiest way (not the most efficient), but still a lot more efficient than the conversion through all of the representations above, is to loop through lists in image_annotation["segmentation"], create a separate mask for each of them and then run np.logical_or(mask_1, mask_2).
This seems a lot more efficient. Thanks! I'll try it out tomorrow when I get access to my work laptop and update this thread with the results.
@DancinParrot Sure! Let me know how it goes.
Hi @SkalskiP ! Thank you so much for your help! Here's the code that I've implemented based on your recommendation and it seems to merge all the polygons very well:
def merge_masks(segmentations, resolution_wh):
    parent = None
    for s in segmentations:
        mask = polygon_to_mask(
            polygon=np.reshape(np.asarray(s, dtype=np.int32), (-1, 2)),
            resolution_wh=resolution_wh,
        )
        if parent is None:
            parent = mask
        else:
            parent = np.logical_or(parent, mask)
    return parent
def coco_annotations_to_masks(
    image_annotations: List[dict], resolution_wh: Tuple[int, int]
) -> npt.NDArray[np.bool_]:
    return np.array(
        [
            (
                rle_to_mask(
                    rle=np.array(image_annotation["segmentation"]["counts"]),
                    resolution_wh=resolution_wh,
                )
                if image_annotation["iscrowd"]
                else (
                    merge_masks(image_annotation["segmentation"], resolution_wh)
                    if len(image_annotation["segmentation"]) > 1
                    else polygon_to_mask(
                        polygon=np.reshape(
                            np.asarray(
                                image_annotation["segmentation"], dtype=np.int32
                            ),
                            (-1, 2),
                        ),
                        resolution_wh=resolution_wh,
                    )
                )
            )
            for image_annotation in image_annotations
        ],
        dtype=bool,
    )
Any feedback for further improvement is much appreciated. Also, should I create a PR for this?
Hi @DancinParrot 👋🏻, that seems like a good starting point. Please open a PR proposing the change. 🙏🏻
Hi @SkalskiP ! Will do, thanks!