Annotation import error with segmentation masks in json based formats (one to many mask instances to fix)

Open jlwhelan28 opened this issue 1 year ago • 1 comments

Actions before raising this issue

  • [X] I searched the existing issues and did not find anything similar.
  • [X] I read/searched the docs

Steps to Reproduce

Documenting an issue, a workaround, and a possible solution that I ran into while using an external process (a model deployed in a production system separate from the CVAT/nuclio model integration) to build datasets for import into CVAT with the datumaro library. I found that you must identify the separate, disconnected instances within your mask yourself: a dataset that stores multiple separate instances of a class as a single annotation cannot be imported. CVAT supports export, but not import, of disconnected mask instances. The datumaro library itself does not mind if you do this, but cvat.apps.dataset_manager.bindings.import_dm_annotations breaks.

  1. External model pipeline generates segmentation masks for a batch set of images
  2. External model pipeline constructs a datumaro.Dataset object from scratch using datumaro.Dataset.from_iterable, datumaro.DatasetItem, and datumaro.Mask with the model's predicted mask
  3. External model pipeline exports the dataset package with Dataset.export("./", format="datumaro", save_media=True) (also tried coco)
  4. Import dataset into CVAT

Traceback from cvat-worker-import container

[2024-02-08 14:06:07,377] ERROR rq.worker: [Job import:project-1-dataset-by-XXXXX]: exception raised while executing (cvat.apps.engine.utils.import_resource_with_clean_up_after)
Traceback (most recent call last):
  File "/home/django/cvat/apps/dataset_manager/bindings.py", line 1964, in import_dm_annotations
    top = int(istrue[0].min())
  File "/opt/venv/lib/python3.10/site-packages/numpy/core/_methods.py", line 44, in _amin
    return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/rq/worker.py", line 1428, in perform_job
    rv = job.perform()
  File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1278, in perform
    self._result = self._execute()
  File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1315, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/home/django/cvat/apps/engine/utils.py", line 332, in import_resource_with_clean_up_after
    result = func(filename, *args, **kwargs)
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/django/cvat/apps/dataset_manager/project.py", line 176, in import_dataset_as_project
    project.import_dataset(f, importer, conv_mask_to_poly=conv_mask_to_poly)
  File "/home/django/cvat/apps/dataset_manager/project.py", line 155, in import_dataset
    importer(dataset_file, temp_dir, project_data, self.load_dataset_data, **options)
  File "/home/django/cvat/apps/dataset_manager/formats/registry.py", line 36, in __call__
    f_or_cls(*args, **kwargs)
  File "/home/django/cvat/apps/dataset_manager/formats/datumaro.py", line 44, in _import
    import_dm_annotations(dataset, instance_data)
  File "/home/django/cvat/apps/dataset_manager/bindings.py", line 1890, in import_dm_annotations
    import_dm_annotations(sub_dataset, task_data)
  File "/home/django/cvat/apps/dataset_manager/bindings.py", line 2104, in import_dm_annotations
    raise CvatImportError("Image {}: can't import annotation "
cvat.apps.dataset_manager.bindings.CvatImportError: Image IMG_0971: can't import annotation #0 (mask): zero-size array to reduction operation minimum which has no identity

Expected Behavior

Dataset loads into CVAT with annotations. If a single annotation contains multiple instances, CVAT will identify the separate instances using contour discovery to process into polygons, or load the mask as a single annotation ID if polygon conversion is not selected.

OR

More helpful exception handling to make the issue with the annotation clear.

Possible Solution

Discover instances within the model output mask using cv2.findContours and then create the datumaro objects.
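
A minimal sketch of that splitting step, under stated assumptions: the helper name split_mask_instances is hypothetical, and a plain NumPy flood fill (4-connectivity) stands in for cv2.findContours so the example has no OpenCV dependency. Each returned boolean array would then be wrapped in its own datumaro.Mask annotation instead of storing the whole prediction as one annotation.

```python
from collections import deque

import numpy as np


def split_mask_instances(mask: np.ndarray) -> list[np.ndarray]:
    """Split a binary mask into one boolean mask per 4-connected region.

    Hypothetical helper for illustration; it is not part of CVAT or datumaro.
    """
    mask = np.asarray(mask, dtype=bool)
    height, width = mask.shape
    visited = np.zeros_like(mask)
    instances = []

    for y, x in zip(*np.nonzero(mask)):
        if visited[y, x]:
            continue
        # Breadth-first flood fill starting from an unvisited foreground pixel.
        component = np.zeros_like(mask)
        queue = deque([(y, x)])
        visited[y, x] = True
        while queue:
            cy, cx = queue.popleft()
            component[cy, cx] = True
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and mask[ny, nx] and not visited[ny, nx]:
                    visited[ny, nx] = True
                    queue.append((ny, nx))
        instances.append(component)

    return instances


# Toy prediction with two disconnected blobs of the same class.
predicted = np.array(
    [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1],
    ],
    dtype=np.uint8,
)
instances = split_mask_instances(predicted)

# An all-zero prediction yields no instances, so no empty annotation is created.
empty_instances = split_mask_instances(np.zeros((3, 3), dtype=np.uint8))
```

For large masks, a library routine such as cv2.connectedComponents or scipy.ndimage.label would likely be much faster than this pure-Python loop; the sketch only shows the shape of the step.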

Context

No response

Environment

No response

jlwhelan28 • Feb 08 '24 14:02

I imagine this is likely a won't-fix for the CVAT team; I am documenting it in an issue for anyone else who runs into a similar error. A blurb about this in the documentation, or a section on advanced dataset construction and import from external sources, might be helpful.

jlwhelan28 • Feb 08 '24 14:02

Hi,

From the error, it's not entirely clear how this is related to separate masks, but there are probably some empty masks in the imported dataset. Such masks can sometimes be exported from CVAT or generated by external tools. CVAT should have no problem importing multiple masks of the same class in the COCO or Datumaro formats.
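
As a rough illustration of why an empty mask would produce this exact message: NumPy raises it whenever min() is called on a zero-length array. The sketch below assumes the pixel coordinates are computed with something like np.argwhere, which is a guess based on the traceback line `top = int(istrue[0].min())` and not a copy of the CVAT code.

```python
import numpy as np

# A mask with no foreground pixels at all.
empty_mask = np.zeros((4, 4), dtype=bool)

# Assumption: coordinates of the set pixels, one row per axis.
# For an empty mask this has shape (2, 0).
istrue = np.argwhere(empty_mask).transpose()

try:
    top = int(istrue[0].min())  # min() over zero elements
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```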

To test this, please try removing the empty masks from the dataset before importing it, with a script like this:

from argparse import ArgumentParser

import datumaro as dm
from datumaro.cli.util.project import parse_dataset_pathspec

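class RemoveEmptyMasks(dm.ItemTransform):
    """Drop mask annotations that contain no foreground pixels."""

    def transform_item(self, item: dm.DatasetItem) -> dm.DatasetItem | None:
        updated_anns = []

        for a in item.annotations:
            # A mask with zero area has no pixels set, so skip it.
            if isinstance(a, dm.Mask) and a.get_area() == 0:
                continue

            updated_anns.append(a)

        # Return a copy of the item that keeps only the remaining annotations.
        return self.wrap_item(item, annotations=updated_anns)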

def main():
    parser = ArgumentParser()
    parser.add_argument("-f", "--format", help="Output format (default: use original)")
    parser.add_argument("input_dataset", help="Input dataset path or path:format")
    parser.add_argument("output_dir", help="Output path")
    args = parser.parse_args()

    dataset = parse_dataset_pathspec(args.input_dataset)

    dataset.transform(RemoveEmptyMasks)

    dataset.export(args.output_dir, args.format or dataset.format)


if __name__ == "__main__":
    main()

(The code targets the Datumaro v0.3-based version that CVAT currently uses; install it with pip install "datumaro @ git+https://github.com/cvat-ai/datumaro@dc66ee56a2679661b2b2c6abef8917f17a9451df".)

Call it like this: python remove_empty_masks.py "input_dataset_dir/" "output_dir/".

zhiltsov-max • Mar 05 '24 10:03