cvat
Annotation import error with segmentation masks in json based formats (one to many mask instances to fix)
Actions before raising this issue
- [X] I searched the existing issues and did not find anything similar.
- [X] I read/searched the docs
Steps to Reproduce
Documenting an issue, workaround, and possible solution I ran into with an external process (a model deployed in a production system separate from the CVAT/nuclio model integration) that builds datasets for import into CVAT with the datumaro library. I found that you must identify the separate, disconnected instances within your mask yourself; you cannot import a dataset that stores multiple separate instances of a class as a single annotation. CVAT supports export but not import of disconnected mask instances. The datumaro library itself does not mind if you do this, but cvat.apps.dataset_manager.bindings.import_dm_annotations breaks.
- External model pipeline generates segmentation masks for a batch of images
- External model pipeline constructs a datumaro.Dataset object from scratch using datumaro.Dataset.from_iterable, datumaro.DatasetItem, and datumaro.Mask with the model's predicted mask
- External model pipeline generates the dataset package with Dataset.export("./", format="datumaro", save_media=True) (also tried coco)
- Import dataset into CVAT
Traceback from the cvat-worker-import container
[2024-02-08 14:06:07,377] ERROR rq.worker: [Job import:project-1-dataset-by-XXXXX]: exception raised while executing (cvat.apps.engine.utils.import_resource_with_clean_up_after)
Traceback (most recent call last):
File "/home/django/cvat/apps/dataset_manager/bindings.py", line 1964, in import_dm_annotations
top = int(istrue[0].min())
File "/opt/venv/lib/python3.10/site-packages/numpy/core/_methods.py", line 44, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/rq/worker.py", line 1428, in perform_job
rv = job.perform()
File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1278, in perform
self._result = self._execute()
File "/opt/venv/lib/python3.10/site-packages/rq/job.py", line 1315, in _execute
result = self.func(*self.args, **self.kwargs)
File "/home/django/cvat/apps/engine/utils.py", line 332, in import_resource_with_clean_up_after
result = func(filename, *args, **kwargs)
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/django/cvat/apps/dataset_manager/project.py", line 176, in import_dataset_as_project
project.import_dataset(f, importer, conv_mask_to_poly=conv_mask_to_poly)
File "/home/django/cvat/apps/dataset_manager/project.py", line 155, in import_dataset
importer(dataset_file, temp_dir, project_data, self.load_dataset_data, **options)
File "/home/django/cvat/apps/dataset_manager/formats/registry.py", line 36, in __call__
f_or_cls(*args, **kwargs)
File "/home/django/cvat/apps/dataset_manager/formats/datumaro.py", line 44, in _import
import_dm_annotations(dataset, instance_data)
File "/home/django/cvat/apps/dataset_manager/bindings.py", line 1890, in import_dm_annotations
import_dm_annotations(sub_dataset, task_data)
File "/home/django/cvat/apps/dataset_manager/bindings.py", line 2104, in import_dm_annotations
raise CvatImportError("Image {}: can't import annotation "
cvat.apps.dataset_manager.bindings.CvatImportError: Image IMG_0971: can't import annotation #0 (mask): zero-size array to reduction operation minimum which has no identity
Expected Behavior
Dataset loads into CVAT with annotations. If a single annotation contains multiple instances, CVAT will identify the separate instances using contour discovery to process into polygons, or load the mask as a single annotation ID if polygon conversion is not selected.
OR
More helpful exception handling to make the issue with the annotation clear.
Possible Solution
Discover the individual instances within the model output mask using cv2.findContours, then create a separate datumaro object for each instance.
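To illustrate the idea of splitting one mask into per-instance masks, here is a minimal sketch. It deliberately swaps cv2.findContours for a plain-Python flood fill (4-connected component labeling) so that it has no dependencies; the function name split_instances is hypothetical, and nested lists stand in for the NumPy arrays a real pipeline would use.

```python
from collections import deque
from typing import List, Sequence


def split_instances(mask: Sequence[Sequence[int]]) -> List[List[List[int]]]:
    """Split a binary mask into one binary mask per 4-connected component.

    Hypothetical helper for illustration; not part of CVAT or datumaro.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    instances = []

    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            instance = [[0] * width for _ in range(height)]
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                instance[r][c] = 1
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            instances.append(instance)
    return instances


# Two disconnected blobs of the same class stored in a single mask.
combined = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
parts = split_instances(combined)
```

Each returned mask could then be converted to an array and wrapped in its own datumaro.Mask before export. An all-zero input yields an empty list, so no empty annotation is produced. In practice an OpenCV routine such as cv2.findContours or cv2.connectedComponents would likely be the faster choice on real images.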
Context
No response
Environment
No response
I imagine this is likely a won't-fix for the CVAT team, documenting in an issue for anyone else who runs into a similar error. A blurb about this in the documentation, or a section on advanced dataset construction and import from external sources might be helpful.
Hi,
From the error, it's not totally clear how it's related to separate masks, but there are probably some empty masks in the imported dataset. Such masks can sometimes be exported from CVAT or generated by external tools. CVAT should have no problem importing multiple masks of the same class in COCO or Datumaro formats.
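The error message in the traceback is consistent with an empty mask: a bounding box is derived from the coordinates of the foreground pixels, and an all-zero mask has none, so a minimum has nothing to reduce over. The following is only a plain-Python sketch of that failure mode and of a guard against it, not CVAT's code; the name mask_bbox is hypothetical and nested lists stand in for NumPy arrays.

```python
from typing import Optional, Sequence, Tuple


def mask_bbox(mask: Sequence[Sequence[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the foreground, or None if empty.

    Hypothetical helper for illustration only.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        # Without this guard, min(rows) would raise ValueError on an empty
        # mask, analogous to NumPy's "zero-size array" reduction error.
        return None
    return min(rows), min(cols), max(rows), max(cols)


empty_mask = [[0, 0, 0], [0, 0, 0]]
filled_mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]

empty_box = mask_bbox(empty_mask)
filled_box = mask_bbox(filled_mask)
```

A check like this on each generated mask before export is one way to find out whether the dataset contains empty masks.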
To test the problem, please try to clean empty masks in the dataset before importing it with a script like this:
from argparse import ArgumentParser

import datumaro as dm
from datumaro.cli.util.project import parse_dataset_pathspec


class RemoveEmptyMasks(dm.ItemTransform):
    """Drop mask annotations that contain no foreground pixels."""

    def transform_item(self, item: dm.DatasetItem) -> dm.DatasetItem | None:
        updated_anns = []
        for a in item.annotations:
            # Skip masks with zero area; keep every other annotation.
            if isinstance(a, dm.Mask) and a.get_area() == 0:
                continue
            updated_anns.append(a)

        return self.wrap_item(item, annotations=updated_anns)


def main():
    parser = ArgumentParser()
    parser.add_argument("-f", "--format", help="Output format (default: use original)")
    parser.add_argument("input_dataset", help="Input dataset path or path:format")
    parser.add_argument("output_dir", help="Output path")
    args = parser.parse_args()

    dataset = parse_dataset_pathspec(args.input_dataset)
    dataset.transform(RemoveEmptyMasks)
    # Fall back to the input dataset's format when -f is not given.
    dataset.export(args.output_dir, args.format or dataset.format)


if __name__ == "__main__":
    main()
(The code targets the Datumaro v0.3-based version that CVAT currently uses. Install it with: pip install "datumaro @ git+https://github.com/cvat-ai/datumaro@dc66ee56a2679661b2b2c6abef8917f17a9451df")
Call it like this: python remove_empty_masks.py "input_dataset_dir/" "output_dir/"