datumaro Problems with yolo loose format

Hello, I'm trying out the Datumaro library and am having trouble with the YOLO format. According to the documentation, it's possible to import a YOLO dataset in a "loose format". I have a directory with the following structure:

yolo-ds
├── images
│   ├── 2024-01-09_13.52.57.png
│   ├── 2024-01-09_22.21.52.png
│   ├── 2024-01-10_00.44.59.png
│   ├── 2024-01-10_06.48.19.png
│   ├── 2024-01-12_01.56.08.png
│   └── 2024-01-12_09.10.30.png
├── labels
│   ├── 2024-01-09_13.52.57.txt
│   ├── 2024-01-09_22.21.52.txt
│   ├── 2024-01-10_00.44.59.txt
│   ├── 2024-01-10_06.48.19.txt
│   ├── 2024-01-12_01.56.08.txt
│   └── 2024-01-12_09.10.30.txt
└── obj.names

I'm importing the dataset with the following code:

import datumaro as dm

ds = dm.Dataset.import_from("yolo-ds", "yolo_loose")
print(type(ds))
print(ds)

The dataset imports successfully, and print(type(ds)) outputs datumaro.components.dataset.Dataset. However, the final print(ds) call fails with the error "TypeError: 'NoneType' object is not iterable".

The full traceback can be found here

I use Python 3.10.13 and datumaro version is 1.5.2

Mar 11 '24 17:03 nik123

Hi @nik123, sorry for the inconvenience. I tried to reproduce it with a dummy dataset from our codebase (you can see here), as follows:

import datumaro as dm

print(f"Datumaro version: {dm.__version__}")

dataset = dm.Dataset.import_from("tests/assets/yolo_dataset/labels")
print(type(dataset))

However, it runs without any problem:

Datumaro version: 1.5.2
<class 'datumaro.components.dataset.Dataset'>

If possible, could you attach a small dataset that reproduces the issue as a zip file here? It will help us find the problem.

Mar 12 '24 12:03 vinnamkim

@vinnamkim here it is https://github.com/nik123/datumaro-yolo-loose-issue/

The problem occurs when executing print(ds). I would expect the dataset to be printable, but I get an error instead.

Mar 12 '24 17:03 nik123

Hi, I had forgotten what I implemented in the meantime, haha. You need to pass format="yolo" rather than format="yolo_loose", even if the dataset is in the YOLO loose format (documentation here).

I can import the dataset you provided as follows, and it appears to work.

git clone https://github.com/nik123/datumaro-yolo-loose-issue/
python test.py

# test.py
import datumaro as dm

print(f"Datumaro version: {dm.__version__}")

dataset = dm.Dataset.import_from("datumaro-yolo-loose-issue/yolo-loose-ds", format="yolo")
print(dataset)

Datumaro version: 1.5.2
Dataset
        size=6
        source_path=datumaro-yolo-loose-issue/yolo-loose-ds
        media_type=<class 'datumaro.components.media.Image'>
        annotated_items_count=6
        annotations_count=22
subsets
        train: # of items=3, # of annotated items=3, # of annotations=7, annotation types=['bbox']
        val: # of items=3, # of annotated items=3, # of annotations=15, annotation types=['bbox']
infos
        categories
        label: ['label1', 'label2', 'label3', 'label4', 'label5', 'label6', 'label7', 'label8', 'label9']

Mar 13 '24 12:03 vinnamkim

Thanks @vinnamkim! This indeed helped.

Mar 14 '24 16:03 nik123

There still seems to be a slight issue regarding the behavior. Despite using format=yolo_loose, the dataset is imported successfully. Perhaps it would be more appropriate to raise an UnknownFormatError instead.
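
To illustrate the suggestion, here is a minimal sketch of such a check. It is not Datumaro code: the UnknownFormatError class below is a local stand-in, check_importer_format is a hypothetical helper, and the sets of importer and extractor names are assumed to be supplied by the caller rather than read from Datumaro's plugin registry.

```python
class UnknownFormatError(ValueError):
    """Local stand-in for illustration; not Datumaro's own exception class."""


def check_importer_format(format_name, importer_names, extractor_names=()):
    """Return format_name if it names an importer, otherwise raise.

    importer_names and extractor_names are plain collections of strings
    provided by the caller (an assumption made for this sketch).
    """
    if format_name in importer_names:
        return format_name
    if format_name in extractor_names:
        # The name exists, but only as an extractor, which needs extra arguments.
        raise UnknownFormatError(
            f"'{format_name}' is an extractor, not an importer; "
            "use an importer format or pass the extractor arguments explicitly"
        )
    raise UnknownFormatError(f"unknown format '{format_name}'")
```

With importer_names={"yolo"} and extractor_names={"yolo_loose"}, the call check_importer_format("yolo", ...) returns "yolo", while check_importer_format("yolo_loose", ...) raises immediately instead of failing later when the dataset is printed.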

Mar 14 '24 16:03 nik123

There still seems to be a slight issue regarding the behavior. Despite using format=yolo_loose, the dataset is imported successfully. Perhaps it would be more appropriate to raise an UnknownFormatError instead.

Thanks for your good suggestion. We will put it in our backlog.

Mar 15 '24 00:03 vinnamkim

Hi @nik123, thank you for your interest in Datumaro! yolo_loose is not a format defined among the importers; it is declared as an extractor. Datumaro can also import a dataset through an extractor, as described in https://github.com/openvinotoolkit/datumaro/blob/f9366173a0a5ba6fe479703800a2f3a0caf15530/src/datumaro/components/dataset.py#L822

However, as you observed, importing a dataset with yolo_loose did not work, and this is because of missing arguments. That is, we need to specify more arguments to import a dataset through an extractor. Please see the example below.

import os
import os.path as osp

import datumaro as dm

path = "yolo-loose-ds"
sources = []
for subset in ["train", "val"]:
    # Pass the label files of each subset explicitly via `urls`.
    ann_path = osp.join(path, "labels", subset)
    urls = [osp.join(ann_path, ann_file) for ann_file in os.listdir(ann_path)]
    sources.append(
        dm.Dataset.import_from(
            path=path,
            format="yolo_loose",
            subset=subset,
            urls=urls,
        )
    )

# Combine the per-subset datasets into a single dataset.
dataset = dm.Dataset.from_extractors(*sources)
print(dataset)

Yes, it is indeed much more complex than the importer case. Therefore, we recommend using the importer, because it automates the parameterization. Still, I wanted to show how the extractor can be used.
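
If you do go the extractor route, the per-subset file listing can be factored into a small helper. This is only a sketch under assumptions: collect_label_urls is a hypothetical name (not part of Datumaro), it assumes the labels/<subset>/ layout used above, and it only picks up .txt files, sorted so that the order is stable across runs.

```python
from pathlib import Path


def collect_label_urls(root, subsets=("train", "val")):
    """Map each subset name to a sorted list of its label file paths.

    Assumes label files live in <root>/labels/<subset>/ and end in .txt
    (a hypothetical helper for illustration, not a Datumaro API).
    """
    urls_by_subset = {}
    for subset in subsets:
        ann_dir = Path(root) / "labels" / subset
        # glob() order is not guaranteed, so sort for reproducibility.
        urls_by_subset[subset] = sorted(str(p) for p in ann_dir.glob("*.txt"))
    return urls_by_subset
```

The list stored under each subset can then be passed as the urls argument in the loop above.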

Thanks again and hope to see you again!

Mar 20 '24 04:03 wonjuleee