Problems with yolo loose format
Hello, I'm trying to get my hands on the datumaro library and am having some trouble with the YOLO format. According to the documentation, it's possible to import a YOLO dataset in a "loose format". I have a directory with the following structure:
yolo-ds
├── images
│ ├── 2024-01-09_13.52.57.png
│ ├── 2024-01-09_22.21.52.png
│ ├── 2024-01-10_00.44.59.png
│ ├── 2024-01-10_06.48.19.png
│ ├── 2024-01-12_01.56.08.png
│ └── 2024-01-12_09.10.30.png
├── labels
│ ├── 2024-01-09_13.52.57.txt
│ ├── 2024-01-09_22.21.52.txt
│ ├── 2024-01-10_00.44.59.txt
│ ├── 2024-01-10_06.48.19.txt
│ ├── 2024-01-12_01.56.08.txt
│ └── 2024-01-12_09.10.30.txt
└── obj.names
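In a loose layout like this, each label file is expected to be matched to the image with the same file stem (for example images/2024-01-09_13.52.57.png and labels/2024-01-09_13.52.57.txt). A quick way to rule out a pairing problem before involving Datumaro is to check it with plain Python. This is only a sketch: it assumes images and labels sit in sibling images/ and labels/ directories and that the listed image extensions cover your data, and pair_images_with_labels is a hypothetical helper, not part of Datumaro.

```python
from pathlib import Path

# Image extensions to consider (an assumption; adjust for your data).
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}


def pair_images_with_labels(root):
    """Hypothetical helper: match images/<stem>.<ext> with labels/<stem>.txt.

    Returns (pairs, images_without_labels, labels_without_images), where
    `pairs` maps each shared stem to an (image_path, label_path) tuple.
    Path.stem strips only the last suffix, so names such as
    "2024-01-09_13.52.57.png" keep their inner dots.
    """
    root = Path(root)
    images = {
        p.stem: p
        for p in (root / "images").iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    }
    labels = {
        p.stem: p
        for p in (root / "labels").iterdir()
        if p.is_file() and p.suffix == ".txt"
    }
    shared = sorted(images.keys() & labels.keys())
    pairs = {stem: (images[stem], labels[stem]) for stem in shared}
    images_without_labels = sorted(images.keys() - labels.keys())
    labels_without_images = sorted(labels.keys() - images.keys())
    return pairs, images_without_labels, labels_without_images
```

Calling pair_images_with_labels("yolo-ds") on the tree above should report six pairs and two empty "missing" lists if the files line up.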
I'm importing the dataset with the following code:
import datumaro as dm
ds = dm.Dataset.import_from("yolo-ds", "yolo_loose")
print(type(ds))
print(ds)
The dataset imports successfully, and print(type(ds)) outputs datumaro.components.dataset.Dataset. However, the third line, print(ds), fails with "TypeError: 'NoneType' object is not iterable".
The full traceback can be found here
I'm using Python 3.10.13 and datumaro 1.5.2.
Hi @nik123, sorry for the inconvenience. I tried to reproduce it with a dummy dataset from our codebase (you can see it here), like this:
import datumaro as dm
print(f"Datumaro version: {dm.__version__}")
dataset = dm.Dataset.import_from("tests/assets/yolo_dataset/labels")
print(type(dataset))
However, it runs without any problem:
Datumaro version: 1.5.2
<class 'datumaro.components.dataset.Dataset'>
If possible, could you attach a small reproducible dataset as a zip file here? It will help us find the problem.
@vinnamkim here it is https://github.com/nik123/datumaro-yolo-loose-issue/
The problem occurs when executing print(ds). I'd expect the dataset to be printable, but I get an error instead.
Hi,
I'd forgotten what I implemented in the meantime, haha. You need to pass format="yolo" rather than format="yolo_loose", even if the dataset is in the YOLO loose format (documentation here).
I was able to import the dataset you provided as follows, and the import looks successful:
git clone https://github.com/nik123/datumaro-yolo-loose-issue/
python test.py
# test.py
import datumaro as dm
print(f"Datumaro version: {dm.__version__}")
dataset = dm.Dataset.import_from("datumaro-yolo-loose-issue/yolo-loose-ds", format="yolo")
print(dataset)
Datumaro version: 1.5.2
Dataset
size=6
source_path=datumaro-yolo-loose-issue/yolo-loose-ds
media_type=<class 'datumaro.components.media.Image'>
annotated_items_count=6
annotations_count=22
subsets
train: # of items=3, # of annotated items=3, # of annotations=7, annotation types=['bbox']
val: # of items=3, # of annotated items=3, # of annotations=15, annotation types=['bbox']
infos
categories
label: ['label1', 'label2', 'label3', 'label4', 'label5', 'label6', 'label7', 'label8', 'label9']
Thanks @vinnamkim ! This indeed helped
There still seems to be a slight issue regarding the behavior. Despite using format=yolo_loose, the dataset is imported successfully. Perhaps it would be more appropriate to raise an UnknownFormatError instead.
Thanks for the good suggestion. We will add it to our backlog.
Hi @nik123, thank you for your interest in Datumaro!
yolo_loose is not a format defined among the importers; it is registered as an extractor.
Datumaro can also import a dataset directly through an extractor, as described in https://github.com/openvinotoolkit/datumaro/blob/f9366173a0a5ba6fe479703800a2f3a0caf15530/src/datumaro/components/dataset.py#L822
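To make the importer/extractor distinction concrete, here is a simplified, self-contained model of the dispatch in Dataset.import_from as we read it from the linked code: a name registered as an importer goes through the importer, which inspects the path and fills in per-source options; a name registered only as an extractor is handed to that extractor with just the keyword arguments the caller supplied; anything else raises UnknownFormatError. The registries, the resolve_sources function, and the exception class below are hypothetical stand-ins written for illustration, not Datumaro's actual classes, so please check the linked source for the real behavior.

```python
class UnknownFormatError(Exception):
    """Stand-in for Datumaro's UnknownFormatError (illustration only)."""


# Hypothetical, trimmed-down registries; Datumaro keeps the real ones in its Environment.
IMPORTERS = {"yolo"}
EXTRACTORS = {"yolo_loose"}


def resolve_sources(path, fmt, **options):
    """Simplified model of how import_from decides what to do with `fmt`."""
    if fmt in IMPORTERS:
        # A real importer would scan `path` here and build per-source
        # options (subset, urls, ...) automatically.
        return {"via": "importer", "url": path, "format": fmt, "options": options}
    if fmt in EXTRACTORS:
        # An extractor is used as-is: only the options passed by the caller
        # reach it, so nothing missing gets filled in.
        return {"via": "extractor", "url": path, "format": fmt, "options": options}
    raise UnknownFormatError(fmt)
```

Under this model, format="yolo_loose" is accepted (so no UnknownFormatError) but arrives with empty options, which is consistent with the failure only surfacing later, when print(ds) actually reads the items.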
However, as you observed, importing a dataset with yolo_loose fails, and this is because of missing arguments.
That means more arguments must be specified to import a dataset through an extractor.
Please see the example below.
import os
import os.path as osp

import datumaro as dm

path = "yolo-loose-ds"
sources = []
for subset in ["train", "val"]:
    # Label files for this subset live under labels/<subset>/
    ann_path = osp.join(path, "labels", subset)
    urls = [osp.join(ann_path, ann_file) for ann_file in os.listdir(ann_path)]
    # yolo_loose is an extractor, so subset and urls must be passed explicitly
    sources.append(
        dm.Dataset.import_from(
            path=path,
            format="yolo_loose",
            subset=subset,
            urls=urls,
        )
    )

# Merge the per-subset datasets into one
dataset = dm.Dataset.from_extractors(*sources)
print(dataset)
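One detail in the URL-collection step: os.listdir returns entries in arbitrary order and includes any non-annotation files that happen to be in the folder. A small variant that keeps only .txt files and sorts them makes runs reproducible. This is a sketch that assumes the labels/<subset>/ layout of the linked repository; collect_label_urls is a hypothetical helper, and it only builds the subset/urls arguments already used in the example, without calling Datumaro itself.

```python
import os
import os.path as osp


def collect_label_urls(path, subsets=("train", "val")):
    """Hypothetical helper: map each subset to the sorted .txt label files
    under <path>/labels/<subset>/, shaped like the extra import_from kwargs."""
    per_subset = {}
    for subset in subsets:
        ann_path = osp.join(path, "labels", subset)
        urls = sorted(
            osp.join(ann_path, name)
            for name in os.listdir(ann_path)
            # Skip stray files and directories; keep only YOLO annotation files
            if name.endswith(".txt") and osp.isfile(osp.join(ann_path, name))
        )
        per_subset[subset] = {"subset": subset, "urls": urls}
    return per_subset
```

Each value can then be unpacked into the call from the example, e.g. dm.Dataset.import_from(path=path, format="yolo_loose", **kwargs) for every kwargs in collect_label_urls(path).values().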
Yes, it is indeed much more complex than using an importer.
Therefore, we recommend that users use the importer, because it automates the parameterization!
But I wanted to show how an extractor can be used.
Thanks again and hope to see you again!