[RFC] Datasets API
🚀 Feature
Having a Datasets API for commonly used formats would come in handy.
Pitch
A non-exhaustive list of formats that are commonly used:
- [ ] CSV file with `image_id` and `target` columns (binary or multi-class classification). Two layouts are most common:

| image_id | target |
| --- | --- |
| 100011 | 1 |
| 100015 | 0 |
| 100007 | 2 |
The layout above is implemented by `CSVSingleLabelDataset`. Should we support the layout below in the same class, or create a separate one? I think the same class can handle both.

| image_id | target |
| --- | --- |
| 100011.png | 1 |
| 100015.png | 0 |
| 100007.png | 2 |
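One way to support both `image_id` variants in a single class is to normalize the id when resolving the file path. This is a minimal sketch, assuming all images sit in one directory and share one extension. `resolve_image_path`, `read_single_label_csv`, and the `ext` parameter are hypothetical names for illustration, not part of the existing `CSVSingleLabelDataset`.

```python
import csv
import io
import os


def resolve_image_path(image_id, img_dir, ext=".png"):
    """Return the image path, appending `ext` only when `image_id` has no extension."""
    image_id = str(image_id)
    _, found_ext = os.path.splitext(image_id)
    # Assumption: an id containing a dot already carries its file extension.
    filename = image_id if found_ext else image_id + ext
    return os.path.join(img_dir, filename)


def read_single_label_csv(csv_file, img_dir, ext=".png"):
    """Read (image_path, target) pairs from a CSV with `image_id` and `target` columns."""
    reader = csv.DictReader(csv_file)
    return [
        (resolve_image_path(row["image_id"], img_dir, ext), int(row["target"]))
        for row in reader
    ]


# Both variants resolve to the same (path, target) pairs.
without_ext = io.StringIO("image_id,target\n100011,1\n100015,0\n")
with_ext = io.StringIO("image_id,target\n100011.png,1\n100015.png,0\n")
samples_a = read_single_label_csv(without_ext, "images")
samples_b = read_single_label_csv(with_ext, "images")
```

With this approach the dataset class itself would not need to know which variant the CSV uses; only the default extension has to be supplied.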
- [ ] CSV file with `image_id` and `target` columns (multi-label classification). Again, two layouts are most common:

| image_id | target |
| --- | --- |
| 100011 | 0 1 |
| 100015 | 0 2 |
| 100007 | 1 2 |

| image_id | target |
| --- | --- |
| 100011.png | 0 1 |
| 100015.png | 0 2 |
| 100007.png | 1 2 |
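For the multi-label case, the `target` cell has to be decoded into one label vector per image. This is a rough sketch, assuming the cell holds space-separated class indices (as in the examples above) and that the number of classes is known in advance. `parse_multilabel_target` is a hypothetical helper, not an existing API.

```python
def parse_multilabel_target(target, num_classes, sep=" "):
    """Turn a target string such as "0 2" into a multi-hot list of length `num_classes`."""
    multi_hot = [0] * num_classes
    for token in str(target).split(sep):
        if not token:
            continue  # tolerate repeated separators
        idx = int(token)
        if not 0 <= idx < num_classes:
            raise ValueError(f"class index {idx} out of range for {num_classes} classes")
        multi_hot[idx] = 1
    return multi_hot


# Rows taken from the example layout above.
rows = [("100011", "0 1"), ("100015", "0 2"), ("100007", "1 2")]
encoded = {
    image_id: parse_multilabel_target(target, num_classes=3)
    for image_id, target in rows
}
```

A multi-hot vector is the shape most multi-label losses expect, so the dataset could return it directly after converting it to a tensor.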
- [ ] Folder structure like below:
folder
|-- test
`-- train
    |-- class_1
    |   |-- 10001.png
    |   `-- 10002.png
    |-- class_2
    |   |-- 10005.png
    |   `-- 10009.png
    `-- class_3
        |-- 10014.png
        `-- 10027.png
The layout above is handled by `create_folder_dataset`, but we don't always need to split `train` into `train_set` and `valid_set`. Sometimes the validation set is predefined, as below:
folder
|-- test
|-- train
|   |-- class_1
|   |   |-- 10001.png
|   |   `-- 10002.png
|   |-- class_2
|   |   |-- 10005.png
|   |   `-- 10009.png
|   `-- class_3
|       |-- 10014.png
|       `-- 10027.png
`-- valid
    |-- class_1
    |   |-- 10023.png
    |   `-- 10035.png
    |-- class_2
    |   |-- 1002.png
    |   `-- 10042.png
    `-- class_3
        |-- 10029.png
        `-- 10076.png
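A loader could cover both folder layouts by checking whether a `valid` directory exists and leaving the split decision to the caller otherwise. This is a minimal sketch using only `pathlib`. `list_class_samples` and `load_folder_splits` are hypothetical names and do not describe how `create_folder_dataset` currently works.

```python
from pathlib import Path


def list_class_samples(split_dir, class_to_idx=None):
    """Return ((path, class_index) samples, class_to_idx) for one split directory."""
    split_dir = Path(split_dir)
    if class_to_idx is None:
        # Derive class indices from the sorted sub-directory names.
        classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        class_to_idx = {name: i for i, name in enumerate(classes)}
    samples = []
    for name, idx in class_to_idx.items():
        class_dir = split_dir / name
        if not class_dir.is_dir():
            continue  # a class may be absent from a split
        for path in sorted(class_dir.iterdir()):
            if path.is_file():
                samples.append((path, idx))
    return samples, class_to_idx


def load_folder_splits(root):
    """Use `valid/` when it exists; otherwise return None so the caller can split `train/`."""
    root = Path(root)
    train_samples, class_to_idx = list_class_samples(root / "train")
    valid_dir = root / "valid"
    valid_samples = None
    if valid_dir.is_dir():
        # Reuse the training class mapping so labels agree across splits.
        valid_samples, _ = list_class_samples(valid_dir, class_to_idx)
    return train_samples, valid_samples, class_to_idx
```

When `valid_samples` comes back as `None`, the caller can fall back to a random split of `train_samples`, which keeps the current behaviour available for the first layout.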
- [ ] CSV file with `image_id` and `bbox` columns (object detection). As with the classification tasks, two layouts are possible:

| image_id | bbox |
| --- | --- |
| 100011 | [834.0, 222.0, 56.0, 36.0] |
| 100011 | [226.0, 548.0, 130.0, 58.0] |
| 100007 | [377.0, 504.0, 74.0, 160.0] |
Honestly, I have never seen the layout below in practice, but we can still support it.

| image_id | bbox |
| --- | --- |
| 100011.jpg | [834.0, 222.0, 56.0, 36.0] |
| 100011.jpg | [226.0, 548.0, 130.0, 58.0] |
| 100007.jpg | [377.0, 504.0, 74.0, 160.0] |
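Because one image can span several rows in this format, a detection dataset would first need to group boxes by `image_id`. This is a small sketch, assuming the `bbox` cell is a quoted, JSON-style list of four numbers. `read_bbox_csv` is a hypothetical helper for illustration.

```python
import csv
import io
import json
from collections import defaultdict


def read_bbox_csv(csv_file):
    """Group bounding boxes by image_id; each CSV row holds one box as a list literal."""
    boxes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        box = [float(v) for v in json.loads(row["bbox"])]
        if len(box) != 4:
            raise ValueError(f"expected 4 values per box, got {len(box)}")
        boxes[row["image_id"]].append(box)
    return dict(boxes)


# Rows taken from the example layout above; the bbox cell is quoted because it contains commas.
data = io.StringIO(
    "image_id,bbox\n"
    '100011,"[834.0, 222.0, 56.0, 36.0]"\n'
    '100011,"[226.0, 548.0, 130.0, 58.0]"\n'
    '100007,"[377.0, 504.0, 74.0, 160.0]"\n'
)
boxes_by_image = read_bbox_csv(data)
```

Each dataset item can then be one image with all of its boxes, which is the shape detection models usually consume.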
These four are the only formats I have come across, so let me know if I missed any. I would also like to hear your thoughts on the above.
cc @zhiqwang
For object detection, there are two other frequently used formats, Pascal VOC and MS COCO, and both are supported in torchvision. Did we leave these two out because we plan to use torchvision's implementations whenever we encounter these datasets?
I think we should discuss this further. Datasets are tricky, especially for object detection. For the torchvision models we expect VOC format, and for DETR a normalized YOLO format. We haven't enforced these ourselves; the requirements come from the models.
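To make the format difference concrete, here is a sketch of converting a box between the conventions mentioned. It assumes the CSV boxes are `[x_min, y_min, width, height]` in pixels, which the issue does not state. It takes "VOC" to mean `[x_min, y_min, x_max, y_max]` in pixels and "normalized YOLO" to mean `[cx, cy, w, h]` scaled by the image size. The function names and the 1024x1024 image size are hypothetical.

```python
def xywh_to_voc(box):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max] (Pascal VOC style)."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def voc_to_normalized_cxcywh(box, img_width, img_height):
    """[x_min, y_min, x_max, y_max] in pixels -> [cx, cy, w, h] scaled to [0, 1] (YOLO style)."""
    x_min, y_min, x_max, y_max = box
    return [
        (x_min + x_max) / 2 / img_width,
        (y_min + y_max) / 2 / img_height,
        (x_max - x_min) / img_width,
        (y_max - y_min) / img_height,
    ]


# Assumed input convention: xywh in pixels; the image size is made up for the example.
voc_box = xywh_to_voc([834.0, 222.0, 56.0, 36.0])
yolo_box = voc_to_normalized_cxcywh(voc_box, img_width=1024, img_height=1024)
```

If the dataset classes stored boxes in one canonical format, converters like these could sit at the boundary where a batch is handed to a specific model.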