quickvision [RFC] Datasets API

🚀 Feature

Having a Datasets API for commonly used formats would come in handy.

Pitch

A non-exhaustive list of formats that are commonly used:

[ ] CSV file with image_id and target columns (Binary or Multi-Class Classification). There are two layouts that are used most often:

image_id        target
100011               1
100015               0
100007               2

The layout above has been implemented as CSVSingleLabelDataset. Should we add support for the layout below, where image_id includes the file extension, in the same class, or should we create a separate one? I think we can handle both in the same class.

image_id        target
100011.png           1
100015.png           0
100007.png           2
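A minimal sketch of handling both layouts in one class, assuming comma-separated files and a single image directory. CSVSingleLabelSketch, from_csv, default_ext and loader are hypothetical names for illustration, not the actual CSVSingleLabelDataset API. The only decision is whether image_id already carries an extension; if it does not, default_ext is appended. The class follows the map-style __len__/__getitem__ protocol so it could be wrapped by torch.utils.data.DataLoader; the placeholder loader just returns the path.

```python
import csv
import os
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class CSVSingleLabelSketch:
    """Hypothetical single-label dataset accepting image_id with or without an extension."""

    def __init__(self, rows: Iterable[Dict[str, str]], image_dir: str,
                 default_ext: str = ".png",
                 loader: Optional[Callable[[str], object]] = None):
        self.image_dir = image_dir
        self.default_ext = default_ext
        # Placeholder loader returns the path; a real one might be PIL.Image.open.
        self.loader = loader or (lambda path: path)
        self.samples: List[Tuple[str, int]] = [
            (self._resolve(row["image_id"]), int(row["target"])) for row in rows
        ]

    @classmethod
    def from_csv(cls, csv_path: str, image_dir: str, **kwargs) -> "CSVSingleLabelSketch":
        # Read all rows up front so the file handle is closed before returning.
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))
        return cls(rows, image_dir, **kwargs)

    def _resolve(self, image_id: str) -> str:
        # "100011.png" is kept as-is; "100011" becomes "100011" + default_ext.
        name = image_id.strip()
        if not os.path.splitext(name)[1]:
            name += self.default_ext
        return os.path.join(self.image_dir, name)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int):
        path, target = self.samples[idx]
        return self.loader(path), target
```

With this heuristic, both 100011 and 100011.png resolve to the same file. One caveat: an image_id that itself contains a dot would be mistaken for one that already has an extension.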

[ ] CSV file with image_id and target columns (Multi-Label Classification). Similarly, there are two layouts that are used most often, with the labels in the target column separated by spaces:

image_id        target
100011             0 1
100015             0 2
100007             1 2

image_id        target
100011.png         0 1
100015.png         0 2
100007.png         1 2
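The image_id handling is the same as in the single-label case; what differs is the target column. A minimal sketch of turning the space-separated label string into a fixed-length multi-hot vector, assuming labels are 0-based integers and num_classes is known up front (multi_hot is a hypothetical helper). A float multi-hot target of this shape is what a per-class sigmoid loss such as torch.nn.BCEWithLogitsLoss expects.

```python
from typing import List


def multi_hot(target: str, num_classes: int) -> List[float]:
    """Convert a space-separated label string such as "0 2" into a multi-hot vector."""
    vec = [0.0] * num_classes
    for token in target.split():
        label = int(token)
        # Reject labels outside [0, num_classes) instead of silently indexing from the end.
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is out of range for {num_classes} classes")
        vec[label] = 1.0
    return vec


print(multi_hot("0 1", 3))  # [1.0, 1.0, 0.0]
print(multi_hot("1 2", 3))  # [0.0, 1.0, 1.0]
```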

[ ] Folder structure like the one below:

folder
|-- test
`-- train
    |-- class_1
    |   |-- 10001.png
    |   `-- 10002.png
    |-- class_2
    |   |-- 10005.png
    |   `-- 10009.png
    `-- class_3
        |-- 10014.png
        `-- 10027.png

The structure above has been implemented using create_folder_dataset, but we don't always need to split train into train_set and valid_set, because in some cases valid_set is pre-defined, as below:

folder
|-- test
|-- train
|   |-- class_1
|   |   |-- 10001.png
|   |   `-- 10002.png
|   |-- class_2
|   |   |-- 10005.png
|   |   `-- 10009.png
|   `-- class_3
|       |-- 10014.png
|       `-- 10027.png
`-- valid
    |-- class_1
    |   |-- 10023.png
    |   `-- 10035.png
    |-- class_2
    |   |-- 1002.png
    |   `-- 10042.png
    `-- class_3
        |-- 10029.png
        `-- 10076.png
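One way to support both folder layouts is to look for a valid directory and fall back to a random split of train only when it is missing. A minimal stdlib sketch; folder_splits and scan_split are hypothetical names, not the create_folder_dataset API. Class indices come from the sorted sub-folder names of train, the same convention torchvision.datasets.ImageFolder uses.

```python
import os
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (image path, class index)


def scan_split(split_dir: str, class_to_idx: Dict[str, int]) -> List[Sample]:
    """List (path, class_idx) pairs for every file under split_dir/<class>/."""
    samples: List[Sample] = []
    for cls, idx in sorted(class_to_idx.items(), key=lambda kv: kv[1]):
        cls_dir = os.path.join(split_dir, cls)
        if not os.path.isdir(cls_dir):
            continue  # a split may be missing some classes
        for fname in sorted(os.listdir(cls_dir)):
            samples.append((os.path.join(cls_dir, fname), idx))
    return samples


def folder_splits(root: str, valid_ratio: float = 0.2, seed: int = 0) -> Dict[str, List[Sample]]:
    train_dir = os.path.join(root, "train")
    # Class indices follow the sorted sub-folder names of train/.
    classes = sorted(d for d in os.listdir(train_dir)
                     if os.path.isdir(os.path.join(train_dir, d)))
    class_to_idx = {cls: i for i, cls in enumerate(classes)}
    train = scan_split(train_dir, class_to_idx)

    valid_dir = os.path.join(root, "valid")
    if os.path.isdir(valid_dir):
        # Pre-defined validation set: use it as-is, no splitting.
        return {"train": train, "valid": scan_split(valid_dir, class_to_idx)}

    # No valid/ folder: carve a seeded random validation subset out of train/.
    rng = random.Random(seed)
    rng.shuffle(train)
    n_valid = int(len(train) * valid_ratio)
    return {"train": train[n_valid:], "valid": train[:n_valid]}
```

Building class_to_idx from train and reusing it for valid keeps the indices aligned even if valid lacks a class. In a PyTorch pipeline, the pre-defined case maps naturally onto two torchvision.datasets.ImageFolder instances and the fallback onto torch.utils.data.random_split.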

[ ] CSV file with image_id and bbox columns (Object Detection). Similar to classification tasks, there are two layouts that are used most often, with one row per bounding box:

image_id        bbox
100011          [834.0, 222.0, 56.0, 36.0]
100011          [226.0, 548.0, 130.0, 58.0]
100007          [377.0, 504.0, 74.0, 160.0]

Honestly, I have never seen the format below, but we can still support it.

image_id             bbox
100011.jpg           [834.0, 222.0, 56.0, 36.0]
100011.jpg           [226.0, 548.0, 130.0, 58.0]
100007.jpg           [377.0, 504.0, 74.0, 160.0]
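Because one image can span several rows, a detection dataset first has to group boxes by image. A minimal sketch, assuming comma-separated files in which the bbox field is quoted (for example 100011,"[834.0, 222.0, 56.0, 36.0]") and that .jpg is the extension to append when image_id has none. The issue does not say what the four numbers mean; xywh_to_xyxy assumes they are [x_min, y_min, width, height] and converts them to the corner layout. group_boxes and xywh_to_xyxy are hypothetical helpers.

```python
import csv
import io
import json
import os
from typing import Dict, Iterable, List


def group_boxes(rows: Iterable[Dict[str, str]],
                default_ext: str = ".jpg") -> Dict[str, List[List[float]]]:
    """Collect every bbox row under its image file name (dicts keep first-seen order)."""
    boxes: Dict[str, List[List[float]]] = {}
    for row in rows:
        name = row["image_id"].strip()
        if not os.path.splitext(name)[1]:
            name += default_ext  # "100011" -> "100011.jpg"
        # The bbox field is a JSON-style list such as "[834.0, 222.0, 56.0, 36.0]".
        boxes.setdefault(name, []).append([float(v) for v in json.loads(row["bbox"])])
    return boxes


def xywh_to_xyxy(box: List[float]) -> List[float]:
    # ASSUMPTION: input is [x_min, y_min, width, height].
    x, y, w, h = box
    return [x, y, x + w, y + h]


text = ('image_id,bbox\n'
        '100011,"[834.0, 222.0, 56.0, 36.0]"\n'
        '100011,"[226.0, 548.0, 130.0, 58.0]"\n'
        '100007,"[377.0, 504.0, 74.0, 160.0]"\n')
grouped = group_boxes(csv.DictReader(io.StringIO(text)))
print({k: len(v) for k, v in grouped.items()})  # {'100011.jpg': 2, '100007.jpg': 1}
print(xywh_to_xyxy(grouped["100011.jpg"][0]))   # [834.0, 222.0, 890.0, 258.0]
```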

I have come across only the four formats above, but do let me know if I missed any. Also, let me know your thoughts on the proposal.

cc @zhiqwang

Nov 24 '20 17:11 hassiahk

For the object detection task, there are two other frequently used formats, Pascal VOC and MSCOCO, and both are supported in torchvision. Did we leave these two datasets out because we would just use torchvision's implementation whenever we encounter them?
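torchvision.datasets.VOCDetection and torchvision.datasets.CocoDetection already cover loading these two datasets. To show what a VOC target contains, here is a minimal stdlib sketch that reads one VOC-style XML annotation into corner-format boxes and class names; parse_voc_xml is a hypothetical helper and the sample annotation is hand-written for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_voc_xml(xml_text: str) -> Tuple[List[List[float]], List[str]]:
    """Return ([x_min, y_min, x_max, y_max] boxes, class names) from one VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes: List[List[float]] = []
    labels: List[str] = []
    for obj in root.findall("object"):  # each <object> sits directly under <annotation>
        bndbox = obj.find("bndbox")
        boxes.append([float(bndbox.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")])
        labels.append(obj.find("name").text)
    return boxes, labels


sample = """<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>374</ymax></bndbox>
  </object>
</annotation>"""
boxes, labels = parse_voc_xml(sample)
print(labels)    # ['dog', 'person']
print(boxes[0])  # [48.0, 240.0, 195.0, 371.0]
```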

Nov 24 '20 18:11 zhiqwang

I think we should discuss this further. Datasets are really tricky, especially for object detection. For the torchvision models, we expect boxes in VOC format, and for DETR, in a normalized YOLO format. We haven't enforced these formats; they come from the models themselves.
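To make the two conventions concrete: VOC-style boxes are absolute pixel corners [x_min, y_min, x_max, y_max], which is also the layout torchvision's detection models take, while the normalized YOLO-style layout used for DETR is [center_x, center_y, width, height] divided by the image size. A minimal sketch of converting between them; the function names are hypothetical.

```python
from typing import List, Sequence


def voc_to_norm_cxcywh(box: Sequence[float], img_w: float, img_h: float) -> List[float]:
    """[x_min, y_min, x_max, y_max] in pixels -> [cx, cy, w, h] scaled to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    return [
        (x_min + x_max) / 2 / img_w,
        (y_min + y_max) / 2 / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    ]


def norm_cxcywh_to_voc(box: Sequence[float], img_w: float, img_h: float) -> List[float]:
    """Inverse of voc_to_norm_cxcywh: back to pixel corners."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ]


print(voc_to_norm_cxcywh([100, 200, 300, 400], img_w=1000, img_h=800))
# [0.2, 0.375, 0.2, 0.25]
```

Keeping one canonical box format inside the datasets and converting at the model boundary is one option for keeping these per-model formats out of the Datasets API.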

Nov 24 '20 18:11 oke-aditya