Feature Request: configure 'dataset_params' for the training/validation/test data in multiple directories.

Open PraveenKumar-Rajendran opened this issue 1 year ago • 2 comments

Thank you for the awesome work! :)

Is your feature request related to a problem? Please describe.

The train/val/test split is not always stored in a single directory. It would be nice to be able to give multiple directories and their corresponding labels for a single split (e.g. train).

Describe the solution you'd like

For example, in YoloV5/V8 one can give multiple directories for a single split in the .yaml file.

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/GlobalWheat2020  # dataset root dir
train: # train images (relative to 'path') 3422 images
  - images/arvalis_1
  - images/arvalis_2
  - images/arvalis_3
  - images/ethz_1
  - images/rres_1
  - images/inrae_1
  - images/usask_1
val: # val images (relative to 'path') 748 images (WARNING: train set contains ethz_1)
  - images/ethz_1
test: # test images (optional) 1276 images
  - images/utokyo_1
  - images/utokyo_2
  - images/nau_1
  - images/uq_1

Additional context

YoloV5/V8 assumes that the labels directory is in the same directory as the images.

images/
labels/

If this assumption is not used in YoloNAS, how about just pairing image and label directories by their order in the lists?

Example:

dataset_params = {
    'data_dir':'/data/od',
    'train_images_dir':['a/train/images', 'b/train/images', 'c/train/images', 'd/train/images'],
    'train_labels_dir':['a/train/labels', 'b/train/labels', 'c/train/labels', 'd/train/labels'],
    'val_images_dir':['a/val/images', 'b/val/images', 'c/val/images', 'd/val/images'],
    'val_labels_dir':['a/val/labels', 'b/val/labels', 'c/val/labels', 'd/val/labels'],
    'test_images_dir':'test/images',
    'test_labels_dir':'test/labels',
    'classes': ['apple', 'orange', 'grapes', 'mango', 'banana']
}
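
A minimal sketch of how the order-based pairing could behave. `pair_dirs` is a hypothetical helper written for illustration, not part of super-gradients; it assumes each split may be given as either a single directory or a list, and that image and label lists line up by position.

```python
from pathlib import PurePosixPath


def pair_dirs(data_dir, images_dir, labels_dir):
    """Pair image and label directories by their position in each list."""
    # Accept a single string as well as a list, as in the dict above.
    images = [images_dir] if isinstance(images_dir, str) else list(images_dir)
    labels = [labels_dir] if isinstance(labels_dir, str) else list(labels_dir)
    if len(images) != len(labels):
        raise ValueError("images and labels lists must have the same length")
    root = PurePosixPath(data_dir)
    # Resolve every entry relative to the dataset root.
    return [(root / i, root / l) for i, l in zip(images, labels)]


pairs = pair_dirs(
    '/data/od',
    ['a/train/images', 'b/train/images'],
    ['a/train/labels', 'b/train/labels'],
)
```

Raising an error on mismatched lengths is a design choice here: silently truncating with `zip` would hide a misconfigured split.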

PraveenKumar-Rajendran, May 17 '23 06:05

Have you solved this problem? I have the same problem.

itachi176, Jun 02 '23 10:06

We do not support this scenario out of the box. I think you will have to create multiple datasets, combine them using ConcatDataset from PyTorch, and then pass the result to a DataLoader.
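
A rough sketch of that workaround, assuming one dataset object per directory pair. `DirDataset` is a hypothetical stand-in that returns dummy tensors; in practice it would be replaced by whichever detection dataset class you already use for a single directory, and a real detection pipeline would likely also need a suitable `collate_fn`. The sample counts are made up.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class DirDataset(Dataset):
    """Hypothetical stand-in for a dataset built from one images/labels pair."""

    def __init__(self, images_dir, labels_dir, num_samples):
        self.images_dir = images_dir
        self.labels_dir = labels_dir
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, index):
        # A real dataset would load an image and its labels from disk here.
        image = torch.zeros(3, 8, 8)
        return image, index


image_dirs = ['a/train/images', 'b/train/images']
label_dirs = ['a/train/labels', 'b/train/labels']
sizes = [3, 5]  # made-up sample counts, for illustration only

# One dataset per directory pair, then a single concatenated view over them.
parts = [DirDataset(i, l, n) for i, l, n in zip(image_dirs, label_dirs, sizes)]
train_dataset = ConcatDataset(parts)
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
```

The resulting `train_loader` draws from all directories as if they were one split.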

BloodAxe, Aug 10 '23 15:08