
Contribute data loading for object detection datasets in the YOLO data format

Open faizankshaikh opened this issue 2 years ago • 4 comments

Is your feature request related to a problem? Please describe.
At the moment, HF datasets loads image classification datasets out-of-the-box. There could be a data loader for loading standard object detection datasets (original discussion here).

Describe the solution you'd like
I wrote a custom script to load datasets that use the YOLO data format.
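For readers unfamiliar with the format, here is a minimal sketch of parsing YOLO annotations. It is not the custom script mentioned above. It assumes the common Darknet-style layout, where each line of a label file reads `class_id x_center y_center width height` with coordinates normalized to the image size. The names `YoloBox`, `parse_yolo_labels`, and `to_pixel_xyxy` are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Tuple


class YoloBox(NamedTuple):
    """One annotation line (assumed layout: class id + normalized center/size)."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_labels(text: str) -> List[YoloBox]:
    """Parse the contents of one YOLO label file into boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        boxes.append(YoloBox(int(parts[0]), *map(float, parts[1:])))
    return boxes


def to_pixel_xyxy(box: YoloBox, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a normalized center/size box to pixel (x_min, y_min, x_max, y_max)."""
    x_min = (box.x_center - box.width / 2) * img_w
    y_min = (box.y_center - box.height / 2) * img_h
    x_max = (box.x_center + box.width / 2) * img_w
    y_max = (box.y_center + box.height / 2) * img_h
    return (x_min, y_min, x_max, y_max)
```

Keeping the normalized values in the parsed result and converting to pixels only on demand means the labels stay valid if the image is resized.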

Describe alternatives you've considered
The script can either be a standalone dataset builder or a modified version of ImageFolder.

Additional context
I would be happy to contribute to this, but I would do it at a very slow pace (maybe a month or two) as I have my exams approaching 😄

faizankshaikh, Jul 02 '22 15:07

Hi! The imagefolder script is already quite complex, so a standalone script sounds better. Also, I suggest we create an org on the Hub (e.g. hf-loaders) and store such scripts there for easier maintenance rather than having them as packaged modules (IMO only very generic loaders should be packaged). WDYT @lhoestq @albertvillanova @polinaeterna?

mariosasko, Jul 12 '22 17:07

@mariosasko sounds good to me!

polinaeterna, Jul 13 '22 09:07

Thank you for the suggestion, @mariosasko. I agree with the point, but I have a few questions:

  1. How would the user access the script if it's not a part of the core codebase?
  2. Could you tell me which tasks I would need to do to contribute the code? As I understand it, they would be:
    1. Create a new org "hf-loaders" and add you (and more HF people) to the org
    2. Add data loader script as a (model?)
    3. Test it with a dataset on HF hub
  3. We should maybe brainstorm which public datasets use this format (YOLO type) and are the most important ones to test the script with. We could even add those datasets to the HF Hub alongside the script.

faizankshaikh, Jul 14 '22 09:07

  1. Like this: load_dataset("hf-loaders/yolo", data_files=...)
  2. The steps would be:
    1. Create a new org hf-community-loaders (IMO a better name than "hf-loaders") and add me (as an admin)
    2. Create a new dataset repo yolo and add the loading script to it (yolo.py)
    3. Open a discussion to request our review
  3. I like this idea. Another option is to add snippets that describe how to load such datasets using the yolo loader.

mariosasko, Jul 21 '22 14:07
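As a rough illustration of what the example-generation step of such a `yolo.py` loading script might do, the sketch below pairs each image with its label file and yields one record per image. It is not the script discussed in this thread. It assumes a layout with separate image and label directories whose files share a stem (for example `a.jpg` and `a.txt`), and the names `iter_yolo_examples`, `IMAGE_SUFFIXES`, and the `objects` record shape are hypothetical. It uses only the standard library, so wrapping it in a dataset builder is left out.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple

# Assumption: only these image extensions are considered.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def iter_yolo_examples(image_dir, label_dir) -> Iterator[Tuple[int, Dict]]:
    """Yield (index, example) pairs, one per image, with its YOLO boxes."""
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    images = sorted(
        p for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    for idx, image_path in enumerate(images):
        # Assumed convention: the label file shares the image's stem.
        label_path = label_dir / (image_path.stem + ".txt")
        objects = {"category": [], "bbox": []}
        if label_path.exists():
            for line in label_path.read_text().splitlines():
                parts = line.split()
                if len(parts) != 5:
                    continue  # skip blank or malformed lines
                objects["category"].append(int(parts[0]))
                # Normalized [x_center, y_center, width, height]
                objects["bbox"].append([float(v) for v in parts[1:]])
        yield idx, {"image": str(image_path), "objects": objects}
```

Images without a label file are still yielded, with empty `objects`, on the assumption that they are background images with no annotations.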