Contribute data loading for object detection datasets with YOLO data format
Is your feature request related to a problem? Please describe.
At the moment, HF datasets loads image classification datasets out of the box. There could also be a data loader for standard object detection datasets (original discussion here).
Describe the solution you'd like
I wrote a custom script to load a dataset that uses the YOLO data format.
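For readers unfamiliar with the format, here is a minimal sketch of what parsing YOLO annotations involves. It assumes the common Darknet-style layout: one `.txt` label file per image, where each line reads `class_id x_center y_center width height` and the four box values are normalized to the image size. The names `YoloBox`, `parse_yolo_line`, and `to_pixel_corners` are hypothetical and are not taken from the script mentioned above.

```python
from typing import NamedTuple, Tuple


class YoloBox(NamedTuple):
    """One annotation line; box values are fractions of the image size."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> YoloBox:
    """Parse a single 'class_id x_center y_center width height' line."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(p) for p in parts[1:])
    return YoloBox(class_id, x_center, y_center, width, height)


def to_pixel_corners(
    box: YoloBox, image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized center/size box to (x_min, y_min, x_max, y_max) in pixels."""
    x_min = (box.x_center - box.width / 2) * image_width
    y_min = (box.y_center - box.height / 2) * image_height
    x_max = (box.x_center + box.width / 2) * image_width
    y_max = (box.y_center + box.height / 2) * image_height
    return (x_min, y_min, x_max, y_max)
```

For example, `to_pixel_corners(parse_yolo_line("0 0.5 0.5 0.25 0.5"), 640, 480)` gives `(240.0, 120.0, 400.0, 360.0)`. Keeping the normalized values and converting only on demand is one possible design choice; a loader could just as well store pixel coordinates directly.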
Describe alternatives you've considered
The script can either be a standalone dataset builder, or a modified version of ImageFolder
Additional context
I would be happy to contribute to this, but I would do it at a very slow pace (maybe a month or two) as I have my exams approaching 😄
Hi! The imagefolder script is already quite complex, so a standalone script sounds better. Also, I suggest we create an org on the Hub (e.g. hf-loaders) and store such scripts there for easier maintenance rather than having them as packaged modules (IMO only very generic loaders should be packaged). WDYT @lhoestq @albertvillanova @polinaeterna?
@mariosasko sounds good to me!
Thank you for the suggestion, @mariosasko. I agree with the point, but I have a few doubts:
- How would the user access the script if it's not a part of the core codebase?
- Could you direct me to the tasks I need to do to contribute the code? As per my understanding, they would be:
  - Create a new org "hf-loaders" and add you (and more HF people) to the org
  - Add the data loader script as a (model?)
  - Test it with a dataset on the HF Hub
- We should maybe brainstorm as to which public datasets have this format (YOLO type) and are the most important ones to test the script with. We can even add the datasets on HF Hub alongside the script
- Like this:
load_dataset("hf-loaders/yolo", data_files=...)
- The steps would be:
  - Create a new org hf-community-loaders (IMO a better name than "hf-loaders") and add me (as an admin)
  - Create a new dataset repo yolo and add the loading script to it (yolo.py)
  - Open a discussion to request our review
- I like this idea. Another option is to add snippets that describe how to load such datasets using the yolo loader.
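To make the idea of a yolo.py loading script more concrete, here is a rough sketch of the example-generation logic such a script might contain. It is not the script discussed in this thread. It assumes a hypothetical layout with an images directory and a labels directory holding one same-named `.txt` file per image, and the function name `generate_examples` and the `objects` schema (`category`, `bbox`) are illustrative choices only. In an actual loading script, this logic would typically live in the `_generate_examples` method of a `datasets.GeneratorBasedBuilder` subclass.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple

# Assumed set of image extensions; a real loader may support more.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def generate_examples(images_dir: str, labels_dir: str) -> Iterator[Tuple[str, Dict]]:
    """Yield (key, example) pairs, pairing each image with its YOLO label file."""
    labels_root = Path(labels_dir)
    for image_path in sorted(Path(images_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # ignore non-image files
        label_path = labels_root / (image_path.stem + ".txt")
        objects = {"category": [], "bbox": []}
        # An image without a label file is treated as having no objects.
        if label_path.exists():
            for line in label_path.read_text().splitlines():
                parts = line.split()
                if len(parts) != 5:
                    continue  # skip blank or malformed lines
                objects["category"].append(int(parts[0]))
                # Normalized [x_center, y_center, width, height]
                objects["bbox"].append([float(v) for v in parts[1:]])
        yield image_path.name, {"image": str(image_path), "objects": objects}
```

Yielding the image path as a string keeps the sketch free of third-party dependencies; a real loader would likely declare an image feature so that files are decoded lazily.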