Add Pascal VOC dataset
This PR adds the Pascal VOC dataset, mirroring the way TFDS implements it. I believe we can iterate on this dataset and include more data in future versions, such as segmentation masks, but for now I think it is a good idea to add it the same way as TFDS to get a solid first version out there.
Some CI failures are unrelated to your PR and are already fixed on master, so feel free to merge master into your branch :)
Thanks @nateraw for the addition of this dataset.
I would suggest transferring it to the Hugging Face Hub, under a "pascal" organization namespace: "pascal/voc".
What do you think?
FYI I think this dataset is also available at (internal) https://huggingface.co/datasets/HuggingFaceM4/pascal_voc
@lhoestq @albertvillanova what do you think the best path forward is? No idea when I'll get to looking at this again, but it would be nice to know the plan so that when I find time I can just get it done in one sitting.
My (not strong) opinion on this:
- as we are removing dataset scripts from GitHub, this dataset should be created directly on the Hub
- I proposed doing it under some kind of "official" org namespace, like pascal or pascal2; other suggestions are welcome
- the link given by @lhoestq might serve as inspiration for your implementation (I think yours is missing the action classification data): their implementation covers classification/detection, segmentation, action classification, and person layout, though it is missing other tasks
What do you think?