Add Pascal VOC dataset

Open nateraw opened this issue 3 years ago • 2 comments

This PR adds the Pascal VOC dataset the same way TFDS exposes it. I believe we can iterate on this dataset and include more data in future versions, such as segmentation masks, but for now I think it is a good idea to mirror TFDS and get a solid first version out there.
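For readers unfamiliar with the TFDS layout, here is a minimal sketch of how one VOC annotation file might be mapped to a TFDS-style example. It is an illustration, not the code from this PR: the function name `parse_voc_annotation` and the sample XML are hypothetical, and the field names (`objects`, `labels`, `labels_no_difficult`, normalized `bbox` in `ymin, xmin, ymax, xmax` order) follow my understanding of the TFDS `voc` schema and should be checked against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC XML format (made up for this sketch).
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object>
    <name>dog</name><pose>Left</pose>
    <truncated>1</truncated><difficult>0</difficult>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
  <object>
    <name>person</name><pose>Frontal</pose>
    <truncated>0</truncated><difficult>1</difficult>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>200</xmax><ymax>100</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Turn one VOC annotation into a dict shaped like a TFDS-style example."""
    root = ET.fromstring(xml_text)
    width = float(root.findtext("size/width"))
    height = float(root.findtext("size/height"))

    objects = []
    # Only direct <object> children: nested <part> elements are ignored here.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append({
            "label": obj.findtext("name"),
            "pose": obj.findtext("pose"),
            "is_truncated": obj.findtext("truncated") == "1",
            "is_difficult": obj.findtext("difficult") == "1",
            # Assumption: boxes are normalized to [0, 1] as (ymin, xmin, ymax, xmax).
            "bbox": (ymin / height, xmin / width, ymax / height, xmax / width),
        })

    return {
        "image/filename": root.findtext("filename"),
        "objects": objects,
        # Image-level labels: all classes present, and those with a non-difficult instance.
        "labels": sorted({o["label"] for o in objects}),
        "labels_no_difficult": sorted(
            {o["label"] for o in objects if not o["is_difficult"]}
        ),
    }


example = parse_voc_annotation(SAMPLE_XML)
print(example["labels"], example["labels_no_difficult"])
```

Segmentation masks and the other VOC tasks mentioned later in this thread are not covered by this sketch.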

nateraw avatar May 23 '22 16:05 nateraw

The documentation is not available anymore as the PR was closed or merged.

Some CI failures are unrelated to your PR and are already fixed on master, so feel free to merge master into your branch :)

lhoestq avatar Jun 14 '22 16:06 lhoestq

Thanks @nateraw for the addition of this dataset.

I would suggest transferring it to the Hugging Face Hub, under a "pascal" organization namespace: "pascal/voc".

What do you think?

albertvillanova avatar Sep 23 '22 14:09 albertvillanova

FYI, I think this dataset is also available (internally) at https://huggingface.co/datasets/HuggingFaceM4/pascal_voc

lhoestq avatar Sep 26 '22 09:09 lhoestq

@lhoestq @albertvillanova what do you think the best path forward is? No idea when I'll get to looking at this again, but it would be nice to know the plan so that when I find time I can just get it done in one sitting.

nateraw avatar Sep 26 '22 19:09 nateraw

My (not strong) opinion on this:

  • as we are removing dataset scripts from GitHub, this dataset should be created directly on the Hub
  • I proposed doing it under some kind of "official" org namespace, like pascal or pascal2; other suggestions are welcome
  • the link given by @lhoestq might serve as inspiration for your implementation (I think yours is missing the action classification data): their implementation covers these tasks: classification/detection, segmentation, action classification, and person layout; it is still missing other tasks, though

What do you think?

albertvillanova avatar Sep 27 '22 08:09 albertvillanova