[RFC] New datasets to torchvision

Open oke-aditya opened this issue 4 years ago • 12 comments

🚀 Feature

This is a proposal to add more highly cited datasets. Thanks to the Papers with Code datasets listing, which made this search easy.

Motivation

These datasets are used quite frequently and would benefit both researchers and practitioners who work in computer vision. I'm not sure how reliable the citation metric is, but we can verify the paper counts.

Pitch

The following datasets can be considered. Paper counts are the last-5-years figures reported on Papers with Code. They may be inaccurate, so feel free to edit. I'm also adding previously approved or proposed ones.

See #5108

We should probably consider and add these one by one. We should also support downloading each dataset, not just loading it.
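To make "downloading, not just loading" concrete, here is a minimal sketch of the pattern, assuming a single archive file verified by checksum. The class name `ToyDownloadableDataset`, its parameters, and the injected `fetch` callable are all hypothetical and are not torchvision's implementation; only the `download` flag mirrors the convention existing torchvision datasets use.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ToyDownloadableDataset:
    """Hypothetical dataset that can fetch its own archive.

    `fetch` is an injected callable (url, destination_path) -> None so the
    sketch runs offline; a real dataset would download over HTTP instead.
    """

    def __init__(self, root, url, sha256, fetch, download=False):
        self.root = root
        self.url = url
        self.sha256 = sha256
        self.path = os.path.join(root, os.path.basename(url))
        if download:
            self._download(fetch)
        # Loading only works if the archive is present and verified.
        if not self._check_integrity():
            raise RuntimeError(
                "Dataset not found or corrupted. Pass download=True to fetch it."
            )

    def _check_integrity(self):
        return os.path.isfile(self.path) and sha256_of(self.path) == self.sha256

    def _download(self, fetch):
        if self._check_integrity():
            return  # already present and verified, skip the fetch
        os.makedirs(self.root, exist_ok=True)
        fetch(self.url, self.path)
```

With `download=False` the constructor only verifies an existing file and raises if it is missing; with `download=True` it fetches once and skips the fetch on later runs.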

Additional context

Please feel free to discuss datasets before opening PRs!

cc @pmeier

oke-aditya avatar Mar 12 '21 17:03 oke-aditya

Hi,

This is exactly our current idea, thanks for bringing it up.

I agree with all the aforementioned proposals. One thing to mention as well is that there is an ongoing effort to provide new dataset abstractions in PyTorch via DataPipes https://github.com/pytorch/pytorch/issues/49440.

While this doesn't block us providing new datasets, it is good to keep in mind that we might in the future revisit the way we implement datasets.

fmassa avatar Mar 15 '21 14:03 fmassa

Related to this issue: it would also be useful if PyTorch can store these datasets on their storage and provide links to download them. For example, there are a lot of issues with downloading ImageNet and other large datasets. I'm not sure if licensing can be problematic, but it would be super useful.

seyeeet avatar Mar 24 '21 19:03 seyeeet

@seyeeet

I'm not sure if licensing can be problematic

Yes, it is and thus

PyTorch can store these datasets on their storage and provide links to download them

will never happen.

Also see this section in our README:

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

pmeier avatar Mar 25 '21 10:03 pmeier

Based on what is currently in "torchvision/datasets/", the datasets below still need to be added. Please update the pitch.

- LFW (Labeled Faces in the Wild)
- Market-1501: 492 papers
- MPII Human Pose
- VGGFace2: earlier requested in #1193, #2910. Here is tar.gz file. Hopefully we can add it.
- MovingMNIST: previously approved in #2676, #2690.
- iNaturalist: #3292
- LVIS

harishsdev avatar Apr 30 '21 18:04 harishsdev

Hey @harishsdev, not sure what you mean. From the original pitch only KITTI was added, which is correctly marked. In your list you left out CUB-200-2011, which is not supported yet. We do feature the Caltech(101|256) datasets, but they are not related other than coming from the same university.

pmeier avatar May 03 '21 09:05 pmeier

Hi @harishsdev, I have created a PR for the LFW dataset. Can you guide me on any further changes?

ABD-01 avatar Aug 08 '21 16:08 ABD-01

The link provided for VGGFace2 is not correct; that link points to the first VGGFace dataset (which is available from this page).

jgbradley1 avatar Aug 23 '21 14:08 jgbradley1

Actually, the tar.gz has been down for many months. I don't know what happened to VGG Face.

https://www.robots.ox.ac.uk/~vgg/data/vgg_face2

Probably this is the link https://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz

oke-aditya avatar Aug 23 '21 14:08 oke-aditya

Probably this is the link https://www.robots.ox.ac.uk/~vgg/data/vgg_face/vgg_face_dataset.tar.gz

Respectfully, that is the wrong URL. The link you've provided is for the first version of VGGFace. The original pitch asked for VGGFace2, which will not be possible to provide at this time.

jgbradley1 avatar Aug 23 '21 21:08 jgbradley1

@oke-aditya Can we add the SmallNORB dataset to the list, as introduced in this PR: https://github.com/pytorch/vision/pull/492? Thanks in advance. :)

yassineAlouini avatar May 23 '22 08:05 yassineAlouini

@oke-aditya Should we add the FGVC-Aircraft dataset (as implemented in this PR)?

yassineAlouini avatar Jun 24 '22 09:06 yassineAlouini

@yassineAlouini We already have FGVC-Aircraft in the current API

https://github.com/pytorch/vision/blob/fb7f9a16628cb0813ac958da4525247e325cc3d2/torchvision/datasets/fgvc_aircraft.py#L12

as well as #5354 to track progress for porting it to the prototype one.
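For reference, a minimal usage sketch of the existing class. The helper name `load_fgvc_aircraft` and the `"data"` root in the note below are placeholders chosen for illustration; the `split` and `annotation_level` values shown are the class defaults.

```python
def load_fgvc_aircraft(root, download=False):
    """Build the FGVC-Aircraft train+val split via torchvision."""
    # Imported lazily so the helper can be defined without torchvision installed.
    from torchvision.datasets import FGVCAircraft

    return FGVCAircraft(
        root=root,
        split="trainval",            # other options: "train", "val", "test"
        annotation_level="variant",  # other options: "family", "manufacturer"
        download=download,           # fetch the archive if it is not in root
    )
```

Calling `load_fgvc_aircraft("data", download=True)` returns a dataset whose items are `(image, label)` pairs; note that this downloads the full archive on first use.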

pmeier avatar Jun 27 '22 08:06 pmeier