MLDatasets.jl

Feature request: ImageNet data loader

Open adrhill opened this issue 2 years ago • 3 comments

ImageNet is quite large and locked behind terms of access that require an account.

However, it would be nice to be able to either

  • set a config (or ENV) variable to download ImageNet through MLDatasets
  • point MLDatasets to a local copy of ImageNet

and be able to use MLDatasets' interface of

train_x, train_y = ImageNet.traindata()
test_x,  test_y  = ImageNet.testdata()

as well as ImageNet.convert2image(x). Ideally data would be in WHCN format for Flux and Metalhead models.
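
A minimal sketch of the two requests above (a local copy and WHCN output), using only Base Julia. The environment variable name `IMAGENET_DIR` and the helpers `imagenet_dir` and `to_whcn` are hypothetical names chosen for illustration; they are not part of MLDatasets. The sketch also assumes images arrive as (height, width, channel) arrays.

```julia
# Hypothetical environment variable name; not an actual MLDatasets setting.
const IMAGENET_ENV = "IMAGENET_DIR"

# Return the folder holding a local ImageNet copy, or throw a helpful error.
function imagenet_dir(env = ENV)
    dir = get(env, IMAGENET_ENV, "")
    isempty(dir) && error("Set $IMAGENET_ENV to the folder holding your ImageNet copy")
    isdir(dir) || error("$IMAGENET_ENV points to a missing folder: $dir")
    return dir
end

# Stack images given as (H, W, C) arrays into one (W, H, C, N) batch.
function to_whcn(images::Vector{<:AbstractArray{T,3}}) where {T}
    # Swap the first two dimensions: (H, W, C) -> (W, H, C).
    whc = [permutedims(img, (2, 1, 3)) for img in images]
    # Concatenate along a new fourth (batch) dimension.
    return cat(whc...; dims = 4)
end

# Two dummy images with H = 224, W = 256, C = 3.
batch = to_whcn([rand(Float32, 224, 256, 3) for _ in 1:2])
size(batch)  # (256, 224, 3, 2)
```

Splatting into `cat` is fine for a small batch like this; a real loader would more likely preallocate the 4-D array and fill it image by image.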

adrhill, Mar 22 '22 13:03

For reference, here is an example of ImageNet usage: https://github.com/avik-pal/Lux.jl/tree/main/examples/ImageNet

CarloLucibello, May 04 '22 08:05

For reference, a ManualDataDep may be useful when a dataset requires the user to perform some manual steps.
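
As a rough sketch of that suggestion, a `ManualDataDep` from DataDeps.jl could be registered roughly as follows. The DataDep name "ImageNet" and the message text are assumptions made for illustration; this is not what MLDatasets actually registers.

```julia
using DataDeps  # third-party package; install with `] add DataDeps`

# The name "ImageNet" and the instructions below are illustrative assumptions.
register(ManualDataDep(
    "ImageNet",
    """
    ImageNet requires an account and acceptance of its terms of access.
    Download the archives manually and extract them into a folder named
    "ImageNet" inside one of the DataDeps load paths.
    """,
))

# Resolving the DataDep returns the folder path if it is found; if not,
# DataDeps shows the message above instead of downloading anything.
# path = datadep"ImageNet"
```

The resolution line is left commented out because it prompts interactively when the folder is missing.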

lorenzoh, May 04 '22 09:05

Thanks for the pointers, I will open a draft PR for this!

Since there is more than one version of ImageNet, I propose to mirror PyTorch and have MLDatasets.ImageNet refer to the ImageNet 2012 Classification Dataset (ILSVRC 2012-2017). The ImageNet authors themselves call it "the most highly-used subset of ImageNet".

adrhill, Jun 21 '22 15:06