MLDatasets.jl
Feature request: ImageNet data loader
ImageNet is quite large and locked behind terms of access that require an account.
However, it would be nice to be able to either
- set a config (or ENV) variable to download ImageNet through MLDatasets
- point MLDatasets to a local copy of ImageNet
and be able to use MLDatasets' interface of
train_x, train_y = ImageNet.traindata()
test_x, test_y = ImageNet.testdata()
as well as ImageNet.convert2image(x).
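To make the "local copy" option concrete, here is a rough sketch of how a loader might resolve the dataset root from an environment variable. The variable name `MLDATASETS_IMAGENET_DIR` and the helper `imagenet_dir` are made up for illustration and are not part of MLDatasets.

```julia
# Hypothetical helper (not an MLDatasets API): look up the ImageNet root
# directory from an environment variable and check that it exists.
# `env` defaults to the process environment but can be any dictionary-like
# object, which makes the function easy to try out.
function imagenet_dir(env = ENV; var = "MLDATASETS_IMAGENET_DIR")
    dir = get(env, var, nothing)
    dir === nothing && error("Set $var to the path of a local ImageNet copy.")
    isdir(dir) || error("$var points to '$dir', which is not a directory.")
    return abspath(dir)
end
```

A loader could then call `imagenet_dir()` once and build the train and validation paths relative to the returned directory.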
Ideally data would be in WHCN format for Flux and Metalhead models.
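As an illustration of what WHCN means in practice, the sketch below turns a list of images into a single width x height x channel x batch array. It assumes each image is already a numeric array laid out as height x width x channel; the helper name `to_whcn` is hypothetical.

```julia
# Hypothetical helper: stack same-sized images into one WHCN array.
# Assumes each image is a height x width x channel array.
function to_whcn(imgs)
    # Swap the first two dimensions (H, W, C) -> (W, H, C) for every image,
    # then concatenate along a new fourth (batch) dimension.
    return cat((permutedims(img, (2, 1, 3)) for img in imgs)...; dims = 4)
end

# Two random 4x3 RGB "images" (height 4, width 3, 3 channels).
imgs = [rand(Float32, 4, 3, 3) for _ in 1:2]
batch = to_whcn(imgs)
```

Here `size(batch)` is `(3, 4, 3, 2)`: width, height, channels, batch size.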
As a reference, here is an example of ImageNet usage: https://github.com/avik-pal/Lux.jl/tree/main/examples/ImageNet
For reference, a ManualDataDep may be useful for when a dataset requires the user to perform some manual steps.
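A minimal sketch of what such a registration could look like with DataDeps.jl. The dependency name and the instruction text are placeholders, not what MLDatasets actually ships.

```julia
using DataDeps

# Placeholder instructions shown to the user when the data is missing.
imagenet_message = """
ImageNet cannot be downloaded automatically because access requires an account.
Please obtain the dataset manually and place it in the directory printed above.
"""

# A ManualDataDep only carries a name and a message; it never downloads anything.
imagenet_dep = ManualDataDep("ImageNet", imagenet_message)
register(imagenet_dep)

# Resolving the dependency returns the local folder once the user has put the
# data in place; otherwise DataDeps shows the message above.
# root = datadep"ImageNet"
```

The resolve step is left commented out because it prompts interactively when the data has not been installed yet.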
Thanks for the pointers, I will open a draft PR for this!
Since there is more than one version of ImageNet, I propose to mirror PyTorch and have MLDatasets.ImageNet refer to the ImageNet 2012 Classification Dataset (ILSVRC 2012-2017). The ImageNet authors themselves call it "the most highly-used subset of ImageNet".