MLDatasets.jl

make it easy to discover and load datasets

Open CarloLucibello opened this issue 3 years ago • 5 comments

We could provide some convenience functions like `list_datasets()` and `load_dataset("mnist")`.

CarloLucibello, May 20 '22 08:05
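
To make the proposal concrete, here is a minimal sketch of how `list_datasets()` and `load_dataset(name)` might behave. It assumes a simple in-memory registry; `DATASET_REGISTRY`, `register_dataset!`, and `toy_loader` are hypothetical names used only for illustration and are not part of MLDatasets.jl.

```julia
# Hypothetical sketch: none of these definitions exist in MLDatasets.jl.
# The registry maps a lowercase dataset name to a loader function.
const DATASET_REGISTRY = Dict{String,Function}()

# Register a loader under a case-insensitive name.
function register_dataset!(name::AbstractString, loader::Function)
    DATASET_REGISTRY[lowercase(name)] = loader
    return nothing
end

# Return the sorted names of all registered datasets.
list_datasets() = sort!(collect(keys(DATASET_REGISTRY)))

# Look up a dataset by name and call its loader, forwarding keyword arguments.
function load_dataset(name::AbstractString; kwargs...)
    key = lowercase(name)
    if !haskey(DATASET_REGISTRY, key)
        available = join(list_datasets(), ", ")
        throw(ArgumentError("unknown dataset $name; available: $available"))
    end
    return DATASET_REGISTRY[key](; kwargs...)
end

# Toy loader standing in for a real dataset constructor.
toy_loader(; n = 3) = (features = zeros(2, n), targets = collect(1:n))

register_dataset!("toy", toy_loader)
```

With this in place, `list_datasets()` returns the registered names and `load_dataset("toy"; n = 5)` calls the matching loader. A real implementation would register the package's existing dataset types rather than a toy loader.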

Have you seen https://github.com/lorenzoh/FeatureRegistries.jl? In FastAI.jl, it's used to create such a list, make it easier to search for datasets and handle the downloading: https://fluxml.ai/FastAI.jl/dev/references/FastAI.Registries.datasets

lorenzoh, May 20 '22 10:05

Out of curiosity, what's the relationship between FastAI.Datasets and MLDatasets? Are they going to duplicate each other, or will they eventually be merged into one?

johnnychen94, May 20 '22 10:05

FastAI.jl will allow loading datasets from MLDatasets.jl. If MLDatasets.jl wants to include a feature registry for datasets, I'd also be happy to merge additional datasets currently in FastAI.jl into MLDatasets.jl (i.e. everything here: https://course.fast.ai/datasets)

lorenzoh, May 20 '22 10:05

We should also consider adding license and citation information for each dataset/datahub.

Dsantra92, Jun 28 '22 18:06
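
One way the license and citation suggestion could look in practice is a small metadata record stored alongside each dataset. The sketch below is only an illustration: `DatasetInfo`, `DATASET_INFO`, `register_info!`, `dataset_license`, and `dataset_citation` are hypothetical names, and the registered values are placeholders rather than real license or citation data.

```julia
# Hypothetical metadata record; the type and field names are assumptions.
struct DatasetInfo
    name::String
    license::String   # e.g. an SPDX identifier or free-form license text
    citation::String  # e.g. a BibTeX entry or a plain-text reference
end

# Metadata keyed by lowercase dataset name.
const DATASET_INFO = Dict{String,DatasetInfo}()

# Store a metadata record under its case-insensitive name and return it.
function register_info!(info::DatasetInfo)
    DATASET_INFO[lowercase(info.name)] = info
    return info
end

# Accessors that look up a single field by dataset name.
dataset_license(name::AbstractString) = DATASET_INFO[lowercase(name)].license
dataset_citation(name::AbstractString) = DATASET_INFO[lowercase(name)].citation

# Placeholder entry; real entries would carry the dataset's actual terms.
register_info!(DatasetInfo("toy", "placeholder-license", "placeholder citation"))
```

Keeping the metadata in a plain struct means a listing function could print the license next to each name, so users see the terms before downloading.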

How about using FeatureRegistries.jl for this?

lorenzoh, Jun 29 '22 17:06