MLDatasets.jl
Make it easy to discover and load datasets
We could provide some convenience functions like `list_datasets()` and `load_dataset("mnist")`.
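To make the proposal concrete, here is a rough sketch of how such helpers might work, assuming a simple in-memory registry. `DATASETS`, `register_dataset!`, and the placeholder loaders are hypothetical names for illustration only; none of this is existing MLDatasets.jl API.

```julia
# Hypothetical registry: maps a lowercase dataset name to a zero-argument loader.
const DATASETS = Dict{String,Function}()

# Register a loader under a case-insensitive name.
register_dataset!(name::AbstractString, loader::Function) =
    (DATASETS[lowercase(name)] = loader; nothing)

# Return the registered names in sorted order.
list_datasets() = sort!(collect(keys(DATASETS)))

# Look up a dataset by name and call its loader.
function load_dataset(name::AbstractString)
    key = lowercase(name)
    haskey(DATASETS, key) || throw(ArgumentError(
        "unknown dataset \"$name\"; available: $(join(list_datasets(), ", "))"))
    return DATASETS[key]()
end

# Placeholder loaders; a real one would download and parse the data.
register_dataset!("mnist", () -> (name = "MNIST", nobs = 60_000))
register_dataset!("iris", () -> (name = "Iris", nobs = 150))
```

With this in place, `list_datasets()` returns the registered names and `load_dataset("MNIST")` resolves case-insensitively; an unknown name raises an `ArgumentError` that lists the available options.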
Have you seen https://github.com/lorenzoh/FeatureRegistries.jl? In FastAI.jl, it's used to create such a list, make it easier to search for datasets and handle the downloading: https://fluxml.ai/FastAI.jl/dev/references/FastAI.Registries.datasets
Out of curiosity, what's the relationship between FastAI.Datasets and MLDatasets? Are they going to duplicate each other, or will they eventually be merged into one?
FastAI.jl will allow loading datasets from MLDatasets.jl. If MLDatasets.jl wants to include a feature registry for datasets, I'd also be happy to merge additional datasets currently in FastAI.jl into MLDatasets.jl (i.e. everything here: https://course.fast.ai/datasets)
We should also consider attaching license and citation information to each dataset/datahub.
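One possible shape for that metadata, as a sketch only: a small record stored alongside each dataset. `DatasetInfo`, its fields, and `summary_text` are hypothetical names, and the values below are placeholders rather than real license data.

```julia
# Hypothetical metadata record; field names are illustrative only.
Base.@kwdef struct DatasetInfo
    name::String
    license::String = "unspecified"
    citation::String = ""
end

# Build a short human-readable summary, skipping an empty citation.
function summary_text(info::DatasetInfo)
    lines = ["Dataset: $(info.name)", "License: $(info.license)"]
    isempty(info.citation) || push!(lines, "Cite as: $(info.citation)")
    return join(lines, "\n")
end

# Placeholder values for illustration only.
info = DatasetInfo(name = "example", license = "CC-BY-4.0")
```

Defaulting `license` to `"unspecified"` keeps the field mandatory in spirit while still letting datasets be registered before their licensing has been checked.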
How about using FeatureRegistries.jl for this?