MLDatasets.jl
Add FastAI datasets
This adds datadeps for all datasets in the FastAI dataset collection, as proposed in DLDatasets.jl#1.
The basic functionality:
using MLDatasets.FastAIDatasets
using MLDatasets.FastAIDatasets:
    datasetpath,             # download a dataset and get its directory
    DATASETS,                # list of all datasets
    loaddataclassification,  # load an image classification dataset into a data container with observations `(image, class)`
    loaddatasegmentation     # load an image segmentation dataset into a data container with observations `(image, mask)`
What's the reason for having loaddatasegmentation / loaddataclassification and not just loaddata?
They're supposed to work on any folder containing a dataset in the right format, not just the included datasets. Also, some datasets can be used for several different tasks, so there is no 1-to-1 mapping between datasets and tasks.
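To make the "no 1-to-1 mapping" point concrete, here is a rough sketch of how one folder could be read for two different tasks. Everything in it is an assumption for illustration: `listfiles`, `classification_obs`, `segmentation_obs`, and the `images/` + `labels/` layout are hypothetical and not taken from this PR, and the functions return file paths rather than loaded images so the sketch only needs Base Julia.

```julia
# Hypothetical helper: recursively collect all file paths under `dir`, sorted.
function listfiles(dir::AbstractString)
    files = String[]
    for (root, _, names) in walkdir(dir)
        for name in names
            push!(files, joinpath(root, name))
        end
    end
    return sort(files)
end

# Hypothetical classification view: observations are `(imagepath, class)`,
# where the class is the name of the image's parent folder.
function classification_obs(dir::AbstractString)
    return [(path, basename(dirname(path))) for path in listfiles(joinpath(dir, "images"))]
end

# Hypothetical segmentation view: observations are `(imagepath, maskpath)`,
# where the mask has the same file name and lives under `labels/`.
function segmentation_obs(dir::AbstractString)
    return [(path, joinpath(dir, "labels", basename(path))) for path in listfiles(joinpath(dir, "images"))]
end

# Build a tiny throwaway folder in the assumed layout and read it both ways.
root = mktempdir()
for (class, file) in [("cat", "a.png"), ("dog", "b.png")]
    mkpath(joinpath(root, "images", class))
    touch(joinpath(root, "images", class, file))
end
mkpath(joinpath(root, "labels"))
touch(joinpath(root, "labels", "a.png"))
touch(joinpath(root, "labels", "b.png"))

println(classification_obs(root))
println(segmentation_obs(root))
```

Both functions take an arbitrary directory, so they are not tied to the registered datasets, and the same directory yields different observations depending on which task-specific loader is called. A single `loaddata` would need some extra argument to choose between the two.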
Is this PR still desired/feasible?
I will leave the decision on whether this is desirable to more active maintainers of this repository, but will note that implementation-wise a lot has changed since this PR was first opened. The largest change is that LearnBase.jl and MLDataPattern.jl have been superseded by MLUtils.jl. From the FastAI.jl side, I am always happy to take stuff out and move it into a more canonical package, as would be the case with these datasets.
I would be in favor of moving these datasets here, provided we manage to make the interface consistent with the other datasets here. I don't know how hard that would be.