MLDatasets.jl
Utility package for accessing common Machine Learning datasets in Julia
The dataset at https://zenodo.org/record/3242143/ is fairly large (4.4 GB). Are datasets of this size welcome here, or should the package only include minimal, representative datasets?
Add ImageNet
Draft PR to add the ImageNet 2012 Classification Dataset (ILSVRC 2012-2017) as a `ManualDataDep`. Closes #100. ___ Since ImageNet is very large (>150 GB) and [requires signing up and accepting...
Closes #155
Currently, the SMSSpam collection cannot be downloaded in CI environments, which prevents it from being tested. For the debug log, see [this run](https://github.com/JuliaML/MLDatasets.jl/runs/7524820232). Removing it from the CI-testable datasets until the issue...
The package BinaryProvider is old and not well supported on the M1 platform, causing problems when installing this package.
## Problems 1. Splitting graphs is more complicated than splitting ordinary data: a graph dataset can be split by node, by edge, or by whole graph. We should be able to...
Required for [GraphNeuralNetworks.jl #173](https://github.com/CarloLucibello/GraphNeuralNetworks.jl/issues/173) [Download Link](http://faust.is.tue.mpg.de/challenge/Inter-subject_challenge/datasets). User needs to sign in before downloading the dataset.
Some of the features of the OGBDataset are downloaded as torch tensors stored in the ".pt" format. They are currently ignored, but we could load them using...
There is a lot of output when testing OGBDataset and SMSCollection, [see this CI run](https://github.com/JuliaML/MLDatasets.jl/runs/7156072504?check_suite_focus=true#step:5:611). Maybe we can suppress this output, although I'm not sure why we don't see this...
We could provide some convenience functions like `list_datasets()` and `load_dataset("mnist")`.
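A minimal sketch of how such convenience functions might look, assuming a simple name-to-loader registry. The registry `DATASET_REGISTRY` and the dummy loaders are hypothetical and exist only for illustration. They are not part of the package, and a real version would call the package's own dataset constructors instead.

```julia
# Hypothetical registry: maps a lowercase name to a zero-argument loader.
# The loaders below are stand-ins that return tiny dummy data so the sketch
# runs without downloading anything.
const DATASET_REGISTRY = Dict{String,Function}(
    "mnist" => () -> (features = zeros(Float32, 28, 28, 2), targets = [0, 1]),
    "iris"  => () -> (features = zeros(Float64, 4, 3), targets = ["a", "b", "c"]),
)

# Return the registered dataset names in sorted order.
list_datasets() = sort!(collect(keys(DATASET_REGISTRY)))

# Look up a dataset by name (case-insensitive) and run its loader.
function load_dataset(name::AbstractString)
    key = lowercase(name)
    if !haskey(DATASET_REGISTRY, key)
        available = join(list_datasets(), ", ")
        throw(ArgumentError("unknown dataset '$name'; available: $available"))
    end
    return DATASET_REGISTRY[key]()
end
```

With this sketch, `load_dataset("mnist")` returns whatever the registered loader produces, and an unknown name raises an `ArgumentError` that lists the available names.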