MLDatasets.jl

Utility package for accessing common Machine Learning datasets in Julia

Results 63 MLDatasets.jl issues

The dataset at https://zenodo.org/record/3242143/ is big-ish (4.4 GB). Are datasets of this size welcome here, or should we only include minimal representative datasets?

Draft PR to add the ImageNet 2012 Classification Dataset (ILSVRC 2012-2017) as a `ManualDataDep`. Closes #100. Since ImageNet is very large (>150 GB) and [requires signing up and accepting...

Currently, the SMSSpam collection cannot be downloaded in CI environments, which prevents it from being tested. For the debug log, see [this run](https://github.com/JuliaML/MLDatasets.jl/runs/7524820232). Removing it from the CI testable datasets until the issue...

The package BinaryProvider is old and not well supported on the M1 platform, causing problems when installing this package.

## Problems 1. Splitting graphs is a bit more complicated than splitting ordinary data. Graphs can be split by node, by edge, or as whole graphs. We should be able to...

gsoc

Required for [GraphNeuralNetworks.jl #173](https://github.com/CarloLucibello/GraphNeuralNetworks.jl/issues/173) [Download Link](http://faust.is.tue.mpg.de/challenge/Inter-subject_challenge/datasets). User needs to sign in before downloading the dataset.

gsoc

Some of the features of the OGBDataset are downloaded as torch tensors stored in the ".pt" format. They are currently ignored, but we could load them using...

gsoc

There is a lot of output when testing OGBDataset and SMSCollection, [see this CI run](https://github.com/JuliaML/MLDatasets.jl/runs/7156072504?check_suite_focus=true#step:5:611). Maybe we can suppress this output, although I'm not sure why we don't see this...

We could provide some convenience functions like `list_datasets()` and `load_dataset("mnist")`.
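
One way such convenience functions could look is a name-to-loader registry. This is only a minimal sketch of the idea, not code from MLDatasets.jl: `DATASET_REGISTRY` and `register_dataset!` are hypothetical names, and the two registered loaders are stand-ins that return a small named tuple instead of downloading real data.

```julia
# Hypothetical sketch: none of these names are part of MLDatasets.jl today.
# Maps a lowercase dataset name to a loader function.
const DATASET_REGISTRY = Dict{String,Function}()

"Register a loader under a case-insensitive name."
function register_dataset!(name::AbstractString, loader::Function)
    DATASET_REGISTRY[lowercase(name)] = loader
    return nothing
end

"Return the sorted names of all registered datasets."
list_datasets() = sort!(collect(keys(DATASET_REGISTRY)))

"Look up a dataset by name and call its loader, forwarding keyword arguments."
function load_dataset(name::AbstractString; kwargs...)
    key = lowercase(name)
    haskey(DATASET_REGISTRY, key) ||
        throw(ArgumentError("unknown dataset \"$name\"; available: $(join(list_datasets(), ", "))"))
    return DATASET_REGISTRY[key](; kwargs...)
end

# Stand-in loaders so the sketch runs without downloading anything.
register_dataset!("mnist", (; split = :train) -> (name = "mnist", split = split))
register_dataset!("iris", (; kwargs...) -> (name = "iris",))
```

With this in place, `list_datasets()` returns the registered names and `load_dataset("mnist"; split = :test)` forwards the keyword to the matching loader. In a real package the loaders would presumably wrap the existing dataset constructors; that wiring is left open here.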