Switch all datasets to use MLDatasets.jl
Following up on https://github.com/FluxML/Flux.jl/issues/580: it would be good to get the model zoo off of depending on Flux for datasets, and instead have it depend only on MLDatasets.jl. Once that is done, the datasets in Flux can be deprecated.
This is in part a follow-up to #15, now that MLDatasets has matured and been registered.
For reference, re: suppressing the accept-download prompt mentioned in https://github.com/FluxML/model-zoo/issues/15#issuecomment-355924217
this is done by setting ENV["DATADEPS_ALWAYS_ACCEPT"]="true".
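As a minimal sketch, assuming the variable is read when a download is first triggered, it can be set at the top of a script or notebook before any dataset is requested:

```julia
# Accept DataDeps download prompts for the rest of this Julia session.
# Assumption: this runs before the first call that triggers a download.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
```

This only affects the current process and anything it spawns; it does not persist across sessions.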
The prompt is still not great in Juno. If we're going to, in effect, turn it off everywhere, that may as well happen in MLDatasets itself rather than in every notebook, and perhaps we should warn people in the readme that the data may have conditions. At the least I'd prefer a keyword argument or something similar to the environment variable, which won't be familiar to everyone.
In MLDatasets it is available via the kwarg i_accept_the_terms_of_use = true.
(I forgot that this was one of the features MLDatasets adds on top of DataDeps. Possibly it should be moved to DataDeps.jl at some point, probably in the overhaul when we stop using string macros and switch to AbstractPath types.)
For context: I don't like the idea of making it too easy to bypass the prompt. One of the aims of DataDeps is to ensure credit is given where credit is due, for reasons of academic ethics. It is less about terms and conditions and more about making sure people know what is what.
As an aside, it would be good to fix readline in Juno. In notebooks, IJulia handles it basically perfectly, as does the command line of course. It feels like Juno should, I don't know, type-pirate readline and open a dialog box showing the last few lines of stdout and an input box or something.
If that's the goal, how about an @info when downloading? In the unlikely event that someone then disagrees with the terms they can then delete the data.
Perhaps. I can't recall if that is what will happen anyway if you have the kwarg or the env var set. Worth thinking about for after overhauling DataDeps.
But anyway, I think the i_accept_the_terms_of_use = true kwarg meets needs?
Sure, I guess that would do the job.
cc @Evizero
An info is not a call to action. I really believe that an explicit, scary-looking "I agree" is needed. Then again, if a downstream package provider ends up setting the flag and thus hides the terms from the actual end user, then maybe we need to rethink it a little.
Speaking of MLDatasets integration: is there anything still needed on the MLDatasets side of things to make this switch work conveniently? At first I thought we still had the MAT dependency issue, but those big dependencies are already optional now (which I completely forgot someone somehow somewhere implemented at some point).
What makes data sets fundamentally different from, say, Julia packages (for which licensing and credit issues also apply)?
It's different from packages in that it escapes Julia's package manager and just goes out to the internet and downloads stuff. That said, binary dependencies are quite comparable in that respect.
Yet what makes it different from binary dependencies is that installing and using MLDatasets doesn't mean you automatically subscribe to all the datasets it can provide. You subscribe to a dataset when you call a dataset-specific function (e.g. traindata). MLDatasets gives you the opportunity to work offline with datasets by telling it where you downloaded them yourself. If you already have the data somewhere, then you made the conscious decision to download and use it at an earlier point (thus no prompt). The automatic downloading is a convenience feature, and it potentially triggers a big download that a user may not be OK with. It also basically says "hey, you are escaping the realm of what MLDatasets is itself licensed under and are requesting something external from the internet. Are you sure you want that?"
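To make the cases above concrete, here is a rough sketch of the decision being described. It is not the MLDatasets or DataDeps implementation: `needs_prompt` and the file name are hypothetical, and comparing the environment variable against the string "true" is a simplification.

```julia
# Hypothetical helper: should the user be asked before a download happens?
function needs_prompt(dir::AbstractString, filename::AbstractString;
                      i_accept_the_terms_of_use::Bool = false)
    # Data already on disk: the user fetched it earlier, so no prompt.
    isfile(joinpath(dir, filename)) && return false
    # Explicit opt-in through the keyword argument.
    i_accept_the_terms_of_use && return false
    # Opt-in through the environment variable (simplified string check).
    get(ENV, "DATADEPS_ALWAYS_ACCEPT", "false") == "true" && return false
    return true
end

mktempdir() do dir
    # Unset the variable for this block so the example is deterministic.
    withenv("DATADEPS_ALWAYS_ACCEPT" => nothing) do
        println(needs_prompt(dir, "train.gz"))  # nothing on disk, no opt-in
        touch(joinpath(dir, "train.gz"))        # simulate a manual download
        println(needs_prompt(dir, "train.gz"))  # file is now present locally
    end
end
```

The point of the ordering is that a prompt only appears when a download would actually be triggered and nobody has opted in.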