mirdata
Download for very large datasets should be by instruction
For very large datasets (e.g. > 20 GB), command-line download does not make sense. In those cases, we should instead provide download instructions (or possibly a standalone download script?), as we do for unavailable datasets.
TODO - check our existing datasets, and update the contributing instructions to cover this non-standard case.
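Something like this sketch could work: `download` prints manual instructions instead of fetching anything. All names here (`LARGE_DATASET_INFO`, the `download` signature) are hypothetical, not mirdata's actual API.

```python
# Sketch: for a very large dataset, calling download() prints manual
# retrieval instructions instead of fetching files. The message text and
# function signature are illustrative assumptions, not mirdata's real API.

LARGE_DATASET_INFO = """\
This dataset is very large and is not downloaded automatically.
Please obtain it manually:
  1. Request access / download the archive from the dataset's homepage.
  2. Unpack the archive into your mirdata data_home directory.
"""


def download(partial_download=None):
    """Print manual-download instructions instead of fetching files."""
    print(LARGE_DATASET_INFO)
    return LARGE_DATASET_INFO


download()
```

The same hook could point at a standalone download script instead of printing inline steps.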
I agree 100% for datasets that have super large files, but I don't see a problem if the dataset is composed of multiple small files (~2 GB each). In that case, if something fails, the user can download only the remaining files with partial_download, right?
Yep, agree!
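To illustrate the resume behavior being described: if a dataset is split into several remote parts and one transfer fails, the user re-invokes download with only the missing keys. This is a minimal sketch of that pattern; the `REMOTES` layout and `download` signature are assumptions for illustration, not mirdata's exact implementation.

```python
# Sketch: a multi-part dataset where download() fetches only the requested
# parts. REMOTES and the download() signature are hypothetical stand-ins
# for mirdata's real remote-index and downloader.

REMOTES = {
    f"part_{i}": f"https://example.org/dataset/part_{i}.zip" for i in range(1, 6)
}


def download(partial_download=None):
    """Fetch only the requested parts (all parts when partial_download is None)."""
    keys = partial_download if partial_download is not None else list(REMOTES)
    fetched = []
    for key in keys:
        # a real implementation would fetch REMOTES[key] here
        fetched.append(key)
    return fetched


# First attempt fails after part_2; the retry fetches only the remainder:
# download(partial_download=["part_3", "part_4", "part_5"])
```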
ok, so I guess some existing datasets should have downloading disabled, and we should print the instructions when download is called?
- datacos
- mtg-jamendo
- AcousticBrainz

Which other ones are problematic?
after looking into the code:
- datacos has multiple parts
- mtg-jamendo does not have download enabled
- AcousticBrainz has multiple parts