algorithmic-efficiency
Add dataset setup tests
Description
Most of the code in data_setup.py is untested. There are a few challenges for these tests:
- The datasets are very large (just under 2 TB in total, I believe).
- Some of them require manual steps, such as obtaining the download links after signing the user agreements. I don't think we can check in the URLs.

At a minimum, we can test the datasets that are downloaded via tfds (ogbg and wmt) and add some unit tests for the other datasets.
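
A rough sketch of what a unit test for the tfds path could look like, without downloading anything. `setup_tfds_dataset` and its `builder_factory` parameter are hypothetical and are not taken from data_setup.py; the assumption is that the real code calls something like `tfds.builder(name, data_dir=...)` followed by `download_and_prepare()`, so the factory is injected and replaced with a mock in the test.

```python
import os
import tempfile
from unittest import mock


def setup_tfds_dataset(dataset_name, data_dir, builder_factory):
    """Hypothetical helper, not the actual data_setup.py code.

    builder_factory is assumed to behave like tfds.builder: it takes a
    dataset name and a data_dir keyword and returns an object that has a
    download_and_prepare() method.
    """
    target_dir = os.path.join(data_dir, dataset_name)
    os.makedirs(target_dir, exist_ok=True)
    builder = builder_factory(dataset_name, data_dir=target_dir)
    builder.download_and_prepare()
    return target_dir


def test_setup_requests_single_download():
    # The mock stands in for tfds.builder, so no network access or disk
    # space beyond an empty temporary directory is needed.
    fake_factory = mock.Mock()
    with tempfile.TemporaryDirectory() as tmp_dir:
        target_dir = setup_tfds_dataset('fake_dataset', tmp_dir, fake_factory)
        assert os.path.isdir(target_dir)
    fake_factory.assert_called_once_with('fake_dataset', data_dir=target_dir)
    fake_factory.return_value.download_and_prepare.assert_called_once_with()
```

A test like this only checks that the download is requested with the expected arguments. Verifying the downloaded data itself would still need a separate, slower test against the real tfds datasets.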