Impossible to only download a test split
I've spent a significant amount of time trying to locate the split object inside my custom _split_generators() function.
Then, after diving into the code, I realized that download_and_prepare is executed before the split is passed to the dataset builder in as_dataset.
If I'm not missing something, this seems like bad design for the following use case:
Imagine a huge dataset with an evaluation test set, where you only want to download that test set and run your method on it for comparison.
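To make the ordering concrete, here is a minimal sketch of the two steps as I understand them. It uses the public load_dataset_builder, download_and_prepare and as_dataset calls. The function name load_single_split is only illustrative, and the exact internals may differ between library versions.

```python
def load_single_split(path: str, split: str = "test"):
    """Spell out the two stages that a regular (non-streaming) load goes through."""
    # Imported lazily so the sketch can be read without the library installed.
    from datasets import load_dataset_builder

    builder = load_dataset_builder(path)

    # Stage 1: runs _split_generators() and prepares the data.
    # No split argument is passed here, so the builder cannot know
    # that only one split will be needed later.
    builder.download_and_prepare()

    # Stage 2: only now is the requested split selected from the prepared data.
    return builder.as_dataset(split=split)
```

With this ordering, _split_generators() has no way to see which split was requested, which is why everything it references typically gets downloaded.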
Is there a current workaround that can help me achieve the same result?
Thank you,
The only way right now is to load with streaming=True.
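A minimal sketch of what that looks like, assuming the dataset supports streaming. The function name stream_test_split and the dataset path in the usage comment are placeholders, not real identifiers.

```python
def stream_test_split(path: str, split: str = "test"):
    """Return an iterable over a single split without preparing the others."""
    # Imported lazily so the sketch can be read without the library installed.
    from datasets import load_dataset

    # With streaming=True, examples are fetched lazily while iterating,
    # so nothing is downloaded and prepared up front.
    return load_dataset(path, split=split, streaming=True)


# Usage (hypothetical dataset path):
# for example in stream_test_split("user/huge_dataset").take(5):
#     print(example)
```

The result is an IterableDataset rather than a regular Dataset, so random access by index is not available.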
This feature was proposed a long time ago, and I'm looking forward to its implementation. On clusters, streaming=True is not an option because the compute nodes have no Internet access. See: https://github.com/huggingface/datasets/discussions/1896#discussioncomment-2359593
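A possible stopgap is to stream the split once and save it for offline use. This is only a sketch: it assumes a login node with Internet access that shares a filesystem with the compute nodes, and a test split small enough to fit in memory. The function names and the out_dir argument are hypothetical. Passing features to Dataset.from_list is my assumption about how to keep the column types, and it may need adjusting for some datasets.

```python
def cache_split_for_offline_use(path: str, out_dir: str, split: str = "test"):
    """Run on a node WITH Internet: stream one split and save it to disk."""
    # Imported lazily so the sketch can be read without the library installed.
    from datasets import Dataset, load_dataset

    streamed = load_dataset(path, split=split, streaming=True)

    # Materialize the streamed rows in memory. This is only reasonable
    # when the split is small enough to fit in RAM.
    rows = list(streamed)

    # streamed.features may be None, in which case types are inferred.
    dataset = Dataset.from_list(rows, features=streamed.features)
    dataset.save_to_disk(out_dir)
    return dataset


def load_cached_split(out_dir: str):
    """Run on a compute node WITHOUT Internet: read the saved split back."""
    from datasets import load_from_disk

    return load_from_disk(out_dir)
```

This does not replace proper per-split downloading, since it still depends on the dataset being streamable in the first place.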