llama-recipes
Reasoning behind Alpaca's default split
For the Alpaca dataset, the default split comprises 51,800 samples for training and 200 samples for testing [1]. What is the rationale behind such a small test set? I haven't been able to find any recommended split ratios for the Alpaca dataset.
Is the purpose of the small test set merely to serve as a reference point, suggesting that for more reliable testing, another dataset or framework, such as HELM, should be utilized?
[1] https://github.com/facebookresearch/llama-recipes/blob/main/src/llama_recipes/datasets/alpaca_dataset.py#L30
@macsz I believe it was more of a quick test than a recommended setting. I agree with you that it should be higher, and will consider adding a fix.
Hi! This issue has been fixed by this PR. We now use 5% of the dataset as the eval set. Let me know if you have any questions!
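For anyone who wants to see what a 5% hold-out looks like in practice, here is a minimal sketch. It is not the actual llama-recipes code: the helper name `split_train_eval`, the `eval_fraction` parameter, and the choice to take the eval samples from the front of the list are all assumptions made for illustration. The sample count of 52,000 is just the 51,800 + 200 from the question above.

```python
def split_train_eval(samples, eval_fraction=0.05):
    """Hold out the first `eval_fraction` of `samples` for evaluation.

    Hypothetical helper for illustration only; the real dataset class
    may select its eval samples differently.
    """
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    n_eval = int(len(samples) * eval_fraction)
    # Eval samples come from the front, training samples are the remainder.
    return samples[n_eval:], samples[:n_eval]


# Stand-in records: 51,800 + 200 samples, as quoted in the question.
records = list(range(52_000))
train, eval_ = split_train_eval(records)
print(len(train), len(eval_))  # 49400 2600
```

With these assumptions the eval set grows from 200 to 2,600 samples, while the two subsets stay disjoint.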
Thanks!