llama-recipes: Reasoning behind Alpaca's default split

For the Alpaca dataset, the default split comprises 51,800 samples for training and 200 samples for testing [1]. What is the rationale behind such a small test set? I haven't been able to find any recommended split ratios for the Alpaca dataset.

Is the small test set meant only as a reference point, with the expectation that more reliable evaluation should use another dataset or framework, such as HELM?

[1] https://github.com/facebookresearch/llama-recipes/blob/main/src/llama_recipes/datasets/alpaca_dataset.py#L30
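To make the proportion concrete, here is a minimal sketch of a fixed-count holdout like the one described above. It assumes the test samples are taken as a contiguous slice from the front of the record list; `fixed_count_split` and `TEST_SIZE` are hypothetical names rather than the llama-recipes API, and the records are stand-in integers, not real Alpaca examples.

```python
# Hypothetical sketch (not the llama-recipes code): a fixed-count holdout
# that reserves the first TEST_SIZE records for testing.
TEST_SIZE = 200  # fixed count, regardless of how large the dataset is


def fixed_count_split(records, partition):
    """Return the "train" slice of `records`, or the test slice otherwise."""
    if partition == "train":
        return records[TEST_SIZE:]
    return records[:TEST_SIZE]


if __name__ == "__main__":
    # Stand-in integers for ~52,000 records (51,800 train + 200 test).
    records = list(range(52_000))
    train = fixed_count_split(records, "train")
    test = fixed_count_split(records, "test")
    print(len(train), len(test))              # 51800 200
    print(f"{len(test) / len(records):.2%}")  # 0.38%
```

Because the test size is a constant, it stays at 200 no matter how large the dataset is, which here works out to well under 1% of the data.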

Feb 01 '24 macsz

@macsz I believe it was meant more as a quick test than as a recommended setting. I agree with you that it should be larger, and I will consider adding a fix.

Feb 26 '24 HamidShojanazeri

Hi! This issue has been fixed by this PR. We now use 5% of the dataset as the eval set. Let me know if you have any questions!
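For comparison, a percentage-based holdout like the 5% eval set mentioned in the fix could be sketched as follows. This is an illustration under assumptions, not the code from the PR: it assumes the eval samples are again a contiguous slice from the front, and `percent_split` and `EVAL_PERCENT` are hypothetical names.

```python
# Hypothetical sketch (not the llama-recipes code): hold out a fixed
# percentage of the records, so the eval set grows with the dataset.
EVAL_PERCENT = 5


def percent_split(records, partition, eval_percent=EVAL_PERCENT):
    """Return the "train" slice of `records`, or the eval slice otherwise."""
    # Integer arithmetic avoids float rounding when computing the size.
    eval_len = len(records) * eval_percent // 100
    if partition == "train":
        return records[eval_len:]
    return records[:eval_len]


if __name__ == "__main__":
    records = list(range(52_000))  # stand-in for the Alpaca records
    train = percent_split(records, "train")
    eval_set = percent_split(records, "eval")
    print(len(train), len(eval_set))  # 49400 2600
```

With the same stand-in size, the eval set grows from 200 to 2,600 records. Slicing a contiguous prefix keeps the split deterministic across runs; if the record order carried any structure, shuffling with a fixed seed before slicing would be a common alternative.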

Jun 03 '24 wukaixingxp

Thanks!

Jun 03 '24 macsz