Random Hidden and Epochs > 1
Hi EAGLE Team,
Thanks for your great work, which accelerates decoding with speculative sampling by an impressive 4~5x.
But I can't reproduce the result below: the acceptance rate improving with more epochs.
In our experiments, training for more than 1 epoch on SFT data leads to a slight drop in acceptance rate. Was the random data augmentation (the noise added to hidden states) a key factor in avoiding this?
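For reference, my understanding of that augmentation is roughly the following sketch: perturb the target model's hidden states with noise during training so the draft head learns to tolerate imperfect features, which mimics its own prediction errors at inference time. The function name and noise range here are my assumptions, not EAGLE's exact settings:

```python
import numpy as np

def add_hidden_noise(hidden_states, low=-0.1, high=0.1, rng=None):
    # Hypothetical sketch: add uniform noise to the target model's hidden
    # states before feeding them to the draft head. The [low, high] range
    # is an assumed magnitude, not necessarily EAGLE's actual schedule.
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(low, high, size=hidden_states.shape)
    return hidden_states + noise
```

Is something like this what prevents the acceptance rate from degrading past epoch 1?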
What training data did you use, and what is its size?
> What training data did you use, and what is its size?

We used our human-annotated SFT dataset, about 110k~240k samples. We also masked out the system and user prompts so the model is trained only on the human responses. For comparison, the ShareGPT data used in the paper is about 68k.
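For context, our prompt masking is roughly the following sketch (names and the `-100` ignore index follow the common Hugging Face convention; `response_start` is a hypothetical helper value marking where the assistant response begins):

```python
def mask_prompt_labels(input_ids, response_start, ignore_index=-100):
    # Sketch of our setup: copy the token ids into labels and replace the
    # system/user prompt positions with ignore_index so the cross-entropy
    # loss is computed only on the human-annotated response tokens.
    labels = list(input_ids)
    for i in range(min(response_start, len(labels))):
        labels[i] = ignore_index
    return labels
```

So only the response tokens contribute to the training loss.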