Random Hidden and Epochs > 1
Hi EAGLE Team,
Thanks for your great work, which accelerates decoding with speculative sampling by an impressive 4~5x.
But I can't reproduce the result below: the acceptance rate improving with more epochs.
In our experiments, training for more than 1 epoch on SFT data leads to a slight drop in acceptance rate. Was the random data augmentation (the noise added to hidden states) a key factor in avoiding this?
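For reference, my understanding of that augmentation is roughly the following sketch: perturb the target model's hidden states with noise during training so the draft head learns to tolerate imperfect features, which mimics its own prediction errors at inference time. The function name and noise range here are my assumptions, not EAGLE's exact settings:

```python
import numpy as np

def add_hidden_noise(hidden_states, low=-0.1, high=0.1, rng=None):
    # Hypothetical sketch: add uniform noise to the target model's hidden
    # states before feeding them to the draft head. The [low, high] range
    # is an assumed magnitude, not necessarily EAGLE's actual schedule.
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(low, high, size=hidden_states.shape)
    return hidden_states + noise
```

Is something like this what prevents the acceptance rate from degrading past epoch 1?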
What training data did you use, and what is its size?
> What training data did you use, and what is its size?

We used our human-annotated SFT dataset, about 110k~240k samples. We also masked out the system and user prompts so the model is trained only on the human responses. For comparison, the ShareGPT data used in the paper is about 68k.
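For context, our prompt masking is roughly the following sketch (names and the `-100` ignore index follow the common Hugging Face convention; `response_start` is a hypothetical helper value marking where the assistant response begins):

```python
def mask_prompt_labels(input_ids, response_start, ignore_index=-100):
    # Sketch of our setup: copy the token ids into labels and replace the
    # system/user prompt positions with ignore_index so the cross-entropy
    # loss is computed only on the human-annotated response tokens.
    labels = list(input_ids)
    for i in range(min(response_start, len(labels))):
        labels[i] = ignore_index
    return labels
```

So only the response tokens contribute to the training loss.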