Huang Haiduo
Generally speaking, there is not enough GPU memory.
Hi @hongyanz @yanjunplay, may I ask how the feature & unshifted-token scheme is implemented? Figure 8 in EAGLE-1 suggests that feature & shifted-token achieves significant improvements...
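For context, here is a minimal sketch of my understanding of the two input schemes; the function name, tensor shapes, and `shifted` flag are all hypothetical, not EAGLE's actual code:

```python
import torch

def build_draft_inputs(features, token_embeds, shifted=True):
    """Pair target-model features with token embeddings for the draft model.

    features:     [batch, seq, hidden] hidden states from the target model
    token_embeds: [batch, seq, hidden] embeddings of the input tokens

    shifted=True pairs feature f_t with the embedding of token x_{t+1}
    (the token sampled from f_t); shifted=False pairs f_t with x_t itself.
    """
    if shifted:
        feats = features[:, :-1, :]    # drop the last feature
        toks = token_embeds[:, 1:, :]  # drop the first token, so f_t aligns with x_{t+1}
    else:
        feats, toks = features, token_embeds
    return torch.cat([feats, toks], dim=-1)  # concatenated draft-model input
```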
> Hi, thanks for your great work! When I used the EAGLE-llama2-chat-7B you provided for testing, the average acceptance length I measured was lower than the value reported in the paper....
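For anyone comparing numbers: acceptance length is usually computed per draft/verify round. A minimal sketch, assuming the common convention of counting the bonus token the target model emits each round (the helper and its arguments are hypothetical):

```python
def average_acceptance_length(accepted_per_round, include_bonus_token=True):
    """Average acceptance length (tau) over draft/verify rounds.

    accepted_per_round: number of draft tokens the target model accepted
    in each round. Reports often also count the one token the target
    model itself emits per round, hence the optional +1.
    """
    bonus = 1 if include_bonus_token else 0
    return sum(n + bonus for n in accepted_per_round) / len(accepted_per_round)

# e.g. rounds accepting 3, 2, and 4 draft tokens -> (4 + 3 + 5) / 3 = 4.0
print(average_acceptance_length([3, 2, 4]))  # 4.0
```

Whether the bonus token is counted differs between codebases, which alone can explain a gap of roughly one between two measurements.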
> I've implemented my understanding of training-time-test here: https://github.com/NickL77/BaldEagle?tab=readme-ov-file#eagle-3-status > > It runs 11.7% faster and has an 8.4% higher acceptance rate than my EAGLE-2 baseline. You can see benchmark...
@SaeedNajafi Maybe you can give this a try: https://github.com/haiduo/Jakiro
Maybe see https://huggingface.co/datasets/AJN-AI/VoQA/tree/main/test
Hi @jiapingW, Thanks for your response. Yes, I admit that TTT is used to align inference with training, mainly to reduce exposure bias; I've already [discussed that here](https://github.com/SafeAILab/EAGLE/issues/194#issuecomment-3273813878). However, the...
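A minimal sketch of the multi-step unroll idea as I understand it; all names here are hypothetical, and token handling plus tree drafting are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def ttt_unroll_loss(draft_model, feats, tok_embeds, gt_feats, steps=3):
    """Multi-step unroll: the draft model conditions on its own predictions.

    feats:    step-0 input features from the target model
    gt_feats: list of `steps` ground-truth feature tensors, one per step
    """
    total = 0.0
    cur = feats
    for k in range(steps):
        pred = draft_model(cur, tok_embeds)            # one draft step
        total = total + F.smooth_l1_loss(pred, gt_feats[k])
        cur = pred                                     # feed prediction back in
    return total / steps
```

Because steps k > 0 are supervised on the model's own outputs rather than teacher-forced features, training sees the same input distribution as inference, which is exactly what mitigates exposure bias.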
Hi @yubofredwang, Although this simple shifting operation is not a faithful theoretical implementation of "TTT," it should not affect its practical effectiveness. On the contrary, it may even facilitate learning,...
> [@Ageliss](https://github.com/Ageliss) Awesome paper on the scaling law for spec decoding!! But I still have some questions about the paper, which only uses the EAGLE-2 configuration and excludes EAGLE-3's train-time test +...
> Thanks for the reply!! Seems like the norm layer plays a critical role in the scaling! It is also mentioned in this [issue](https://github.com/SafeAILab/EAGLE/issues/220) in the EAGLE repo.

Yes, I also...
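For illustration, a minimal sketch assuming the norm in question is an RMSNorm applied to the target-model features before they enter the draft model; the class name and placement are my guess, not the repo's code:

```python
import torch

class FeatureNorm(torch.nn.Module):
    """RMSNorm on target-model features, ahead of the draft model."""

    def __init__(self, hidden, eps=1e-6):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(hidden))
        self.eps = eps

    def forward(self, feats):
        # RMS-normalize along the hidden dimension, then rescale.
        rms = feats.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (feats * rms)
```

The intuition would be that normalizing the features keeps their scale stable as the target model grows, which could matter for the scaling behavior discussed above.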