GUO-QING JIANG comments

Results: 18 comments by GUO-QING JIANG

Inquiry about accept length results for EAGLE-Qwen2-7B-Instruct

Any updates on this topic? Is this behavior normal? Is Qwen2 much slower than Vicuna and Llama?

How to train eagle3 with the new loss?

> My guess is that EAGLE 3 uses the same training method as [HASS](https://github.com/HArmonizedSS/HASS), except:
>
> 1. As stated in the v3 paper, EAGLE does not have hidden states distillation...

How to train eagle3 with the new loss?

> Have you seen the acceptance rate increase from replacing the last-layer hidden state extraction with the [2nd, mid, len-2] layers' hidden states alone? I observe worse training time...

How to train eagle3 with the new loss?

@carlbunny > Can you elaborate more on using input_ids for the train-time test? For predicting the second token, you are not using âₜ₊₁ but the corresponding next token in the...

How to train eagle3 with the new loss?

> > We only get a benefit in acceptance rate from the hidden fusion (about 5 -> 5.7); it is hard to get a benefit from the train-time test (cannot reproduce >6.5). > >...

[Feature] Create a standard, balanced, robust multi-domain training dataset

Comments: In my experiments, scaling the training data yields a log scaling law for the acceptance rate on both pretraining data (Fig. 1a) and SFT data (Table 9; Scylla+8SFT means 8x SFT data)....

[Feature] Create a standard, balanced, robust multi-domain training dataset

> [@Ageliss](https://github.com/Ageliss) Awesome paper on scaling laws for spec decoding!! But I still have some questions about the paper, which only used the EAGLE2 configuration and excluded the EAGLE3 train-time test +...