Inquiry about accept length results for EAGLE-Qwen2-7B-Instruct
Hi EAGLE Team,
Thank you for your contributions to the community!
I downloaded the released weights for EAGLE on Qwen2-7B-Instruct from https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct. However, while testing the weights on the MT-Bench dataset, I noticed that the accept length is relatively low, as shown below:
Model: Qwen2-7B-Instruct, Dataset: MT-Bench, EAGLE version: EAGLE-1

| temperature | tree draft | chain draft |
|---|---|---|
| t=0.0 | 2.14 | 1.69 |
| t=1.0 | 1.71 | 1.50 |
For reference, I successfully reproduced the EAGLE-1 Vicuna-7B results, achieving an accept length of over 3. Additionally, I used your newly released Qwen2-related code (modeling_qwen2_kv.py) from the EAGLE-2 code branch; however, I was unable to run it successfully on the EAGLE-2 branch, as described in issue #136. Consequently, I adapted the Qwen2-related code to the EAGLE-1 code branch for testing.
I'm curious about the low accept length I'm experiencing with EAGLE-Qwen2. I see that only the weights for EAGLE-Qwen2 were released, without accompanying results. Could you please share the accept length or any other results for EAGLE-Qwen2 on MT-Bench?
Thank you!
Thank you for your interest. Could you please provide more detailed information about the Qwen error on the main branch?
As for the error on the main branch:
1. The function `initialize_tree` in utils_alpha.py returns 5 values, whereas the `forward` function in ea_model.py returns only 3.
2. I noticed that the authors removed the `logits_processor` argument from the `forward` function in ea_model.py on the main branch, compared to the EAGLE-1 code branch. Could the authors please explain why this argument was deleted? I see that `logits_processor` is still being passed into the function call in evaluation/gen_ea_alpha_vicuna.py on the main branch.
@Liyuhui-12 @zhangtia16 Hello, can you provide the test benchmarks for EAGLE Qwen2? The alpha value I tested on the EAGLE-Qwen2-72B-Instruct model is relatively low.
1. Add the modeling_qwen2_kv.py model file on the v1 branch.
2. When loading the EAGLE-Qwen2-72B-Instruct model parameters, set torch_dtype=torch.bfloat16 (see the sketch after this list).
3. Use the gen_ea_alpha_llama2chat.py script to test on the mt_bench dataset.
4. Perform inference with a chain draft.
5. Obtain the alpha values through the alpha.py script.
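For point 2, here is a minimal loading sketch, assuming the `EaModel.from_pretrained` interface from the EAGLE README; the import path and checkpoint paths are assumptions and may differ on the v1 branch:

```python
# Minimal sketch, not the exact v1-branch code; the import path and
# model paths below are assumptions.
import torch
from model.ea_model import EaModel

model = EaModel.from_pretrained(
    base_model_path="Qwen/Qwen2-72B-Instruct",         # base model (assumed path)
    ea_model_path="yuhuili/EAGLE-Qwen2-72B-Instruct",  # EAGLE draft head
    torch_dtype=torch.bfloat16,  # bf16 instead of the usual fp16
    low_cpu_mem_usage=True,
    device_map="auto",
)
model.eval()
```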
The alpha of EAGLE-Qwen2-72B-Instruct is [0.5 0.34 0.32 0.33 0.47], and under the same conditions, the alpha of EAGLE-Vicuna-7B-v1.3 is [0.79 0.74 0.72 0.73 0.72]. I am not sure whether these EAGLE-Qwen2-72B-Instruct results are consistent with your internal results. Thank you.
I have configured my setup similarly to your points 1-5 (modified v1-branch, bf16, mt-bench, chain-draft, temperature=0), with the only difference being that I am using the EAGLE-Qwen2-7B-Instruct checkpoints provided by the authors. Here are my alpha results: [0.31, 0.24, 0.25, 0.31, 0.31], corresponding to an accept length of 1.87 (already accounting for the +1 token issue).
How is the accept length of 1.87 calculated? I compute it as: (total number of accepted tokens + number of base-model inference steps) / number of base-model inference steps:
```python
forward_numbers = alphas_num[0]  # total number of base-model inference steps
accept_lengths = []
for i in range(len(alphas)):
    # count (i + 1) accepted tokens for each draft rejected at depth i
    accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1))
print((sum(accept_lengths) + forward_numbers) / forward_numbers)
```
`alphas_num` and `alphas` are obtained from alpha.py.
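My understanding of what these two arrays hold, which the calculation above assumes (this is my reading of the thread with hypothetical counts, not alpha.py's documented semantics):

```python
# Assumed semantics for a depth-5 chain draft over the whole dataset:
#   alphas_num[i] = number of draft chains whose depth-i token was verified
#   alphas[i]     = number of draft chains whose depth-i token was accepted
# Hypothetical counts; for a chain draft, alphas_num[i + 1] == alphas[i].
alphas_num = [100, 50, 17, 7, 2]
alphas = [50, 17, 7, 2, 1]

# The per-position alpha values quoted in this thread are the ratios:
alpha = [a / n if n else 0.0 for a, n in zip(alphas, alphas_num)]
print(alpha)  # [0.5, 0.34, 0.41..., 0.28..., 0.5]
```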
I did a test on EAGLE-Qwen2-7B-Instruct, and the results are as follows:

| draft | temperature | alpha | accept length |
|---|---|---|---|
| chain | 0 | [0.43 0.35 0.42 0.5 0.8] | 2.55 |
| chain | 1 | [0.36 0.28 0.31 0.31 0.45] | 2.47 |
| tree | 0 | [0.66 0.46 0.47 0.34 0.71] | 2.98 |
| tree | 1 | [0.4 0.3 0.27 0.19 0.37] | 2.54 |
Since the authors did not directly output the acceptance length, I modified the code to calculate it. For details on the modification, please refer to issue #146. In summary, we record the number of accepted tokens at each step for every sample. Finally, the average number of accepted tokens (first averaged across the steps of a single sample, and then averaged across all samples) represents the acceptance length of the dataset.
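A minimal sketch of this averaging, with hypothetical per-sample records (the variable names and counts are illustrative, not from the actual modification):

```python
# Each inner list holds the number of accepted draft tokens recorded at
# every decoding step of one MT-Bench sample (hypothetical values).
per_sample_accepts = [
    [3, 1, 2, 0],  # sample 0
    [2, 2, 1],     # sample 1
]

# First average across the steps of each sample, then across all samples.
sample_means = [sum(steps) / len(steps) for steps in per_sample_accepts]
dataset_accept_length = sum(sample_means) / len(sample_means)
print(dataset_accept_length)  # (1.5 + 5/3) / 2 ≈ 1.58
```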
As for your implementation, I think the right version should be `accept_lengths.append((alphas_num[i] - alphas[i]) * (i))` rather than `accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1))`. Take a simple example of a [right, wrong, wrong, wrong, wrong] chain draft: the accept length should be 1, while your code produces 2 with alpha = [1, 0, 0, 0, 0] and alpha_num = [1, 1, 0, 0, 0].
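This can be checked numerically with the counts above:

```python
# One chain draft: the first token is accepted and the second rejected,
# so exactly 1 draft token is accepted.
alphas = [1, 0, 0, 0, 0]
alphas_num = [1, 1, 0, 0, 0]

old = sum((alphas_num[i] - alphas[i]) * (i + 1) for i in range(5))
new = sum((alphas_num[i] - alphas[i]) * i for i in range(5))
print(old, new)  # 2 1 -> the (i + 1) version over-counts by one token
```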
By the way, did you use the released checkpoints on MT-Bench to get the alpha results?
@zhangtia16 You are correct. I modified the accept-length calculation and got new results for the EAGLE-Qwen2-7B-Instruct model, which are generally consistent with yours:
| draft | temperature | accept length |
|---|---|---|
| chain | 0 | 1.70 |
| chain | 1 | 1.50 |
| tree | 0 | 2.19 |
| tree | 1 | 1.55 |
Here is the new calculation method for the accept length:
```python
forward_numbers = alphas_num[0]  # total number of base-model inference steps
accept_lengths = []
for i in range(len(alphas)):
    # a draft rejected at depth i has i accepted tokens (depths 0 .. i-1)
    accept_lengths.append((alphas_num[i] - alphas[i]) * i)
# drafts accepted at the final depth contribute all 5 draft tokens
accept_lengths.append(alphas[4] * 5)
print((sum(accept_lengths) + forward_numbers) / forward_numbers)
```
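As a sanity check, applying this to the single-draft [right, wrong, wrong, wrong, wrong] example from the earlier comment:

```python
# 1 accepted draft token plus the 1 token the base model emits per step
# should give an accept length of 2.0.
alphas = [1, 0, 0, 0, 0]
alphas_num = [1, 1, 0, 0, 0]

forward_numbers = alphas_num[0]
accepted = sum((alphas_num[i] - alphas[i]) * i for i in range(len(alphas)))
accepted += alphas[4] * 5  # fully accepted chains contribute all 5 tokens
print((accepted + forward_numbers) / forward_numbers)  # 2.0
```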
The checkpoints used are those published by the author at https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct.
If our testing results are correct, the model's performance does not appear to surpass that of the Vicuna and Llama models released by the author.
Any updates on this topic? Is such behavior normal? Is EAGLE really this much slower on Qwen2 than on Vicuna and Llama?
+1, I encounter the same issue: Qwen2.5-Coder-7B shows no speedup when using https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct as the draft model, running on an A100, while the Vicuna series achieves a roughly 3x speedup.