
Inquiry about accept length results for EAGLE-Qwen2-7B-Instruct

Open zhangtia16 opened this issue 1 year ago • 9 comments

Hi EAGLE Team,

Thank you for your contributions to the community!

I downloaded the released weights for EAGLE on Qwen2-7B-Instruct from https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct. However, while testing the weights on the MT-Bench dataset, I noticed that the accept length is relatively low as follows:

Model: Qwen2-7B-Instruct; Dataset: MT-Bench; EAGLE version: EAGLE-1

accept length    tree draft    chain draft
t=0.0            2.14          1.69
t=1.0            1.71          1.50

For reference, I successfully reproduced the EAGLE-1 Vicuna-7B results, achieving an accept length of over 3. Additionally, I used your newly released Qwen2-related code (modeling_qwen2_kv.py) from the EAGLE-2 code branch; however, I was unable to run it successfully with the EAGLE-2 code branch, as mentioned in issue 136. Consequently, I adapted the Qwen2-related code to the EAGLE-1 code branch for testing.

I'm curious about the low accept length I'm experiencing with EAGLE-Qwen2. I see that only the weights for EAGLE-Qwen2 were released, without accompanying results. Could you please share the accept length or any other results for EAGLE-Qwen2 on MT-Bench?

Thank you!

zhangtia16 avatar Oct 09 '24 18:10 zhangtia16

Thank you for your interest. Could you please provide more detailed error information from Qwen on the main branch?

Liyuhui-12 avatar Oct 21 '24 09:10 Liyuhui-12

As for the error on the main branch:

1. The function "initialize_tree" in utils_alpha.py returns 5 values, whereas the "forward" function in ea_model.py returns only 3.
2. I noticed that the authors removed the "logits_processor" argument from the "forward" function in ea_model.py on the main branch, compared to the EAGLE-1 code branch. Could the authors please explain why this argument was deleted? I see that "logits_processor" is still being passed in the function call in evaluation/gen_ea_alpha_vicuna.py on the main branch.

zhangtia16 avatar Oct 21 '24 10:10 zhangtia16

@Liyuhui-12 @zhangtia16 Hello, could you share benchmark results for EAGLE-Qwen2? The alpha values I measured on the EAGLE-Qwen2-72B-Instruct model are relatively low. My test setup:

  1. Add the modeling_qwen2_kv.py model file on the v1 branch.
  2. When loading the EAGLE-Qwen2-72B-Instruct model parameters, set torch_dtype=torch.bfloat16.
  3. Use the gen_ea_alpha_llama2chat.py script to test on the mt_bench dataset.
  4. Perform inference with a chain draft.
  5. Obtain the alpha values with the alpha.py script.

The alpha of EAGLE-Qwen2-72B-Instruct is [0.5 0.34 0.32 0.33 0.47], and under the same conditions the alpha of EAGLE-Vicuna-7B-v1.3 is [0.79 0.74 0.72 0.73 0.72]. I'm not sure whether these EAGLE-Qwen2-72B-Instruct results are consistent with your internal ones. Thank you.
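
As a rough cross-check (a back-of-the-envelope estimate only, not the alpha.py computation): if each reported alpha is roughly the conditional acceptance rate at that draft position, the expected chain-draft accept length is about 1 plus the sum of the cumulative products of the alphas. A minimal sketch:

# Back-of-the-envelope estimate only (not the alpha.py computation), assuming
# alphas[i] is the conditional acceptance rate at draft position i.
def estimate_chain_accept_length(alphas):
    expected_accepted = 0.0
    running = 1.0
    for a in alphas:
        running *= a                 # probability the chain survives up to this position
        expected_accepted += running
    return 1.0 + expected_accepted   # +1 for the token the base model generates itself

print(estimate_chain_accept_length([0.5, 0.34, 0.32, 0.33, 0.47]))   # ~1.75 (EAGLE-Qwen2-72B-Instruct)
print(estimate_chain_accept_length([0.79, 0.74, 0.72, 0.73, 0.72]))  # ~3.32 (EAGLE-Vicuna-7B-v1.3)

These rough estimates (~1.75 vs ~3.3) are consistent with the gap in chain-draft accept lengths reported elsewhere in this thread.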

quanliu1991 avatar Nov 14 '24 11:11 quanliu1991

I have configured my setup similarly to your points 1-5 (modified v1 branch, bf16, MT-Bench, chain draft, temperature=0), with the only difference being that I am using the EAGLE-Qwen2-7B-Instruct checkpoints provided by the authors. Here are my alpha results: [0.31, 0.24, 0.25, 0.31, 0.31], corresponding to an accept length of 1.87 (already accounting for the +1 token issue).

zhangtia16 avatar Nov 15 '24 01:11 zhangtia16

> I have configured my setup similarly to your points 1-5 (modified v1 branch, bf16, MT-Bench, chain draft, temperature=0), with the only difference being that I am using the EAGLE-Qwen2-7B-Instruct checkpoints provided by the authors. Here are my alpha results: [0.31, 0.24, 0.25, 0.31, 0.31], corresponding to an accept length of 1.87 (already accounting for the +1 token issue).

How is the accept length of 1.87 calculated? I compute it as: (total number of accepted tokens + number of base-model inference steps) / number of base-model inference steps

forward_numbers = alphas_num[0]
accept_lengths = []
for i in range(len(alphas)):
    accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1))

print((sum(accept_lengths) + forward_numbers) / forward_numbers)

alphas_num and alphas are obtained from alpha.py

I did a test on EAGLE-Qwen2-7B-Instruct, and the results are as follows:

chain-draft:
  temperature=0: alpha [0.43 0.35 0.42 0.5 0.8], accept length: 2.55
  temperature=1: alpha [0.36 0.28 0.31 0.31 0.45], accept length: 2.47
tree-draft:
  temperature=0: alpha [0.66 0.46 0.47 0.34 0.71], accept length: 2.98
  temperature=1: alpha [0.4 0.3 0.27 0.19 0.37], accept length: 2.54

quanliu1991 avatar Nov 15 '24 16:11 quanliu1991

Since the authors did not directly output the acceptance length, I modified the code to calculate it. For details on the modification, please refer to issue #146. In summary, we record the number of accepted tokens at each step for every sample. Finally, the average number of accepted tokens (first averaged across the steps of a single sample, and then averaged across all samples) represents the acceptance length of the dataset.
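
In code, the averaging described above is roughly the following (a minimal sketch with made-up numbers, not the actual modification from issue #146):

# accepted-token counts recorded at each decoding step, one list per sample (illustrative values)
per_sample_accepts = [[3, 1, 2], [2, 2]]

# first average over the steps of each sample, then average across samples
per_sample_avg = [sum(steps) / len(steps) for steps in per_sample_accepts]
accept_length = sum(per_sample_avg) / len(per_sample_avg)
print(accept_length)  # (2.0 + 2.0) / 2 = 2.0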

As for your implementation, I think the correct version should be accept_lengths.append((alphas_num[i] - alphas[i]) * (i)) rather than accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1)). Take a simple example of a [right, wrong, wrong, wrong, wrong] chain draft: the accept length should be 1, while your code produces 2 with alpha=[1,0,0,0,0] and alpha_num=[1,1,0,0,0].
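
A quick numeric check of that toy example (using the illustrative values above, not real alpha.py output):

alphas = [1, 0, 0, 0, 0]       # toy chain draft: first position accepted, rest never accepted
alphas_num = [1, 1, 0, 0, 0]   # positions 0 and 1 were evaluated once; the chain stopped at position 1

wrong = sum((alphas_num[i] - alphas[i]) * (i + 1) for i in range(len(alphas)))
right = sum((alphas_num[i] - alphas[i]) * i for i in range(len(alphas)))
print(wrong, right)  # 2 1 -> only the "* i" version matches the single accepted draft token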

Btw, did you use the released checkpoints on MT-Bench to get the alpha results?

zhangtia16 avatar Nov 16 '24 07:11 zhangtia16

@zhangtia16 You are correct. I modified the accept length calculation and got new results for the EAGLE-Qwen2-7B-Instruct model, which are generally consistent with yours.

chain-draft:
  temperature=0: accept length: 1.70
  temperature=1: accept length: 1.50
tree-draft:
  temperature=0: accept length: 2.19
  temperature=1: accept length: 1.55

Here is the new calculation method for the accept length:

forward_numbers = alphas_num[0]  # number of base-model inference steps
accept_lengths = []
for i in range(len(alphas)):
    # a chain rejected at draft position i contributed i accepted draft tokens
    accept_lengths.append((alphas_num[i] - alphas[i]) * (i))
# a chain whose full 5-token draft was accepted contributed 5 tokens
accept_lengths.append(alphas[4] * 5)

print((sum(accept_lengths) + forward_numbers) / forward_numbers)
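
As a sanity check, plugging in the toy example from the previous comment (alphas=[1,0,0,0,0], alphas_num=[1,1,0,0,0]) gives (1 + 1) / 1 = 2: one accepted draft token plus the token the base model generates on its own forward pass.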

The checkpoints used are those published by the author at https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct.

If our test results are correct, the EAGLE-Qwen2 checkpoints do not appear to perform as well as the Vicuna and Llama checkpoints released by the author.

quanliu1991 avatar Nov 17 '24 10:11 quanliu1991

Any updates on this topic? Is this behavior expected? Is EAGLE really that much slower on Qwen2 than on Vicuna and Llama?

Ageliss avatar Feb 07 '25 15:02 Ageliss

+1, I encountered the same issue: Qwen2.5-Coder-7B shows no speedup when using https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct as the draft model, running on an A100. The Vicuna series speeds up about 3x.

FWXT avatar Feb 22 '25 11:02 FWXT