
vLLM does not support EAGLE spec decode when deploying the EAGLE-Qwen2-7B-Instruct model

Open crownz248 opened this issue 1 year ago • 1 comment

I can successfully deploy llama3-8b-instruct with EAGLE, but there is a problem when deploying qwen2-7b-instruct with EAGLE.

I have converted the EAGLE-Qwen2-7B-Instruct model according to vllm/model_executor/models/eagle.py:L126.

When loading the converted model, I encountered the error below:

AssertionError: Attempted to load weight (torch.Size([3584])) into parameter (torch.Size([3584, 7168]))

I looked at the code in vllm/model_executor/models/eagle.py:L139, which is shown below:

    def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
        ...
        elif name.startswith("fc."):
            weight_loader = getattr(self.fc.weight, "weight_loader",
                                    default_weight_loader)
            weight_loader(self.fc.weight, loaded_weight)
        ...

I think this code assumes that any name variable starting with 'fc.' can only be 'fc.weight', but the fc layer of eagle-qwen2 has a bias attribute, which means the name variable can also be 'fc.bias'. The assertion above is exactly that: the bias vector of shape [3584] is being loaded into the weight matrix of shape [3584, 7168].

Moreover, the qkv_proj layer of EAGLE-Qwen2-7B-Instruct also has a bias.
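(Editor's note) The fix the reporter is asking for can be sketched as follows. This is a minimal stand-in, not vLLM's actual implementation: the `Param`, `FC`, and `load_fc_weights` names here are hypothetical, and the real loader works on torch tensors. The point it illustrates is picking the target parameter ("weight" or "bias") from the checkpoint name instead of hard-coding `fc.weight`:

```python
# Hypothetical sketch of the proposed fix: dispatch "fc.weight" AND
# "fc.bias" to the matching parameter, rather than loading everything
# that starts with "fc." into fc.weight.

class Param:
    """Stand-in for a torch parameter: just a shape and a data slot."""
    def __init__(self, shape):
        self.shape = shape
        self.data = None

class FC:
    """Stand-in for the EAGLE fc layer, which (for Qwen2) has a bias."""
    def __init__(self, out_features, in_features):
        self.weight = Param((out_features, in_features))
        self.bias = Param((out_features,))

def default_weight_loader(param, loaded):
    # Shape check mirroring the AssertionError reported in the issue.
    assert param.shape == loaded["shape"], (
        f"Attempted to load weight ({loaded['shape']}) "
        f"into parameter ({param.shape})")
    param.data = loaded["data"]

def load_fc_weights(fc, weights):
    for name, loaded in weights:
        if name.startswith("fc."):
            # Resolve "fc.weight" -> fc.weight, "fc.bias" -> fc.bias.
            param = getattr(fc, name.split(".", 1)[1])
            weight_loader = getattr(param, "weight_loader",
                                    default_weight_loader)
            weight_loader(param, loaded)

# With the name-based dispatch, both entries load without tripping
# the shape assertion (3584 matches the bias, 3584x7168 the weight).
fc = FC(3584, 7168)
load_fc_weights(fc, [
    ("fc.weight", {"shape": (3584, 7168), "data": "W"}),
    ("fc.bias", {"shape": (3584,), "data": "b"}),
])
print(fc.weight.data, fc.bias.data)  # → W b
```

The same name-based dispatch would cover the qkv_proj bias as well, since the target attribute is derived from the checkpoint key rather than assumed.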

I hope you can fix this in the upcoming upgrade!

crownz248 avatar Sep 25 '24 10:09 crownz248

I think this issue has been fixed in release v0.6.2 of vLLM. Please see this: https://github.com/vllm-project/vllm/pull/8790.

MMuzzammil1 avatar Oct 11 '24 04:10 MMuzzammil1