ChatGLM2-6B

[BUG/Help] Starting the INT4-quantized model with vLLM fails with a type mismatch: self_attention.dense.weight int4 shape [4096, 2048] mismatch fp16 shape [4096, 4096]

Status: Open · yjjiang11 opened this issue 8 months ago · 0 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Loading the model reports `self_attention.dense.weight int4 shape [4096, 2048] mismatch fp16 shape [4096, 4096]`, which causes the vLLM server setup to fail.

Expected Behavior

chatglm2-6b-int4 can be deployed with vLLM.

Steps To Reproduce

None provided; a presumed setup is sketched below.
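
Since no reproduction steps were given, the snippet below is only a minimal sketch of the presumed setup: loading the INT4 checkpoint with vLLM the same way the FP16 model would be loaded. The checkpoint name `THUDM/chatglm2-6b-int4`, the prompt, and the sampling parameters are my assumptions. The 2048-vs-4096 mismatch in the title looks consistent with the quantized checkpoint packing two int4 values per stored element, which a loader expecting fp16-shaped weights would reject.

```python
# Minimal sketch of the presumed reproduction (not provided in the report).
# The checkpoint name "THUDM/chatglm2-6b-int4" and the prompt are assumptions.
from vllm import LLM, SamplingParams

# A server launch that presumably fails the same way at weight-loading time:
#   python -m vllm.entrypoints.api_server --model THUDM/chatglm2-6b-int4 --trust-remote-code
llm = LLM(model="THUDM/chatglm2-6b-int4", trust_remote_code=True)

# Constructing LLM() above is where the int4/fp16 shape mismatch is raised;
# generation is only reached with an FP16 checkpoint such as THUDM/chatglm2-6b.
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["你好"], sampling_params=params)
print(outputs[0].outputs[0].text)
```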

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?


yjjiang11 · Jun 05 '24 08:06