
[Bug] Running the AWQ Qwen2-72b-Instruct model with turbomind on V100 produces garbled inference results

Open lljzhgxd opened this issue 1 year ago • 3 comments

Checklist

  • [X] 1. I have searched related issues but cannot get the expected help.
  • [X] 2. The bug has not been fixed in the latest version.
  • [X] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

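Serving the AWQ-quantized Qwen2-72b-Instruct with lmdeploy on a V100 server produces nonsensical inference results. With exactly the same environment and command, deployment on an A100 server returns correct results. Does lmdeploy's AWQ inference not support V100?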

Reproduction

Command:

CUDA_VISIBLE_DEVICES=4,5 lmdeploy serve api_server /mnt/data1/models/Qwen2-72B-Instruct-AWQ --server-port 8012 --tp 2 --backend turbomind --model-format awq --model-name Qwen2-72B-Instruct

Request:

{
  "model": "Qwen2-72B-Instruct",
  "messages": [
    { "role": "user", "content": "hello" }
  ],
  "max_tokens": 128,
  "stream": false
}

Result:

{
  "id": "1",
  "object": "chat.completion",
  "created": 1724039840,
  "model": "Qwen2-72B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "周恩 prostitu prostitu言えば_readablemue prostitu prostitu prostitu prostitu prostituouflageuhe prostitu言えば社会责任 prostitu prostitu gợiropped党总 prostitu prostitu prostitu prostitu言えばarella炝 prostitu prostitu prostitualin委组织部TableViewCellTanggal prostitu prostitu prostitu prostitu prostitu言えば_readable诌Tanggal prostituTambahTanggal[ETanggal prostitu prostitu prostitu prostitu prostitu言えば_readable prostitu prostitu言えばŨ邓小 prostitu gợi言えば无声煸虓alin委组织部TableViewCellTanggal gợi凹_readableTanggal prostitu gợiTanggal凹_readableroppedeworthy prostitu prostitu prostitu prostitu relafa走得 prostitu prostitu prostitu prostitu prostitu prostitu prostitu prostitu prostitu gợiTanggal prostitu prostitu娼Tanggal prostitu prostitu prostitu prostitu言えば_readable prostitu prostitu prostitu prostitu prostitu prostitu言えば.ws诌Tanggalalin版权归言えば_readableThêmTanggal言えば.enterprise++){",
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "total_tokens": 149,
    "completion_tokens": 129
  }
}
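For anyone who wants to replay the request above against their own server, here is a minimal sketch using only the Python standard library. It assumes the server is reachable at `http://localhost:8012` (the host is not stated in the report) and that the request goes to the OpenAI-compatible `/v1/chat/completions` route; the helper names `build_payload` and `chat` are made up for illustration.

```python
import json
import urllib.request

# Assumption: the api_server from the command above runs on this machine.
BASE_URL = "http://localhost:8012"


def build_payload(prompt, max_tokens=128):
    """Build the same JSON body as the request shown in the report."""
    return {
        "model": "Qwen2-72B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }


def chat(prompt, base_url=BASE_URL, timeout=60):
    """POST the payload and return the assistant's reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        # Assumption: OpenAI-compatible chat completions route.
        base_url + "/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # The reply text sits at choices[0].message.content, as in the result above.
    return body["choices"][0]["message"]["content"]
```

Calling `chat("hello")` on both the V100 and the A100 deployment gives a quick side-by-side comparison of the two outputs.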

Environment

8x V100 32G SXM2
Driver Version: 555.42.02
CUDA Version: 12.5
nvcc: Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0

Error traceback

No response

lljzhgxd avatar Aug 19 '24 06:08 lljzhgxd

AWQ/GPTQ support for V100 was only just added in #2090 and has not been released yet.

lzhangzz avatar Aug 19 '24 06:08 lzhangzz

Roughly when will the release be out? We really need it. Thanks.

lljzhgxd avatar Aug 19 '24 07:08 lljzhgxd

In the meantime, you can try the nightly build:

https://github.com/zhyncs/lmdeploy-build/releases/tag/b28a1d0

lzhangzz avatar Aug 19 '24 07:08 lzhangzz

Try the latest release: https://github.com/InternLM/lmdeploy/releases/tag/v0.6.0a0

zhyncs avatar Aug 30 '24 08:08 zhyncs