[Bug] AWQ-quantized model cannot be deployed on multiple GPUs with lmdeploy
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
Describe the bug
After AWQ quantization with lmdeploy, multi-GPU deployment fails with: `assert tensor.shape[split_dim] % tp == 0`
Reproduction
AWQ quantization:

```shell
lmdeploy lite auto_awq \
    model_path \
    --calib-dataset 'c4' \
    --calib-samples 128 \
    --work-dir xxx
```
After quantizing with the command above, launch multi-GPU serving with:

```shell
CUDA_VISIBLE_DEVICES=6,7 lmdeploy serve api_server xxx --server-name 0.0.0.0 --server-port 8006 --tp 2
```

Error: `assert tensor.shape[split_dim] % tp == 0`
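As a quick way to see in advance which tensors would trip this check, one can scan the quantized work dir before serving. This is a hypothetical sketch, not an lmdeploy feature: it assumes the checkpoint is stored as safetensors shards and that the per-group quantization parameters carry "zeros" or "scales" in their names, and it only inspects dim 0 (the dim split for w2 in the traceback below); layouts vary across lmdeploy versions.

```python
# Hypothetical pre-flight check (not an lmdeploy API): flag quantization
# tensors whose leading dim is not divisible by the intended tp degree.
import glob
from safetensors import safe_open

tp = 2
work_dir = "xxx"  # the --work-dir used for lmdeploy lite auto_awq

for shard in glob.glob(f"{work_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if "zeros" not in name and "scales" not in name:
                continue
            shape = f.get_tensor(name).shape
            if shape[0] % tp != 0:
                print(f"{name}: shape {tuple(shape)} not divisible by tp={tp}")
```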
Environment
lmdeploy==0.4.1
Error traceback
```
Traceback (most recent call last):
  File "/opt/conda/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/entrypoint.py", line 37, in run
    args.run(args)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/cli/serve.py", line 283, in api_server
    run_api_server(args.model_path,
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/serve/openai/api_server.py", line 1191, in serve
    VariableInterface.async_engine = pipeline_class(
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 206, in __init__
    self._build_turbomind(model_path=model_path,
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/serve/async_engine.py", line 254, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 396, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 170, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/turbomind/turbomind.py", line 305, in _from_hf
    output_model.export()
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 273, in export
    self.export_transformer_block(bin, i)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/target_model/w4.py", line 156, in export_transformer_block
    self.save_split(w2_sz, f'layers.{i}.feed_forward.w2.scales_zeros', 0)
  File "/opt/conda/lib/python3.8/site-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 246, in save_split
    assert tensor.shape[split_dim] % tp == 0
AssertionError
```
ref https://github.com/InternLM/lmdeploy/blob/3cec493ab52088b39c5877f734149ee7a8761f73/lmdeploy/turbomind/deploy/target_model/base.py#L246-L259

As shown in the code, the tensor's split dimension needs to be divisible by tp.
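For reference, here is a paraphrased sketch of what the linked `save_split` does (names and details approximate the linked lines, not a verbatim copy): a tensor destined for tensor parallelism is split evenly along `split_dim` across the tp ranks, which is only possible when that dimension is a multiple of tp.

```python
import torch

# Paraphrased sketch of the linked save_split (approximate, not verbatim):
# each sharded tensor is split evenly along split_dim across tp ranks.
def save_split(tensor: torch.Tensor, name: str, split_dim: int, tp: int) -> None:
    assert tensor.shape[split_dim] % tp == 0   # the assert that fires here
    split_size = tensor.shape[split_dim] // tp
    for rank, shard in enumerate(torch.split(tensor, split_size, dim=split_dim)):
        print(f'{name}.{rank}: {tuple(shard.shape)}')  # one shard per GPU rank
```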
> As shown in the code, the tensor's split dimension needs to be divisible by tp.

It was divisible by tp before quantization, so why is it no longer divisible after quantization? And how can this be fixed?
Which model are you using?
> It was divisible by tp before quantization, so why is it no longer divisible after quantization?
It's not that the weights aren't divisible by tp; it's the quantized zeros that no longer are. Quantization is grouped with group_size=128, so the shape of the zeros is the weight shape already divided by 128, and that smaller dimension may no longer be divisible by tp.
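A concrete illustration of the arithmetic (13696 is only an example figure for a w2 input dimension; check the `intermediate_size` in your own model's config.json):

```python
group_size = 128
tp = 2

in_features = 13696                    # example w2 input dim; verify against config.json
print(in_features % tp)                # 0 -> the fp16 weight itself splits fine

n_groups = in_features // group_size   # 13696 // 128 = 107 rows of scales/zeros
print(n_groups % tp)                   # 1 -> assert tensor.shape[0] % tp == 0 fails
```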
> Which model are you using?

qwen2-7b-chat
> It's not that the weights aren't divisible by tp; it's the quantized zeros that no longer are.

How can this be solved?
I ran into the same problem. I am using the qwen1.5-14-chat-awq model released by Qwen directly. Is it possible to trade some precision so that it can at least be deployed on two T4 GPUs? Thanks.
This will be fixed by #2090.