ms-swift
Inference fails after GPTQ quantization of qwen1half-moe-2.7B-chat
After ordinary LoRA fine-tuning of qwen1half-moe-2.7B-chat, I tried GPTQ 4-bit quantization, but running inference again fails with:
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
Environment: Python 3.10 (Ubuntu 22.04), CUDA 12.1
Inference command:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-moe-a2_7b-chat-int4 --ckpt_dir ./checkpoint-395-merged-gptq-int4
Error output:
INFO 05-13 21:18:53 utils.py:660] Found nccl from library /root/.config/vllm/nccl/cu12/libnccl.so.2.18.1
INFO 05-13 21:18:53 selector.py:27] Using FlashAttention-2 backend.
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/cli/infer.py", line 5, in <module>
[rank0]: infer_main()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/utils/run_utils.py", line 27, in x_main
[rank0]: result = llm_x(args, **kwargs)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/llm/infer.py", line 228, in llm_infer
[rank0]: llm_engine, template = prepare_vllm_engine_template(args)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 375, in prepare_vllm_engine_template
[rank0]: llm_engine = get_vllm_engine(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 91, in get_vllm_engine
[rank0]: llm_engine = llm_engine_cls.from_engine_args(engine_args)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 292, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
[rank0]: self.model_executor = executor_class(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
[rank0]: self._init_executor()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
[rank0]: self._init_non_spec_worker()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 69, in _init_non_spec_worker
[rank0]: self.driver_worker.load_model()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/worker/worker.py", line 118, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 164, in load_model
[rank0]: self.model = get_model(
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
[rank0]: return loader.load_model(model_config=model_config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 222, in load_model
[rank0]: model = _initialize_model(model_config, self.load_config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 88, in _initialize_model
[rank0]: return model_class(config=model_config.hf_config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 377, in __init__
[rank0]: self.model = Qwen2MoeModel(config, quant_config)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 341, in __init__
[rank0]: self.layers = nn.ModuleList([
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 342, in <listcomp>
[rank0]: Qwen2MoeDecoderLayer(config, layer_idx, quant_config=quant_config)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 283, in __init__
[rank0]: self.mlp = Qwen2MoeSparseMoeBlock(config=config,
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 113, in __init__
[rank0]: self.pack_params()
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 137, in pack_params
[rank0]: w1.append(expert.gate_up_proj.weight)
[rank0]: File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]: raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
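The last frames show pack_params reading expert.gate_up_proj.weight, while the error message suggests the quantized layer only carries qweight. A toy sketch of that failure mode follows. The classes DenseLinear and GptqLinear and the helpers are hypothetical stand-ins written for illustration; they are not vLLM's real layers or its pack_params implementation.

```python
# Toy stand-ins only: these classes are hypothetical and are not vLLM's layers.
class DenseLinear:
    """Unquantized layer: exposes a plain `weight` attribute."""

    def __init__(self, rows, cols):
        self.weight = [[0.0] * cols for _ in range(rows)]


class GptqLinear:
    """GPTQ-style layer: keeps packed `qweight`, `qzeros`, `scales`, but no `weight`."""

    def __init__(self, rows, cols):
        self.qweight = [[0] * cols for _ in range(rows)]
        self.qzeros = [[0] * cols]
        self.scales = [[1.0] * cols]


def pack_expert_weights(experts):
    """Mimic the access pattern in the traceback: read `.weight` from each expert."""
    return [expert.weight for expert in experts]


def try_pack(experts):
    """Return None on success, or the AttributeError message on failure."""
    try:
        pack_expert_weights(experts)
        return None
    except AttributeError as exc:
        return str(exc)


dense_result = try_pack([DenseLinear(2, 2), DenseLinear(2, 2)])
gptq_result = try_pack([GptqLinear(2, 2), GptqLinear(2, 2)])
print("dense experts:", dense_result)
print("gptq experts: ", gptq_result)
```

If this reading is right, the fused-expert path in this vLLM version assumes unquantized expert weights, so the failure would not be specific to this checkpoint.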
I also tried pt as the inference backend, but inference still fails after the model has loaded.
Inference command:
swift infer --model_type qwen1half-moe-a2_7b-chat-int4 --ckpt_dir ./checkpoint-395-merged-gptq-int4/ --infer_backend pt
Error output:
(up_proj): QuantLinear()
)
(gate): QuantLinear()
(shared_expert_gate): QuantLinear()
)
(input_layernorm): Qwen2MoeRMSNorm()
(post_attention_layernorm): Qwen2MoeRMSNorm()
)
)
(norm): Qwen2MoeRMSNorm()
)
(lm_head): Linear(in_features=2048, out_features=151936, bias=False)
)
[INFO:swift] Qwen2MoeForCausalLM: 622.4302M Params (622.4302M Trainable [100.0000%]), 2049.3039M Buffers.
[INFO:swift] system: You are a helpful assistant.
[INFO:swift] Input `exit` or `quit` to exit the conversation.
[INFO:swift] Input `multi-line` to switch to multi-line input mode.
[INFO:swift] Input `reset-system` to reset the system and clear the history.
[INFO:swift] Input `clear` to clear the history.
<<< 你好
Exception in thread Thread-1 (generate):
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/miniconda3/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/generation/utils.py", line 1622, in generate
result = self._sample(
File "/root/miniconda3/lib/python3.10/site-packages/transformers/generation/utils.py", line 2791, in _sample
outputs = self(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 1350, in forward
outputs = self.model(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 1219, in forward
layer_outputs = decoder_layer(
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 929, in forward
hidden_states = self.mlp(hidden_states)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/qwen2_moe/modeling_qwen2_moe.py", line 821, in forward
router_logits = self.gate(hidden_states)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py", line 348, in forward
weight = scales * (weight - zeros)
RuntimeError: The size of tensor a (60) must match the size of tensor b (32) at non-singleton dimension 2
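The 60 vs 32 mismatch is consistent with how 4-bit zero points are packed into 32-bit words. The sketch below is a guess at the arithmetic, not a confirmed diagnosis. It assumes the QuantLinear in qlinear_cuda_old.py allocates out_features // 32 * bits packed columns for the zero points, which has not been checked against the installed auto_gptq 0.7.1 source. It also assumes 60 is the output width of the router gate (one logit per expert). The function name gptq_zero_point_widths is hypothetical.

```python
def gptq_zero_point_widths(out_features, bits=4, word_bits=32):
    """Return (packed_columns, unpacked_zeros) for one quantization group.

    Assumption: packed zero points use out_features // word_bits * bits
    int32 columns, and each int32 holds word_bits // bits values.
    """
    packed_columns = out_features // word_bits * bits   # floor division drops the remainder
    values_per_word = word_bits // bits                 # 8 four-bit values per int32
    unpacked_zeros = packed_columns * values_per_word
    return packed_columns, unpacked_zeros


for out_features in (60, 64, 2048):
    packed, unpacked = gptq_zero_point_widths(out_features)
    status = "ok" if unpacked == out_features else "mismatch"
    print(f"out_features={out_features}: packed={packed}, "
          f"unpacked zeros={unpacked}, scales={out_features} -> {status}")
```

Under these assumptions an output width of 60 unpacks to only 32 zero points while scales keeps 60 entries, which matches the sizes in the RuntimeError. Widths that are multiples of 32, such as 64 or 2048, line up.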
pip list
Package Version
--------------------------------- ---------------
absl-py 2.0.0
accelerate 0.27.2
addict 2.4.0
aiofiles 23.2.1
aiohttp 3.9.5
aiosignal 1.3.1
alibabacloud-credentials 0.3.2
alibabacloud-endpoint-util 0.0.3
alibabacloud-gateway-spi 0.0.1
alibabacloud-openapi-util 0.2.2
alibabacloud-sts20150401 1.1.4
alibabacloud-tea 0.3.6
alibabacloud-tea-openapi 0.3.8
alibabacloud-tea-util 0.3.12
alibabacloud-tea-xml 0.0.2
alipai 0.4.7
aliyun-python-sdk-core 2.15.1
aliyun-python-sdk-kms 2.16.2
altair 5.3.0
annotated-types 0.6.0
anyio 4.2.0
appdirs 1.4.4
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.2.0
auto_gptq 0.7.1
Babel 2.14.0
backoff 1.11.1
beautifulsoup4 4.12.2
bitsandbytes 0.42.0
bleach 6.1.0
blinker 1.8.1
brotlipy 0.7.0
cachetools 5.3.2
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 3.0.0
cmake 3.29.2
colorama 0.4.6
coloredlogs 15.0.1
comm 0.2.1
conda 22.11.1
conda-content-trust 0.1.3
conda-package-handling 1.9.0
contourpy 1.2.0
cpm-kernels 1.0.11
crcmod 1.7
cryptography 42.0.7
cycler 0.12.1
dacite 1.8.1
datasets 2.18.0
debugpy 1.8.0
decorator 5.1.1
decord 0.6.0
deepspeed 0.14.0
defusedxml 0.7.1
Deprecated 1.2.14
diffusers 0.25.0
dill 0.3.8
diskcache 5.6.3
docker-pycreds 0.4.0
docstring_parser 0.16
eas-prediction 0.24
editdistance 0.8.1
einops 0.8.0
evaluate 0.4.1
exceptiongroup 1.2.0
executing 2.0.1
fastapi 0.110.2
fastjsonschema 2.19.1
ffmpy 0.3.2
filelock 3.13.1
flash-attn 2.5.8
fonttools 4.47.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2023.12.2
gast 0.5.4
gekko 1.0.6
gitdb 4.0.11
GitPython 3.1.43
google-auth 2.26.1
google-auth-oauthlib 1.2.0
gradio 4.28.3
gradio_client 0.16.0
grpcio 1.60.0
h11 0.14.0
hjson 3.1.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.0
humanfriendly 10.0
idna 3.4
importlib_metadata 7.1.0
importlib_resources 6.4.0
iniconfig 2.0.0
interegular 0.3.3
ipykernel 6.28.0
ipython 8.20.0
ipywidgets 8.1.1
isoduration 20.11.0
jedi 0.19.1
jieba 0.42.1
Jinja2 3.1.2
jmespath 0.10.0
joblib 1.4.0
json5 0.9.14
jsonlines 4.0.0
jsonpointer 2.4
jsonschema 4.20.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyter-events 0.9.0
jupyter-lsp 2.2.1
jupyter_server 2.12.2
jupyter_server_terminals 0.5.1
jupyterlab 4.0.10
jupyterlab-language-pack-zh-CN 4.0.post6
jupyterlab_pygments 0.3.0
jupyterlab_server 2.25.2
jupyterlab-widgets 3.0.9
kiwisolver 1.4.5
lark 1.1.9
llmuses 0.3.0
llvmlite 0.42.0
lm-format-enforcer 0.9.8
lxml 5.2.1
Markdown 3.5.1
markdown-it-py 3.0.0
MarkupSafe 2.1.3
marshmallow 3.21.2
marshmallow-oneofschema 3.1.1
matplotlib 3.8.2
matplotlib-inline 0.1.6
mdurl 0.1.2
mistune 3.0.2
modelscope 1.14.0
mpmath 1.3.0
ms-swift 2.0.4
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
nbclient 0.9.0
nbconvert 7.14.0
nbformat 5.9.2
nest-asyncio 1.5.8
networkx 3.2.1
ninja 1.11.1.1
nltk 3.8.1
notebook_shim 0.2.3
numba 0.59.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.550.52
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.2
openai 0.28.0
optimum 1.19.1
orjson 3.10.1
oss2 2.18.5
outlines 0.0.34
overrides 7.4.0
packaging 23.2
pandas 2.2.2
pandocfilters 1.5.0
parso 0.8.3
peft 0.10.0
pexpect 4.9.0
pillow 10.2.0
pip 22.3.1
platformdirs 4.1.0
plotly 5.22.0
pluggy 1.5.0
ply 3.11
portalocker 2.8.2
prometheus-client 0.19.0
prometheus-fastapi-instrumentator 7.0.0
prompt-toolkit 3.0.43
protobuf 3.20.3
psutil 5.9.7
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 16.0.0
pyarrow-hotfix 0.6
pyasn1 0.5.1
pyasn1-modules 0.3.0
pycosat 0.6.4
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.7.1
pydantic_core 2.18.2
pydeck 0.9.0
pydub 0.25.1
Pygments 2.17.2
Pympler 1.0.1
pynvml 11.5.0
pyodps 0.11.6.1
pyOpenSSL 24.1.0
pyparsing 3.1.1
PySocks 1.7.1
pytest 8.2.0
python-dateutil 2.8.2
python-dotenv 1.0.1
python-json-logger 2.0.7
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
pyzmq 25.1.2
ray 2.20.0
referencing 0.32.1
regex 2024.4.28
requests 2.31.0
requests-oauthlib 1.3.1
requests-toolbelt 1.0.0
responses 0.18.0
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rouge 1.0.1
rouge-chinese 1.0.3
rouge-score 0.1.2
rpds-py 0.16.2
rsa 4.9
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
ruff 0.4.2
sacrebleu 2.4.2
safetensors 0.4.3
scikit-learn 1.4.2
scipy 1.13.0
seaborn 0.13.2
semantic-version 2.10.0
Send2Trash 1.8.2
sentencepiece 0.2.0
sentry-sdk 2.0.1
setproctitle 1.3.3
setuptools 65.5.0
shellingham 1.5.4
shtab 1.7.1
simple-ddl-parser 1.1.0
simplejson 3.19.2
six 1.16.0
smmap 5.0.1
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.5
stack-data 0.6.3
starlette 0.37.2
streamlit 1.34.0
supervisor 4.2.5
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
tensorboard 2.15.1
tensorboard-data-server 0.7.2
terminado 0.18.0
threadpoolctl 3.5.0
tiktoken 0.6.0
tinycss2 1.2.1
tokenizers 0.19.1
toml 0.10.2
tomli 2.0.1
tomlkit 0.12.0
toolz 0.12.0
torch 2.3.0
torchaudio 2.1.2
torchdata 0.7.1
torchvision 0.16.2+cu121
tornado 6.4
tqdm 4.64.1
traitlets 5.14.1
transformers 4.40.1
transformers-stream-generator 0.0.5
triton 2.3.0
trl 0.8.6
typer 0.12.3
types-python-dateutil 2.8.19.20240106
typing_extensions 4.9.0
tyro 0.8.3
tzdata 2024.1
uri-template 1.3.0
urllib3 2.2.1
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.4.2
vllm-nccl-cu12 2.18.1.0.4.0
wandb 0.16.6
watchdog 4.0.0
watchfiles 0.21.0
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
websockets 11.0.3
Werkzeug 3.0.1
wheel 0.37.1
widgetsnbextension 4.0.9
wrapt 1.16.0
xformers 0.0.26.post1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.18.1