NPU Qwen2 model inference error
Error description
Running the swift infer command fails when do_sample=True; with do_sample=False inference runs, but the generated output is garbled.
Environment
- NPU: Ascend 910B3
- Python:3.9.18
- ms-swift:2.4.0.post1
- torch-npu:2.1.0
- Transformers:4.37.2
Inference model
Qwen2-7B-Instruct
Error message
EZ9999: Inner Error!
EZ9999 Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1677]
TraceBack (most recent call last):
  AICPU Kernel task happen error, retCode=0x2a.[FUNC:GetError][FILE:stream.cc][LINE:1454]
  Aicpu kernel execute failed, device_id=0, stream_id=28, task_id=1726, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1522]
  Aicpu kernel execute failed, device_id=0, stream_id=28, task_id=1726, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1454]
  rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
  synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/mnt/dsep/python/venv/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/mnt/dsep/python/venv/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/swift/llm/utils/utils.py", line 694, in _model_generate
    return model.generate(*args, **kwargs)
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 1525, in generate
    return self.sample(
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2669, in sample
    streamer.put(next_tokens.cpu())
RuntimeError: ACL stream synchronize failed, error code:507018
Tested further: in the Qwen2-Instruct series, only the 0.5B model can run inference normally; all the other models fail with the same error as the 7B model.
Hi, can your swift infer and swift deploy use multiple cards? I couldn't find a parameter for setting multiple NPU cards.
I tested this as well: 0.5B and 3B work for me, but 7B fails with E39999.
Please try the plain transformers inference code first. If that doesn't work either, ms-swift won't be able to run inference here.
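To make the suggestion concrete, a minimal transformers-only script for isolating the problem from ms-swift could look like the sketch below. The model path, the npu:0 device string, and the prompt are assumptions; it requires torch_npu and cannot run without an Ascend device and the model weights.

```python
# Minimal transformers-only generation check on Ascend NPU (sketch, not from the thread).
# Assumes torch_npu is installed; importing it registers the "npu" device with PyTorch.
import torch
import torch_npu  # noqa: F401
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen2-7B-Instruct"  # replace with your local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16
).to("npu:0")

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("npu:0")

# Try both settings: if the bug is below ms-swift, do_sample=True should
# reproduce the EZ9999 crash and do_sample=False the garbled output.
out = model.generate(input_ids, max_new_tokens=64, do_sample=True)
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If this script fails the same way, the issue lives in torch-npu/CANN or the transformers generation path rather than in ms-swift itself.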