OCR model inference fails on Ascend 910B3
Checklist:
- [ ] Searched existing issues for an answer
- [ ] Read the FAQ
- [ ] Read the PaddleX documentation
- [ ] Confirmed that the bug is still unfixed in the latest version
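Describe the problem
On an Ascend 910B3, the small-model service starts successfully, but actual inference fails with ACL error 500001. What could be the cause?
Reproduction
- Serving deployment
Deployed following the documentation at https://github.com/PaddlePaddle/PaddleX/blob/release/3.3/docs/other_devices_support/paddlepaddle_install_NPU.md . The verification step python -c "import paddle; paddle.utils.run_check()" fails with: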
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/paddle/utils/install_check.py", line 274, in run_check
_run_dygraph_single(use_cuda, use_xpu, use_custom, custom_device_name)
File "/usr/local/lib/python3.10/dist-packages/paddle/utils/install_check.py", line 120, in _run_dygraph_single
opt.step()
File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 380, in __impl__
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/wrapped_decorator.py", line 40, in __impl__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/base/framework.py", line 736, in __impl__
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/adam.py", line 526, in step
optimize_ops = self._apply_optimize(
File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/optimizer.py", line 1701, in _apply_optimize
optimize_ops = self._create_optimization_pass(
File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/optimizer.py", line 1368, in _create_optimization_pass
self._append_optimize_op(
File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/adam.py", line 387, in _append_optimize_op
_, _, _, _, _, _, _ = _C_ops.adam_(
OSError: (External) ACL error, the error code is : 500001. (at /paddle/backends/npu/kernels/funcs/npu_op_runner.cc:445)
Environment
- Please provide the PaddlePaddle, PaddleX, and Python versions you are using: paddlepaddle 3.3.0 dev (paddlepaddle-3.3.0.dev20251105-cp313-cp313-linux_aarch64.whl); the error occurs with either paddle_custom_npu-3.0.0.dev20251105-cp310-cp310-linux_aarch64.whl or paddle_custom_npu-3.0.0.dev20251023-cp310-cp310-linux_aarch64.whl; paddlex 3.3.5; python 3.10
- Please provide your operating system information (Linux/Windows/MacOS): linux, 910B3, npu-smi 24.1.0.3
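As written, the report lists a cp313 wheel for paddlepaddle next to cp310 wheels for paddle_custom_npu and Python 3.10. This may simply be a typo in the report, and nothing here shows it is related to the ACL error. A minimal sketch for checking a wheel filename's Python tag against the running interpreter follows; the helper names `wheel_python_tag` and `interpreter_tag` are hypothetical, and the parsing assumes the standard wheel filename layout.

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag (e.g. 'cp310') from a wheel filename.

    Assumes the standard layout:
    name-version[-build]-pythontag-abitag-platform.whl
    """
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) < 5:
        raise ValueError(f"not a wheel filename: {filename!r}")
    # The last three fields are python tag, ABI tag, platform tag.
    return parts[-3]


def interpreter_tag() -> str:
    """Return the CPython tag of the interpreter running this script."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


# Filenames copied from the environment section of this issue.
wheels = [
    "paddlepaddle-3.3.0.dev20251105-cp313-cp313-linux_aarch64.whl",
    "paddle_custom_npu-3.0.0.dev20251105-cp310-cp310-linux_aarch64.whl",
]

for wheel in wheels:
    tag = wheel_python_tag(wheel)
    verdict = "matches" if tag == interpreter_tag() else "does NOT match"
    print(f"{wheel}: {tag} {verdict} interpreter {interpreter_tag()}")
```

Run it with the same interpreter that serves the model, so that the comparison reflects the environment that actually fails.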
You could try installing this one: https://paddle-whl.bj.bcebos.com/nightly/npu/paddle-custom-npu/paddle_custom_npu-3.0.0.dev20250519-cp310-cp310-linux_aarch64.whl
A question: during model inference, many .om files are generated:
CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9809848188213189544.om'
-rw------- 1 root root 8.6M Nov 20 20:24 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9825005608640160976.om'
-rw------- 1 root root 8.6M Nov 20 20:45 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9856645976345894519.om'
-rw------- 1 root root 8.6M Nov 20 21:44 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9858993478867473151.om'
-rw------- 1 root root 8.6M Nov 20 21:14 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9963053396116706483.om'
-rw------- 1 root root 8.6M Nov 20 21:58 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9971686263702084885.om'
-rw------- 1 root root 8.6M Nov 20 20:38 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9977764612297234208.om'
-rw------- 1 root root 2.1M Nov 20 20:39 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_10048723242475067332.om
-rw------- 1 root root 2.1M Nov 20 20:22 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_11040221541448948256.om
-rw------- 1 root root 2.1M Nov 20 20:30 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_12024396286556765265.om
-rw------- 1 root root 2.1M Nov 20 20:25 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_15055864649358242939.om
-rw------- 1 root root 2.1M Nov 20 20:32 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_16369408534086580801.om
-rw------- 1 root root 2.1M Nov 20 20:22 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_6994647576472354924.om
Device memory also keeps growing. Has anyone run into this?
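The memory growth may be caused by a caching mechanism. Does it stop growing once it reaches a certain level?
No, it keeps growing until device memory is exhausted. onnxruntime-cann may be the cause; I am still investigating.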
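While investigating, it may help to track how many .om files accumulate over time. In the listing above the files share a prefix and differ only in the trailing number, so a minimal sketch can group them by that prefix and report count and total size. The function name `summarize_om_files` is hypothetical, and this only measures files on disk, not device memory.

```python
from collections import defaultdict
from pathlib import Path


def summarize_om_files(directory: str) -> dict:
    """Group *.om files in `directory` by the name before the last '_<number>'.

    Returns {prefix: {"count": int, "bytes": int}}.
    """
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for path in Path(directory).glob("*.om"):
        # Drop the trailing '_<number>' so files that differ only in
        # that last field fall into the same group.
        prefix = path.stem.rsplit("_", 1)[0]
        summary[prefix]["count"] += 1
        summary[prefix]["bytes"] += path.stat().st_size
    return dict(summary)
```

Calling `summarize_om_files(".")` periodically from the directory where the files appear, and comparing the counts between inference requests, would show whether a new file is written per request or only occasionally.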