
Ascend 910B3: OCR model inference fails

Open wangwenqi567 opened this issue 2 months ago • 4 comments

Checklist:

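Problem description

On an Ascend 910B3, the small-model service starts successfully, but actual inference fails with ACL error 500001. What could be causing this?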

Reproduction

  1. Serving deployment

Deployed following the documentation at https://github.com/PaddlePaddle/PaddleX/blob/release/3.3/docs/other_devices_support/paddlepaddle_install_NPU.md. The verification step python -c "import paddle; paddle.utils.run_check()" fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/paddle/utils/install_check.py", line 274, in run_check
    _run_dygraph_single(use_cuda, use_xpu, use_custom, custom_device_name)
  File "/usr/local/lib/python3.10/dist-packages/paddle/utils/install_check.py", line 120, in _run_dygraph_single
    opt.step()
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 380, in __impl__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/wrapped_decorator.py", line 40, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/framework.py", line 736, in __impl__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/adam.py", line 526, in step
    optimize_ops = self._apply_optimize(
  File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/optimizer.py", line 1701, in _apply_optimize
    optimize_ops = self._create_optimization_pass(
  File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/optimizer.py", line 1368, in _create_optimization_pass
    self._append_optimize_op(
  File "/usr/local/lib/python3.10/dist-packages/paddle/optimizer/adam.py", line 387, in _append_optimize_op
    _, _, _, _, _, _, _ = _C_ops.adam_(
OSError: (External)  ACL error, the error code is : 500001.  (at /paddle/backends/npu/kernels/funcs/npu_op_runner.cc:445)

Environment

  1. PaddlePaddle, PaddleX, and Python versions:
     paddlepaddle 3.3.0 dev: paddlepaddle-3.3.0.dev20251105-cp313-cp313-linux_aarch64.whl
     paddle_custom_npu: both paddle_custom_npu-3.0.0.dev20251105-cp310-cp310-linux_aarch64.whl and paddle_custom_npu-3.0.0.dev20251023-cp310-cp310-linux_aarch64.whl produce the error
     paddlex 3.3.5
     python 3.10

  2. Operating system (Linux/Windows/macOS): Linux, 910B3, npu-smi 24.1.0.3

wangwenqi567, Nov 10 '25 06:11
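The environment above lists a paddlepaddle wheel tagged cp313 next to Python 3.10 and cp310 plugin wheels. That may simply be a typo in the report, but it is cheap to rule out. Below is a minimal sketch, not part of PaddleX or Paddle, that reads the Python tag out of a wheel filename and compares it with an interpreter version. The helper names `wheel_python_tag` and `matches_interpreter` are hypothetical, and the check only handles plain CPython tags such as cp310 (not compressed tags like py2.py3).

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the Python tag (for example 'cp310') from a wheel filename.

    Wheel names follow name-version[-build]-pytag-abitag-platform.whl,
    so the Python tag is the third field counted from the end.
    """
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) < 5:
        raise ValueError(f"not a wheel filename: {filename!r}")
    return parts[-3]


def matches_interpreter(filename: str, version=None) -> bool:
    """True if the wheel's tag equals cp<major><minor> for the given version.

    `version` defaults to the running interpreter; pass a (major, minor)
    tuple to check against a different one.
    """
    major, minor = version if version is not None else sys.version_info[:2]
    return wheel_python_tag(filename) == f"cp{major}{minor}"


if __name__ == "__main__":
    wheels = [
        "paddlepaddle-3.3.0.dev20251105-cp313-cp313-linux_aarch64.whl",
        "paddle_custom_npu-3.0.0.dev20251105-cp310-cp310-linux_aarch64.whl",
    ]
    for name in wheels:
        print(name, wheel_python_tag(name), matches_interpreter(name, (3, 10)))
```

A mismatch reported here would only show that the filename in the report does not match Python 3.10; it says nothing by itself about the cause of the ACL error.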

You could try installing this one: https://paddle-whl.bj.bcebos.com/nightly/npu/paddle-custom-npu/paddle_custom_npu-3.0.0.dev20250519-cp310-cp310-linux_aarch64.whl

Bobholamovic, Nov 10 '25 12:11

During model inference, many .om files are generated:

CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9809848188213189544.om'
-rw------- 1 root root 8.6M Nov 20 20:24 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9825005608640160976.om'
-rw------- 1 root root 8.6M Nov 20 20:45 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9856645976345894519.om'
-rw------- 1 root root 8.6M Nov 20 21:44 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9858993478867473151.om'
-rw------- 1 root root 8.6M Nov 20 21:14 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9963053396116706483.om'
-rw------- 1 root root 8.6M Nov 20 21:58 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9971686263702084885.om'
-rw------- 1 root root 8.6M Nov 20 20:38 'CANNExecutionProvider_Model from PaddlePaddle._18083985567429771914_0_0_9977764612297234208.om'
-rw------- 1 root root 2.1M Nov 20 20:39 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_10048723242475067332.om
-rw------- 1 root root 2.1M Nov 20 20:22 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_11040221541448948256.om
-rw------- 1 root root 2.1M Nov 20 20:30 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_12024396286556765265.om
-rw------- 1 root root 2.1M Nov 20 20:25 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_15055864649358242939.om
-rw------- 1 root root 2.1M Nov 20 20:32 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_16369408534086580801.om
-rw------- 1 root root 2.1M Nov 20 20:22 CANNExecutionProvider_paddle-onnx_4942186001418872651_0_0_6994647576472354924.om

Device memory keeps growing. Has anyone run into this?

wangwenqi567, Nov 20 '25 14:11
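In the listing above, only two distinct name prefixes appear, each repeated with a different trailing number, which suggests the same two models are being compiled again and again. To track this over time, a small script can group the .om files by prefix and report count and total size per group. The following is a rough sketch using only the Python standard library. The filename pattern is inferred from the listing and is an assumption, not a documented CANN or onnxruntime naming scheme, and `model_key` and `summarize_om_files` are hypothetical helper names.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern, inferred from the file names in the listing:
# CANNExecutionProvider_<model name and ids>_<trailing number>.om
_OM_NAME = re.compile(r"^(CANNExecutionProvider_.+)_\d+\.om$")


def model_key(filename: str):
    """Return the name with the trailing number and .om removed, or None."""
    match = _OM_NAME.match(filename)
    return match.group(1) if match else None


def summarize_om_files(directory) -> dict:
    """Map each model key to (file count, total size in bytes)."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(directory).glob("*.om"):
        key = model_key(path.name)
        if key is None:
            continue  # skip .om files that do not follow the assumed pattern
        totals[key][0] += 1
        totals[key][1] += path.stat().st_size
    return {key: (count, size) for key, (count, size) in totals.items()}


if __name__ == "__main__":
    # Scans the current directory; point it at wherever the .om files land.
    for key, (count, size) in sorted(summarize_om_files(".").items()):
        print(f"{count:4d} files  {size / 2**20:8.1f} MiB  {key}")
```

Running this periodically while the service handles requests would show whether the file count per model keeps rising together with device memory, which is the behavior described in this thread.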

The growth in device memory may be caused by a caching mechanism. Does it stop growing once it reaches a certain level?

Bobholamovic, Nov 21 '25 02:11

No, it keeps growing until device memory is full. onnxruntime-cann may be the cause; still investigating.

wangwenqi567, Nov 21 '25 06:11