
Error when running the table_recognition pipeline on Ascend 910.

zryf2000 opened this issue 6 months ago · 2 comments

I am using the official Docker image `ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84` with PaddleX 3.0-RC1. `python -c "import paddle; print(paddle.__version__)"` runs without problems. By narrowing things down I found that the error occurs when the SLANet-plus model is invoked.

```
root@ubuntu:/work# paddlex --pipeline layout_parsing \
    --input 0011.jpeg \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --save_path ./output \
    --device npu:0
```

```
Creating model: ('PP-LCNet_x1_0_doc_ori', None)
Using official model (PP-LCNet_x1_0_doc_ori), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
I0513 19:01:11.371599 51108 init.cc:237] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0513 19:01:11.371654 51108 init.cc:146] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0513 19:01:12.236369 51108 custom_device_load.cc:52] Succeed in loading custom runtime in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I0513 19:01:12.236433 51108 custom_device_load.cc:59] Skipped lib [/usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-npu.so]: no custom engine Plugin symbol in this lib.
I0513 19:01:12.240267 51108 custom_kernel.cc:63] Succeed in loading 359 custom kernel(s) from loaded lib(s), will be used like native ones.
I0513 19:01:12.240479 51108 init.cc:158] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0513 19:01:12.240526 51108 init.cc:243] CustomDevice: npu, visible devices count: 2
Creating model: ('UVDoc', None)
Using official model (UVDoc), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('RT-DETR-H_layout_17cls', None)
Using official model (RT-DETR-H_layout_17cls), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_det', None)
Using official model (PP-OCRv4_server_det), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_rec', None)
Using official model (PP-OCRv4_server_rec), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_seal_det', None)
Using official model (PP-OCRv4_server_seal_det), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_rec', None)
Using official model (PP-OCRv4_server_rec), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('SLANet_plus', None)
Using official model (SLANet_plus), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Traceback (most recent call last):
  File "/usr/local/bin/paddlex", line 8, in <module>
    sys.exit(console_entry())
  File "/usr/local/lib/python3.10/dist-packages/paddlex/__main__.py", line 26, in console_entry
    main()
  File "/usr/local/lib/python3.10/dist-packages/paddlex/paddlex_cli.py", line 467, in main
    return pipeline_predict(
  File "/usr/local/lib/python3.10/dist-packages/paddlex/paddlex_cli.py", line 331, in pipeline_predict
    for res in result:
  File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/layout_parsing/pipeline.py", line 459, in predict
    for img_id, batch_data in enumerate(self.batch_sampler(input)):
  File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/common/batch_sampler/base_batch_sampler.py", line 80, in __call__
    yield from self.sample(input)
  File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/common/batch_sampler/image_batch_sampler.py", line 101, in sample
    file_list = self._get_files_list(file_path)
  File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/common/batch_sampler/image_batch_sampler.py", line 59, in _get_files_list
    raise Exception(f"Not found any img file in path: {fp}")
Exception: Not found any img file in path: 0011.jpeg
root@ubuntu:/work# paddlex --pipeline layout_parsing \
    --input 0011.jpg \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --use_textline_orientation False \
    --save_path ./output \
    --device npu:0
```
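For completeness: the first traceback above is unrelated to the NPU. `0011.jpeg` simply did not exist on disk (the actual file was `0011.jpg`), so PaddleX's image batch sampler found nothing and raised. A rough sketch of that kind of extension-based lookup (`IMG_SUFFIXES` and `find_images` are illustrative names, not PaddleX's actual API):

```python
from pathlib import Path

# Hypothetical re-creation of the suffix check that made the first run fail.
# PaddleX's image batch sampler raises "Not found any img file in path: ..."
# when the given path matches no image file.
IMG_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def find_images(path_str: str) -> list[Path]:
    p = Path(path_str)
    if p.is_file():
        # a single file: accept it only if it carries a known image suffix
        return [p] if p.suffix.lower() in IMG_SUFFIXES else []
    if p.is_dir():
        # a directory: collect everything that looks like an image
        return sorted(q for q in p.iterdir() if q.suffix.lower() in IMG_SUFFIXES)
    # neither a file nor a directory -> nothing to sample from
    raise FileNotFoundError(f"Not found any img file in path: {path_str}")
```

Passing `0011.jpeg` while only `0011.jpg` exists hits the final branch, which matches the exception in the log.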

```
Creating model: ('PP-LCNet_x1_0_doc_ori', None)
Using official model (PP-LCNet_x1_0_doc_ori), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
I0513 19:01:45.694962 57513 init.cc:237] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0513 19:01:45.695019 57513 init.cc:146] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0513 19:01:46.575884 57513 custom_device_load.cc:52] Succeed in loading custom runtime in lib: /usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I0513 19:01:46.575968 57513 custom_device_load.cc:59] Skipped lib [/usr/local/lib/python3.10/dist-packages/paddle_custom_device/libpaddle-custom-npu.so]: no custom engine Plugin symbol in this lib.
I0513 19:01:46.589579 57513 custom_kernel.cc:63] Succeed in loading 359 custom kernel(s) from loaded lib(s), will be used like native ones.
I0513 19:01:46.589810 57513 init.cc:158] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
I0513 19:01:46.589886 57513 init.cc:243] CustomDevice: npu, visible devices count: 2
Creating model: ('UVDoc', None)
Using official model (UVDoc), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('RT-DETR-H_layout_17cls', None)
Using official model (RT-DETR-H_layout_17cls), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_det', None)
Using official model (PP-OCRv4_server_det), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_rec', None)
Using official model (PP-OCRv4_server_rec), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_seal_det', None)
Using official model (PP-OCRv4_server_seal_det), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('PP-OCRv4_server_rec', None)
Using official model (PP-OCRv4_server_rec), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
Creating model: ('SLANet_plus', None)
Using official model (SLANet_plus), the model files will be automatically downloaded and saved in /root/.paddlex/official_models.
.Call aclrtSynchronizeStream(reinterpret_cast<aclrtStream>(stream)) failed : 507053 at file /paddle/backends/npu/runtime/runtime.cc line 726
EZ1001: [PID: 57513] 2025-05-13-19:02:04.234.060 Shape of aclnnSliceV2 out should be [0,1,300,4], but current is [1,1,300,4].
TraceBack (most recent call last):
  executor is nullptr.
  Shape of aclnnSliceV2 out should be [0,1,300,17], but current is [1,1,300,17].
  The error from device(chipId:6, dieId:0), serial number is 10, there is an aivec error exception, core id is 26, error code = 0x800000, dump info: pc start: 0x124c00000000, current: 0x124c00000bd8, vec error info: 0x13130c7994, mte error info: 0x9030000ba, ifu error info: 0x7036f3e0c7cc0, ccu error info: 0xecc8bec80d21453e, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100531c00.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1417]
  The extend info: errcode:(0x800000, 0, 0) errorStr: The DDR address of the MTE instruction is out of range.
  fixp_error0 info: 0x30000ba, fixp_error1 info: 0x9 fsmId:0, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_proc.cc][LINE:1429]
  Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1356]
  AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1124]
  Aicore kernel execute failed, device_id=6, stream_id=42, report_stream_id=42, task_id=38087, flip_num=0, fault kernel_name=StridedSliceV3_a3763a988b9143d31d776fc7ba893cf4_high_performance__kernel0, fault kernel info ext=StridedSliceV3_a3763a988b9143d31d776fc7ba893cf4_high_performance__kernel0, program id=0, hash=18112778502496636264.[FUNC:GetError][FILE:stream.cc][LINE:1124]
  [AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1124]
  rtStreamSynchronize execute failed, reason=[device mem error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
  synchronize stream failed, runtime result = 507053[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
```
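The first EZ1001 line looks like the key symptom: `aclnnSliceV2` expected a zero-size output (`[0,1,300,4]`) but received `[1,1,300,4]`. A zero-length leading axis is exactly what a slice with `start == stop` produces, so my guess (not verified against the SLANet_plus / layout model graph) is that an empty slice selection is being mishandled by the NPU StridedSliceV3 kernel. For reference, the two shapes from the log can be reproduced with plain numpy slicing:

```python
import numpy as np

# Shapes taken from the error message: [., 1, 300, 4] (boxes) and the
# analogous [., 1, 300, 17] (17-class scores).
x = np.zeros((8, 1, 300, 4), dtype=np.float32)

empty = x[3:3]   # start == stop -> zero-size slice
print(empty.shape)   # (0, 1, 300, 4): what aclnnSliceV2 says it expected

one = x[3:4]     # a one-element slice
print(one.shape)     # (1, 1, 300, 4): what the kernel actually produced
```

If that reading is right, the crash would only trigger on inputs where the slice selection is empty, which would explain why other pipelines on the same image run fine.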


C++ Traceback (most recent call last):

```
0  paddle::AnalysisPredictor::ZeroCopyRun(bool)
1  paddle::framework::NaiveExecutor::RunInterpreterCore(std::vector<std::string, std::allocator<std::string> > const&, bool, bool)
2  paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string> > const&, bool, bool, bool, bool)
3  paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string> > const&, bool, bool, bool, bool)
4  paddle::framework::PirInterpreter::TraceRunImpl()
5  paddle::framework::PirInterpreter::TraceRunInstructionList(std::vector<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> >, std::allocator<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> > > > const&)
6  paddle::framework::PirInterpreter::RunInstructionBase(paddle::framework::InstructionBase*)
7  paddle::framework::PhiKernelInstruction::Run()
8  paddle::dialect::SliceOp::InferMeta(phi::InferMetaContext*)
9  paddle::experimental::IntArrayBase<phi::DenseTensor>::IntArrayBase(phi::DenseTensor const&)
10 void phi::Copy<phi::DeviceContext>(phi::DeviceContext const&, phi::DenseTensor const&, phi::Place, bool, phi::DenseTensor*)
11 phi::MemoryUtils::Copy(phi::Place const&, void*, phi::Place const&, void const*, unsigned long, void*)
12 void paddle::memory::Copy<phi::Place, phi::Place>(phi::Place, void*, phi::Place, void const*, unsigned long, void*)
13 void paddle::memory::Copy<phi::CPUPlace, phi::CustomPlace>(phi::CPUPlace, void*, phi::CustomPlace, void const*, unsigned long, void*)
14 phi::CustomDevice::MemoryCopyD2H(unsigned long, void*, void const*, unsigned long, phi::stream::Stream const*)
15 phi::CustomDevice::SynchronizeStream(unsigned long, phi::stream::Stream const*)
16 SyncStream(C_Device_st*, C_Stream_st*)
17 phi::DeviceManager::~DeviceManager()
18 std::_Hashtable<std::string, std::pair<std::string const, std::vector<std::unique_ptr<phi::Device, std::default_delete<phi::Device> >, std::allocator<std::unique_ptr<phi::Device, std::default_delete<phi::Device> > > > >, std::allocator<std::pair<std::string const, std::vector<std::unique_ptr<phi::Device, std::default_delete<phi::Device> >, std::allocator<std::unique_ptr<phi::Device, std::default_delete<phi::Device> > > > > >, std::__detail::_Select1st, std::equal_to<std::string >, std::hash<std::string >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::clear()
19 phi::CustomDevice::DeInitDevice(unsigned long)
20 ReleaseDevice(C_Device_st*)
21 std::__detail::_Map_base<int, std::pair<int const, std::__cxx11::list<void*, std::allocator<void*> > >, std::allocator<std::pair<int const, std::__cxx11::list<void*, std::allocator<void*> > > >, std::__detail::_Select1st, std::equal_to, std::hash, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::operator[](int const&)
```


Error Message Summary:

```
FatalError: Segmentation fault is detected by the operating system.
  [TimeInfo: *** Aborted at 1747134133 (unix time) try "date -d @1747134133" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0xe0a9) received by PID 57513 (TID 0xffff879e5950) from PID 57513 ***]
```

^C[ERROR] TBE Subprocess[task_distribute] raise error[], main process disappeared! (the same line repeated 8 times)

Checklist:

Describe the problem

Reproduction

  1. Have you successfully run the tutorials we provide?

  2. Did you modify any code on top of the tutorials? If so, please provide the code you ran.

  3. Which dataset are you using?

  4. Please provide the error message and relevant logs.

Environment

  1. Please provide the versions of PaddlePaddle and PaddleX you are using.

  2. Please provide your operating system (Linux/Windows/macOS).

  3. Which Python version are you using?

  4. Which CUDA/cuDNN versions are you using?

zryf2000 · May 13 '25 11:05