
[CPU environment, PaddleX command-line quick inference] Segmentation fault error

Open Cipheeer opened this issue 4 months ago • 20 comments

Describe the Bug

Environment:
Atlas 200I DK A2
Ubuntu 22.04.4 LTS aarch64
paddlepaddle_cpu=3.0.0
paddlex=3.1.3
MemTotal: 3598100 kB

Command run (the input image is 288*170, 7 KB):

paddlex --pipeline semantic_segmentation --input /home/HwHiAiUser/Doctor3/2.jpg --device cpu --save_path /home/HwHiAiUser/Doctor3/output/
Creating model: ('PP-LiteSeg-T', None)
Using official model (PP-LiteSeg-T), the model files will be automatically downloaded and saved in /home/HwHiAiUser/.paddlex/official_models.
/usr/local/miniconda3/envs/pyqt/lib/python3.8/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)

The error is as follows:

C++ Traceback (most recent call last):

No stack trace in paddle, may be caused by external reasons.


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system. [TimeInfo: *** Aborted at 1754792800 (unix time) try "date -d @1754792800" if you are using GNU date ***] [SignalInfo: *** SIGSEGV (@0x0) received by PID 81734 (TID 0xe7ff8e0fe120) from PID 0 ***]

Segmentation fault (core dumped)
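
The summary line above packs the abort time, the signal, the faulting address, and the PID into one string. As a rough sketch (the regular expressions are my own guess at the format, based only on the line quoted in this report), it can be decoded like this:

```python
import re
from datetime import datetime, timezone

# The summary line exactly as quoted in this report.
SUMMARY = (
    'FatalError: Segmentation fault is detected by the operating system. '
    '[TimeInfo: *** Aborted at 1754792800 (unix time) try "date -d @1754792800" '
    'if you are using GNU date ***] '
    '[SignalInfo: *** SIGSEGV (@0x0) received by PID 81734 '
    '(TID 0xe7ff8e0fe120) from PID 0 ***]'
)


def parse_summary(text):
    """Extract the abort time (UTC), signal name, fault address, and PID."""
    time_match = re.search(r"Aborted at (\d+) \(unix time\)", text)
    sig_match = re.search(
        r"\*\*\* (SIG[A-Z]+) \(@(0x[0-9a-fA-F]+)\) received by PID (\d+)", text
    )
    if not time_match or not sig_match:
        raise ValueError("text does not look like a fatal-error summary")
    return {
        "aborted_at": datetime.fromtimestamp(int(time_match.group(1)), tz=timezone.utc),
        "signal": sig_match.group(1),
        "fault_address": int(sig_match.group(2), 16),
        "pid": int(sig_match.group(3)),
    }


info = parse_summary(SUMMARY)
print(info["aborted_at"].isoformat())  # 2025-08-10T02:26:40+00:00
print(info["signal"], hex(info["fault_address"]), info["pid"])
```

A fault address of 0x0 usually points to a null-pointer access, though that alone does not say which component is at fault.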

其他补充信息 Additional Supplementary Information

No response

Cipheeer avatar Aug 10 '25 02:08 Cipheeer

I suggest running paddle.utils.run_check() to confirm that the Paddle framework itself is working properly.

Bobholamovic avatar Aug 11 '25 02:08 Bobholamovic
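
Because a segmentation fault kills the interpreter, it can help to run the check in a child process so the parent can still report what happened. The sketch below is only an illustration: `run_isolated` is a hypothetical helper name, and it assumes a POSIX system, where a negative return code means the child was terminated by a signal.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=300):
    """Run a Python snippet in a child process and describe how it exited."""
    proc = subprocess.run(
        # -X faulthandler asks CPython to dump a Python traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # POSIX only: the child was killed by signal number -returncode.
        status = "killed by " + signal.Signals(-proc.returncode).name
    elif proc.returncode == 0:
        status = "ok"
    else:
        status = "exit code %d" % proc.returncode
    return status, proc.stdout, proc.stderr


if __name__ == "__main__":
    status, out, err = run_isolated("import paddle; paddle.utils.run_check()")
    print(status)
    print(err[-2000:])  # tail of stderr, where the traceback would appear
```

The same helper can wrap any other snippet that crashes, which makes it easier to compare models or pipelines without losing the session each time.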

Could you take a look at this? Image

Cipheeer avatar Oct 09 '25 11:10 Cipheeer

This is most likely related to the CPU model of the development board. Do you get a similar result if you switch to a different model?

Bobholamovic avatar Oct 09 '25 11:10 Bobholamovic

The object detection model works. I'll try my classification model in a moment.

Image

Cipheeer avatar Oct 09 '25 11:10 Cipheeer

Can the classification quick-start command be run with a different model?

Cipheeer avatar Oct 09 '25 11:10 Cipheeer

You can get the configuration file with paddlex --get-pipeline-config image_classification, change model_name in that file, and then pass the file path via --pipeline (for example, paddlex --pipeline image_classification.yaml).

Bobholamovic avatar Oct 09 '25 12:10 Bobholamovic
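
The "change model_name" step can also be scripted. This is a minimal sketch that edits the text directly instead of parsing YAML; the `SAMPLE` layout and the replacement model name `PP-LCNet_x1_0` are assumptions for illustration, so check them against the file that paddlex actually generates.

```python
import re

# Assumed, simplified layout of the exported config (illustrative only).
SAMPLE = """\
pipeline_name: image_classification

SubModules:
  ImageClassification:
    module_name: image_classification
    model_name: PP-LCNet_x0_5
    model_dir: null
"""


def set_model_name(config_text, new_name):
    """Return config_text with the first `model_name:` value replaced."""
    pattern = re.compile(r"^([ \t]*model_name:[ \t]*).*$", re.MULTILINE)
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_name, config_text, count=1
    )
    if count == 0:
        raise ValueError("no model_name entry found in the config")
    return updated


updated = set_model_name(SAMPLE, "PP-LCNet_x1_0")
print(updated)
# For a real file, read it with pathlib.Path("image_classification.yaml").read_text(),
# write the result back, and pass the path to paddlex via --pipeline.
```

Only the first model_name entry is touched, so a config with several sub-modules would need the function applied more selectively.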

I can reproduce this in a Debian 12 VM on an M4 Pro MacBook. Kernel: 6.1.0-39

(venv) root@llm-test:~# python
Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
/root/venv/lib/python3.11/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
I1013 03:19:34.119712 46903 pir_interpreter.cc:1541] New Executor is Running ...
I1013 03:19:34.120015 46903 pir_interpreter.cc:1564] pir interpreter is running by multi-thread mode ...
PaddlePaddle works well on 1 CPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
>>>

kevinzs2048 avatar Oct 13 '25 10:10 kevinzs2048

Sorry, I forgot to reply. It is indeed related to the CPU. I have since completed inference with my self-trained model.

Cipheeer avatar Oct 14 '25 08:10 Cipheeer

May I ask how you resolved it? I ran into the same problem when running paddleocr on Kylin OS with a HiSilicon CPU.

C++ Traceback (most recent call last):

0 paddle::AnalysisPredictor::ZeroCopyRun(bool)
1 paddle::framework::NaiveExecutor::RunInterpreterCore(std::vector<std::string, std::allocator<std::string > > const&, bool, bool)
2 paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
3 paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
4 paddle::framework::PirInterpreter::TraceRunImpl()
5 paddle::framework::PirInterpreter::TraceRunInstructionList(std::vector<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> >, std::allocator<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> > > > const&)
6 paddle::framework::PirInterpreter::RunInstructionBase(paddle::framework::InstructionBase*)
7 paddle::framework::PhiKernelInstruction::Run()
8 phi::KernelImpl<void (*)(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*), &(void phi::ConvKernel<float, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*))>::Compute(phi::KernelContext*)
9 void phi::ConvKernelImpl<float, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, int, std::vector<int, std::allocator<int> > const&, std::string const&, phi::DenseTensor*)
10 phi::funcs::Im2ColFunctor<(phi::funcs::ColFormat)0, phi::CPUContext, float>::operator()(phi::CPUContext const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, phi::DenseTensor*, common::DataLayout)


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system. [TimeInfo: *** Aborted at 1761099079 (unix time) try "date -d @1761099079" if you are using GNU date ***] [SignalInfo: *** SIGSEGV (@0xfff0e3321260) received by PID 1 (TID 0xfffbf8596ce0) from PID 18446744073226293856 ***]

Itxiaopao avatar Oct 22 '25 02:10 Itxiaopao

This is caused by a bug in the PaddlePaddle framework. It will be fixed in the next release of the framework, so please stay tuned!

Bobholamovic avatar Oct 23 '25 14:10 Bobholamovic

Has this been resolved? I still see this error with paddlepaddle==3.2.2.

Sandy3094 avatar Dec 03 '25 08:12 Sandy3094

What exactly is the hardware environment, and which pipeline are you running?

Bobholamovic avatar Dec 03 '25 11:12 Bobholamovic

Linux=RHEL 10 aarch64 paddleocr=3.2.0 paddlepaddle=3.2.2 paddlex=3.2.1

Sandy3094 avatar Dec 03 '25 14:12 Sandy3094
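
For anyone else reporting this, the environment line above can be produced with a short script. This is a sketch using only the standard library; the package names are the PyPI names mentioned in this thread, and `collect_environment` is a hypothetical helper name.

```python
import platform
from importlib import metadata

# PyPI distribution names mentioned in this thread.
PACKAGES = ("paddlepaddle", "paddlex", "paddleocr")


def collect_environment(packages=PACKAGES):
    """Gather OS, architecture, Python, and package versions for a bug report."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),  # e.g. aarch64 or x86_64
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in collect_environment().items():
    print(f"{key}={value}")
```

Pasting this output alongside the traceback saves a round trip when maintainers ask for version details.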

I suggest also trying to upgrade paddlex and paddleocr to the latest versions:

python -m pip install -U paddlex paddleocr

Bobholamovic avatar Dec 04 '25 02:12 Bobholamovic

I tried the latest versions of paddleocr, paddlex, and paddlepaddle, but I still hit the same problem. paddleocr=3.3.2 paddlepaddle=3.2.2 paddlex=3.3.10

Sandy3094 avatar Dec 04 '25 06:12 Sandy3094

Could you share the command or script and the data needed to reproduce this?

Bobholamovic avatar Dec 04 '25 08:12 Bobholamovic

paddleocr ocr -i image_106.jpg --text_detection_model_name PP-OCRv5_mobile_det --text_recognition_model_name PP-OCRv5_mobile_rec --device cpu

Image

Image

Sandy3094 avatar Dec 04 '25 11:12 Sandy3094

@Sandy3094 Which CPU model is it, exactly?

Bobholamovic avatar Dec 05 '25 11:12 Bobholamovic
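
On Linux, one way to answer this is to read /proc/cpuinfo. The sketch below assumes the usual `key : value` layout of that file; on aarch64 the identifying fields are typically `CPU implementer` and `CPU part` rather than `model name` (for example, implementer 0x41 denotes Arm). The helper names are hypothetical.

```python
from pathlib import Path

# Fields that usually identify the CPU on x86 ("model name", "flags")
# and on aarch64 ("CPU implementer", "CPU part", "Features").
WANTED = ("model name", "CPU implementer", "CPU part", "Features", "flags")


def parse_cpuinfo(text):
    """Collect the distinct values of the identifying /proc/cpuinfo fields."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in WANTED:
            found.setdefault(key, set()).add(value.strip())
    return {key: sorted(values) for key, values in found.items()}


def describe_cpu(path="/proc/cpuinfo"):
    """Parse the given cpuinfo file, or return {} if it does not exist."""
    cpuinfo = Path(path)
    if not cpuinfo.exists():
        return {}
    return parse_cpuinfo(cpuinfo.read_text())


for key, values in describe_cpu().items():
    print(key, "=", "; ".join(values))
```

The `Features` (or `flags`) line is worth including in a report, since crashes in compute kernels can depend on which instruction-set extensions the CPU exposes.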

AWS Graviton (Neoverse-N1) processor

Sandy3094 avatar Dec 05 '25 15:12 Sandy3094

This is not an OS-specific problem but a processor-specific one. I hit the same problem on an AWS Graviton (Neoverse-N1) processor running Ubuntu 20.04.

Sandy3094 avatar Dec 08 '25 10:12 Sandy3094