
PP-StructureV3 cannot be deployed locally

Open hellomrsu opened this issue 7 months ago • 8 comments

Checklist:

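Describe the problem

I am trying to run PP-StructureV3 on a local server (without serving deployment or high-performance deployment) to convert complex PDF tables into JSON files, and then extract key information from the coordinates and content of the recognized text boxes. PP-StructureV3 fails to run, although I can run PP-ChatOCRv4.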
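The extraction step described above (picking values by text box position) could look roughly like the sketch below. The record layout, the function names, and the sample values are all hypothetical; the actual JSON written by PP-StructureV3 should be inspected before adapting it.

```python
from typing import Optional

# Hypothetical record layout: {"text": str, "bbox": [x_min, y_min, x_max, y_max]}.
# This is NOT the confirmed PP-StructureV3 JSON schema.


def vertical_overlap(a, b):
    """Return the overlap of two boxes along the y axis (0 if they do not overlap)."""
    return max(0, min(a[3], b[3]) - max(a[1], b[1]))


def value_right_of(boxes, label) -> Optional[str]:
    """Find the box whose text contains `label`, then return the text of the
    nearest box that starts to its right and shares the same row."""
    anchor = next((b for b in boxes if label in b["text"]), None)
    if anchor is None:
        return None
    candidates = [
        b for b in boxes
        if b is not anchor
        and b["bbox"][0] >= anchor["bbox"][2]                # starts right of the label
        and vertical_overlap(b["bbox"], anchor["bbox"]) > 0  # on the same row
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b["bbox"][0])["text"]


# Made-up sample data, for illustration only.
sample_boxes = [
    {"text": "Label A", "bbox": [10, 10, 110, 30]},
    {"text": "value A", "bbox": [120, 12, 230, 28]},
    {"text": "Label B", "bbox": [10, 40, 110, 60]},
    {"text": "value B", "bbox": [120, 41, 210, 59]},
]
print(value_right_of(sample_boxes, "Label A"))
```

Matching on row overlap rather than exact y coordinates tolerates the small vertical offsets that OCR boxes usually have.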

Reproduction

1. Python code:

from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="/home/szp/PP-TableMagic/PP-StructureV3.yaml", device="cpu")

output = pipeline.predict(
    input="/home/szp/PP-TableMagic/input/1/2/TBJU7972437guolianyundan.png",
    device="cpu",
)
for res in output:
    res.print()  # print the structured prediction output
    res.save_to_json(save_path="output")  # save the structured JSON result for the current image
    res.save_to_markdown(save_path="output")  # save the Markdown result for the current image

2. YAML configuration file:

pipeline_name: PP-StructureV3

use_doc_preprocessor: True
use_general_ocr: True
use_seal_recognition: True
use_table_recognition: True
use_formula_recognition: False

SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayout-L
    model_dir: /home/szp/.paddlex/official_models/PP-DocLayout-L
    threshold:
      0: 0.3  # paragraph_title
      1: 0.5  # image
      2: 0.5  # text
      3: 0.5  # number
      4: 0.5  # abstract
      5: 0.5  # content
      6: 0.5  # figure_title
      7: 0.3  # formula
      8: 0.5  # table
      9: 0.5  # table_title
      10: 0.5  # reference
      11: 0.5  # doc_title
      12: 0.5  # footnote
      13: 0.5  # header
      14: 0.5  # algorithm
      15: 0.5  # footer
      16: 0.3  # seal
      17: 0.5  # chart_title
      18: 0.5  # chart
      19: 0.5  # formula_number
      20: 0.5  # header_image
      21: 0.5  # footer_image
      22: 0.5  # aside_text
    layout_nms: True
    layout_unclip_ratio:
      0: [1.0, 1.0]  # paragraph_title
      1: [1.0, 1.0]  # image
      2: [1.0, 1.0]  # text
      3: [1.0, 1.0]  # number
      4: [1.0, 1.0]  # abstract
      5: [1.0, 1.0]  # content
      6: [1.0, 1.0]  # figure_title
      7: [1.0, 1.0]  # formula
      8: [1.0, 1.0]  # table
      9: [1.0, 1.0]  # table_title
      10: [1.0, 1.0]  # reference
      11: [1.0, 1.0]  # doc_title
      12: [1.0, 1.0]  # footnote
      13: [1.0, 1.0]  # header
      14: [1.0, 1.0]  # algorithm
      15: [1.0, 1.0]  # footer
      16: [1.0, 1.0]  # seal
      17: [1.0, 1.0]  # chart_title
      18: [1.0, 1.0]  # chart
      19: [1.0, 1.0]  # formula_number
      20: [1.0, 1.0]  # header_image
      21: [1.0, 1.0]  # footer_image
      22: [1.0, 1.0]  # aside_text
    layout_merge_bboxes_mode:
      0: "large"  # paragraph_title
      1: "large"  # image
      2: "union"  # text
      3: "union"  # number
      4: "union"  # abstract
      5: "union"  # content
      6: "union"  # figure_title
      7: "large"  # formula
      8: "union"  # table
      9: "union"  # table_title
      10: "union"  # reference
      11: "union"  # doc_title
      12: "union"  # footnote
      13: "union"  # header
      14: "union"  # algorithm
      15: "union"  # footer
      16: "union"  # seal
      17: "union"  # chart_title
      18: "large"  # chart
      19: "union"  # formula_number
      20: "union"  # header_image
      21: "union"  # footer_image
      22: "union"  # aside_text

SubPipelines:
  DocPreprocessor:
    pipeline_name: doc_preprocessor
    use_doc_orientation_classify: True
    use_doc_unwarping: True
    SubModules:
      DocOrientationClassify:
        module_name: doc_text_orientation
        model_name: PP-LCNet_x1_0_doc_ori
        model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x1_0_doc_ori
      DocUnwarping:
        module_name: image_unwarping
        model_name: UVDoc
        model_dir: /home/szp/.paddlex/official_models/UVDoc

  GeneralOCR:
    pipeline_name: OCR
    text_type: general
    use_doc_preprocessor: False
    use_textline_orientation: True
    SubModules:
      TextDetection:
        module_name: text_detection
        model_name: PP-OCRv4_server_det
        model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_det
        limit_side_len: 736
        limit_type: min
        thresh: 0.3
        box_thresh: 0.6
        unclip_ratio: 1.5
      TextLineOrientation:
        module_name: textline_orientation
        model_name: PP-LCNet_x0_25_textline_ori
        model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x0_25_textline_ori
        batch_size: 1
      TextRecognition:
        module_name: text_recognition
        model_name: PP-OCRv4_server_rec_doc
        model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_rec_doc
        batch_size: 6
        score_thresh: 0.0

  TableRecognition:
    pipeline_name: table_recognition_v2
    use_layout_detection: False
    use_doc_preprocessor: False
    use_ocr_model: False
    SubModules:
      TableClassification:
        module_name: table_classification
        model_name: PP-LCNet_x1_0_table_cls
        model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x1_0_table_cls

      WiredTableStructureRecognition:
        module_name: table_structure_recognition
        model_name: SLANeXt_wired
        model_dir: /home/szp/.paddlex/official_models/SLANeXt_wired

      WirelessTableStructureRecognition:
        module_name: table_structure_recognition
        model_name: SLANet_plus
        model_dir: /home/szp/.paddlex/official_models/SLANet_plus

      WiredTableCellsDetection:
        module_name: table_cells_detection
        model_name: RT-DETR-L_wired_table_cell_det
        model_dir: /home/szp/.paddlex/official_models/RT-DETR-L_wired_table_cell_det

      WirelessTableCellsDetection:
        module_name: table_cells_detection
        model_name: RT-DETR-L_wireless_table_cell_det
        model_dir: /home/szp/.paddlex/official_models/RT-DETR-L_wireless_table_cell_det
    SubPipelines:
      GeneralOCR:
        pipeline_name: OCR
        text_type: general
        use_doc_preprocessor: False
        use_textline_orientation: True
        SubModules:
          TextDetection:
            module_name: text_detection
            model_name: PP-OCRv4_server_det
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_det
            limit_side_len: 736
            limit_type: min
            thresh: 0.3
            box_thresh: 0.4
            unclip_ratio: 2.0
          TextLineOrientation:
            module_name: textline_orientation
            model_name: PP-LCNet_x0_25_textline_ori
            model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x0_25_textline_ori
            batch_size: 1
          TextRecognition:
            module_name: text_recognition
            model_name: PP-OCRv4_server_rec_doc
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_rec_doc
            batch_size: 6
        score_thresh: 0.0

  SealRecognition:
    pipeline_name: seal_recognition
    use_layout_detection: False
    use_doc_preprocessor: False
    SubPipelines:
      SealOCR:
        pipeline_name: OCR
        text_type: seal
        use_doc_preprocessor: False
        use_textline_orientation: False
        SubModules:
          TextDetection:
            module_name: seal_text_detection
            model_name: PP-OCRv4_server_seal_det
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_seal_det
            limit_side_len: 736
            limit_type: min
            thresh: 0.2
            box_thresh: 0.6
            unclip_ratio: 0.5
          TextRecognition:
            module_name: text_recognition
            model_name: PP-OCRv4_server_rec
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_rec
            batch_size: 1
            score_thresh: 0

  FormulaRecognition:
    pipeline_name: formula_recognition
    use_layout_detection: False
    use_doc_preprocessor: False
    SubModules:
      FormulaRecognition:
        module_name: formula_recognition
        model_name: PP-FormulaNet-L
        model_dir: null
        batch_size: 5
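Almost every model_dir in the configuration above points at a local folder, so one cheap sanity check before running the pipeline is to confirm those folders exist. The sketch below is only a line scan written with the standard library, not a YAML parser, and the helper names are made up for illustration. It skips null entries, as used for FormulaRecognition in the posted config. It is not suggested as the cause of the crash, only as a way to rule out missing paths.

```python
import os
import re
import tempfile

# Matches lines such as "    model_dir: /some/path  # optional comment".
MODEL_DIR_RE = re.compile(r"^\s*model_dir:\s*([^\s#]+)")


def find_model_dirs(yaml_text):
    """Return every model_dir value in the text, in order of appearance."""
    dirs = []
    for line in yaml_text.splitlines():
        match = MODEL_DIR_RE.match(line)
        if match:
            dirs.append(match.group(1).strip("'\""))
    return dirs


def missing_model_dirs(yaml_text):
    """Return model_dir values that are set (not null) but are not existing directories."""
    return [
        d for d in find_model_dirs(yaml_text)
        if d.lower() not in ("null", "~") and not os.path.isdir(d)
    ]


# Small demonstration with one existing folder, one hypothetical missing
# folder, and one null entry.
with tempfile.TemporaryDirectory() as existing:
    demo = "\n".join([
        "SubModules:",
        "  A:",
        f"    model_dir: {existing}",
        "  B:",
        "    model_dir: /nonexistent/hypothetical_model",
        "  C:",
        "    model_dir: null",
    ])
    print(missing_model_dirs(demo))
```

To apply it to a real file, read the YAML with open(path, encoding="utf-8").read() and pass the text to missing_model_dirs.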

  1. Which model and dataset are you using? See the YAML file.
  2. Please provide the error message and related logs.

Before adding the device argument to the code:

(myenv) szp@szp:~/PP-TableMagic$ python3 ppstructrue.py
/home/szp/PP-TableMagic/myenv/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)

C++ Traceback (most recent call last):

0   paddle::AnalysisPredictor::ZeroCopyRun(bool)
1   paddle::framework::NaiveExecutor::RunInterpreterCore(std::vector<std::string, std::allocator<std::string > > const&, bool, bool)
2   paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
3   paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
4   paddle::framework::PirInterpreter::TraceRunImpl()
5   paddle::framework::PirInterpreter::TraceRunInstructionList(std::vector<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> >, std::allocator<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> > > > const&)
6   paddle::framework::PirInterpreter::RunInstructionBase(paddle::framework::InstructionBase*)
7   paddle::framework::PhiKernelInstruction::Run()
8   phi::KernelImpl<void (*)(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*), &(void phi::ConvKernel<float, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*))>::Compute(phi::KernelContext*)
9   void phi::ConvKernelImpl<float, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, int, std::vector<int, std::allocator<int> > const&, std::string const&, phi::DenseTensor*)
10  phi::funcs::Im2ColFunctor<(phi::funcs::ColFormat)0, phi::CPUContext, float>::operator()(phi::CPUContext const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, phi::DenseTensor*, common::DataLayout)


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system. [TimeInfo: *** Aborted at 1744634621 (unix time) try "date -d @1744634621" if you are using GNU date ***] [SignalInfo: *** SIGSEGV (@0x70100ac01260) received by PID 1422562 (TID 0x7018c5dcf080) from PID 180359776 ***]

Segmentation fault (core dumped)

After adding the device argument to the code:

(myenv) szp@szp:~/PP-TableMagic$ python3 ppstructrue.py
/home/szp/PP-TableMagic/myenv/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)


C++ Traceback (most recent call last):

0   paddle::AnalysisPredictor::ZeroCopyRun(bool)
1   paddle::framework::NaiveExecutor::RunInterpreterCore(std::vector<std::string, std::allocator<std::string > > const&, bool, bool)
2   paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
3   paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
4   paddle::framework::PirInterpreter::TraceRunImpl()
5   paddle::framework::PirInterpreter::TraceRunInstructionList(std::vector<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> >, std::allocator<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> > > > const&)
6   paddle::framework::PirInterpreter::RunInstructionBase(paddle::framework::InstructionBase*)
7   paddle::framework::PhiKernelInstruction::Run()
8   phi::KernelImpl<void (*)(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*), &(void phi::ConvKernel<float, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, std::vector<int, std::allocator<int> > const&, int, std::string const&, phi::DenseTensor*))>::Compute(phi::KernelContext*)
9   void phi::ConvKernelImpl<float, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::string const&, int, std::vector<int, std::allocator<int> > const&, std::string const&, phi::DenseTensor*)
10  phi::funcs::Im2ColFunctor<(phi::funcs::ColFormat)0, phi::CPUContext, float>::operator()(phi::CPUContext const&, phi::DenseTensor const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, phi::DenseTensor*, common::DataLayout)


Error Message Summary:

FatalError: Segmentation fault is detected by the operating system. [TimeInfo: *** Aborted at 1744634798 (unix time) try "date -d @1744634798" if you are using GNU date ***] [SignalInfo: *** SIGSEGV (@0x76c0c5201260) received by PID 1429710 (TID 0x76c980679080) from PID 18446744072721797728 ***]

Segmentation fault (core dumped)
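Both runs above die with SIGSEGV inside a CPU convolution kernel, which kills the whole Python process. One way to narrow this down without losing the calling process is to run each attempt in a child interpreter started with the standard -X faulthandler option, which dumps the Python-level stack when a fatal signal arrives. The sketch below uses only the standard library; the helper names are made up for illustration, and nothing here is PaddleX-specific.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX a negative return code -N means
    the child was killed by signal N, so a crash does not take down the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


# Harmless demonstration; replace the string with the code to be tested.
rc, err = run_isolated("print('child ran')")
print(describe(rc))
```

For this issue, the string passed to run_isolated could be the reproduction script, run once per variant of the YAML with a single use_* switch turned off, to see which sub-pipeline triggers the crash.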

Environment

  1. Please provide your PaddlePaddle, PaddleX, and Python version numbers.
     Python version: 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0]
     Python major version: 3.12.3
     /home/szp/PP-TableMagic/myenv/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
       warnings.warn(warning_message)
     PaddlePaddle version: 3.0.0-rc0
     PaddleX version: 3.0.0.rc0
  2. Please provide your operating system (Linux/Windows/macOS).
     No LSB modules are available.
     Distributor ID: Ubuntu
     Description: Ubuntu 24.04.1 LTS
     Release: 24.04
     Codename: noble
  3. Which CUDA/cuDNN versions are you using? I am using the CPU build, so no CUDA/cuDNN acceleration libraries are needed.
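The environment details above can be collected without importing paddle itself, which also avoids the ccache warning mixed into the output. This is a minimal sketch using importlib.metadata; the distribution names paddlepaddle and paddlex are assumed to match how the packages were installed, and collect_versions is a made-up helper name.

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages=("paddlepaddle", "paddlex")):
    """Return Python, OS, and installed package versions as a dict."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution name not found in this environment.
            info[name] = "not installed"
    return info


for key, value in collect_versions().items():
    print(f"{key}: {value}")
```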

hellomrsu, Apr 14 '25 13:04