PaddleX
PP-StructureV3 cannot be deployed locally
Checklist:
- [ ] Searched past related issues for an answer
- [ ] Read the FAQ
- [ ] Read the PaddleX documentation
- [ ] Confirmed the bug is still unfixed in the latest version
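Problem description
I am trying to deploy PP-StructureV3 on a local server (without serving deployment or high-performance deployment) to convert complex PDF tables into JSON files, and then extract key information from the coordinates and content of the recognized text boxes. PP-StructureV3 fails to run, although PP-ChatOCRv4 runs fine on the same machine.
Reproduction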
1. Python code:

from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="/home/szp/PP-TableMagic/PP-StructureV3.yaml", device="cpu")
output = pipeline.predict(
    input="/home/szp/PP-TableMagic/input/1/2/TBJU7972437guolianyundan.png",
    device="cpu",
)
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the structured JSON result for the current image
    res.save_to_markdown(save_path="output")  # Save the Markdown result for the current image

2. YAML configuration file:

pipeline_name: PP-StructureV3

use_doc_preprocessor: True
use_general_ocr: True
use_seal_recognition: True
use_table_recognition: True
use_formula_recognition: False

SubModules:
  LayoutDetection:
    module_name: layout_detection
    model_name: PP-DocLayout-L
    model_dir: /home/szp/.paddlex/official_models/PP-DocLayout-L
    threshold:
      0: 0.3  # paragraph_title
      1: 0.5  # image
      2: 0.5  # text
      3: 0.5  # number
      4: 0.5  # abstract
      5: 0.5  # content
      6: 0.5  # figure_title
      7: 0.3  # formula
      8: 0.5  # table
      9: 0.5  # table_title
      10: 0.5  # reference
      11: 0.5  # doc_title
      12: 0.5  # footnote
      13: 0.5  # header
      14: 0.5  # algorithm
      15: 0.5  # footer
      16: 0.3  # seal
      17: 0.5  # chart_title
      18: 0.5  # chart
      19: 0.5  # formula_number
      20: 0.5  # header_image
      21: 0.5  # footer_image
      22: 0.5  # aside_text
    layout_nms: True
    layout_unclip_ratio:
      0: [1.0, 1.0]  # paragraph_title
      1: [1.0, 1.0]  # image
      2: [1.0, 1.0]  # text
      3: [1.0, 1.0]  # number
      4: [1.0, 1.0]  # abstract
      5: [1.0, 1.0]  # content
      6: [1.0, 1.0]  # figure_title
      7: [1.0, 1.0]  # formula
      8: [1.0, 1.0]  # table
      9: [1.0, 1.0]  # table_title
      10: [1.0, 1.0]  # reference
      11: [1.0, 1.0]  # doc_title
      12: [1.0, 1.0]  # footnote
      13: [1.0, 1.0]  # header
      14: [1.0, 1.0]  # algorithm
      15: [1.0, 1.0]  # footer
      16: [1.0, 1.0]  # seal
      17: [1.0, 1.0]  # chart_title
      18: [1.0, 1.0]  # chart
      19: [1.0, 1.0]  # formula_number
      20: [1.0, 1.0]  # header_image
      21: [1.0, 1.0]  # footer_image
      22: [1.0, 1.0]  # aside_text
    layout_merge_bboxes_mode:
      0: "large"  # paragraph_title
      1: "large"  # image
      2: "union"  # text
      3: "union"  # number
      4: "union"  # abstract
      5: "union"  # content
      6: "union"  # figure_title
      7: "large"  # formula
      8: "union"  # table
      9: "union"  # table_title
      10: "union"  # reference
      11: "union"  # doc_title
      12: "union"  # footnote
      13: "union"  # header
      14: "union"  # algorithm
      15: "union"  # footer
      16: "union"  # seal
      17: "union"  # chart_title
      18: "large"  # chart
      19: "union"  # formula_number
      20: "union"  # header_image
      21: "union"  # footer_image
      22: "union"  # aside_text

SubPipelines:
  DocPreprocessor:
    pipeline_name: doc_preprocessor
    use_doc_orientation_classify: True
    use_doc_unwarping: True
    SubModules:
      DocOrientationClassify:
        module_name: doc_text_orientation
        model_name: PP-LCNet_x1_0_doc_ori
        model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x1_0_doc_ori
      DocUnwarping:
        module_name: image_unwarping
        model_name: UVDoc
        model_dir: /home/szp/.paddlex/official_models/UVDoc

  GeneralOCR:
    pipeline_name: OCR
    text_type: general
    use_doc_preprocessor: False
    use_textline_orientation: True
    SubModules:
      TextDetection:
        module_name: text_detection
        model_name: PP-OCRv4_server_det
        model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_det
        limit_side_len: 736
        limit_type: min
        thresh: 0.3
        box_thresh: 0.6
        unclip_ratio: 1.5
      TextLineOrientation:
        module_name: textline_orientation
        model_name: PP-LCNet_x0_25_textline_ori
        model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x0_25_textline_ori
        batch_size: 1
      TextRecognition:
        module_name: text_recognition
        model_name: PP-OCRv4_server_rec_doc
        model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_rec_doc
        batch_size: 6
        score_thresh: 0.0

  TableRecognition:
    pipeline_name: table_recognition_v2
    use_layout_detection: False
    use_doc_preprocessor: False
    use_ocr_model: False
    SubModules:
      TableClassification:
        module_name: table_classification
        model_name: PP-LCNet_x1_0_table_cls
        model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x1_0_table_cls
      WiredTableStructureRecognition:
        module_name: table_structure_recognition
        model_name: SLANeXt_wired
        model_dir: /home/szp/.paddlex/official_models/SLANeXt_wired
      WirelessTableStructureRecognition:
        module_name: table_structure_recognition
        model_name: SLANet_plus
        model_dir: /home/szp/.paddlex/official_models/SLANet_plus
      WiredTableCellsDetection:
        module_name: table_cells_detection
        model_name: RT-DETR-L_wired_table_cell_det
        model_dir: /home/szp/.paddlex/official_models/RT-DETR-L_wired_table_cell_det
      WirelessTableCellsDetection:
        module_name: table_cells_detection
        model_name: RT-DETR-L_wireless_table_cell_det
        model_dir: /home/szp/.paddlex/official_models/RT-DETR-L_wireless_table_cell_det
    SubPipelines:
      GeneralOCR:
        pipeline_name: OCR
        text_type: general
        use_doc_preprocessor: False
        use_textline_orientation: True
        SubModules:
          TextDetection:
            module_name: text_detection
            model_name: PP-OCRv4_server_det
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_det
            limit_side_len: 736
            limit_type: min
            thresh: 0.3
            box_thresh: 0.4
            unclip_ratio: 2.0
          TextLineOrientation:
            module_name: textline_orientation
            model_name: PP-LCNet_x0_25_textline_ori
            model_dir: /home/szp/.paddlex/official_models/PP-LCNet_x0_25_textline_ori
            batch_size: 1
          TextRecognition:
            module_name: text_recognition
            model_name: PP-OCRv4_server_rec_doc
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_rec_doc
            batch_size: 6
            score_thresh: 0.0

  SealRecognition:
    pipeline_name: seal_recognition
    use_layout_detection: False
    use_doc_preprocessor: False
    SubPipelines:
      SealOCR:
        pipeline_name: OCR
        text_type: seal
        use_doc_preprocessor: False
        use_textline_orientation: False
        SubModules:
          TextDetection:
            module_name: seal_text_detection
            model_name: PP-OCRv4_server_seal_det
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_seal_det
            limit_side_len: 736
            limit_type: min
            thresh: 0.2
            box_thresh: 0.6
            unclip_ratio: 0.5
          TextRecognition:
            module_name: text_recognition
            model_name: PP-OCRv4_server_rec
            model_dir: /home/szp/.paddlex/official_models/PP-OCRv4_server_rec
            batch_size: 1
            score_thresh: 0

  FormulaRecognition:
    pipeline_name: formula_recognition
    use_layout_detection: False
    use_doc_preprocessor: False
    SubModules:
      FormulaRecognition:
        module_name: formula_recognition
        model_name: PP-FormulaNet-L
        model_dir: null
        batch_size: 5

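Since every `model_dir` in the config above points at a local directory, one cheap thing to rule out is a missing or partially downloaded model folder. The following is only a minimal sketch: `find_missing_model_dirs` is a hypothetical helper (not part of PaddleX), it scans the config as plain text rather than parsing YAML, and it assumes one key per line. A clean result does not prove the models are intact, and it is not known whether a missing directory is related to the segmentation fault.

```python
import re
import sys
from pathlib import Path

# Matches lines such as "model_dir: /path/to/model" (one key per line is assumed).
MODEL_DIR_RE = re.compile(r"^\s*model_dir:\s*(\S+)", re.MULTILINE)


def find_missing_model_dirs(yaml_text: str) -> list[str]:
    """Return each configured model_dir that is not an existing directory."""
    missing = []
    for raw in MODEL_DIR_RE.findall(yaml_text):
        path = raw.strip("'\"")
        if path.lower() in ("null", "~"):
            continue  # no local directory configured for this module
        if not Path(path).is_dir() and path not in missing:
            missing.append(path)
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_model_dirs.py /home/szp/PP-TableMagic/PP-StructureV3.yaml
    config_text = Path(sys.argv[1]).read_text(encoding="utf-8")
    for missing_dir in find_missing_model_dirs(config_text):
        print("missing:", missing_dir)
```

The script name `check_model_dirs.py` in the usage comment is also just a placeholder.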
- Which model and dataset are you using? See the YAML file above.
- Please provide the error message and relevant logs.
  Before adding device to the code:
  (myenv) szp@szp:~/PP-TableMagic$ python3 ppstructrue.py
  /home/szp/PP-TableMagic/myenv/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
C++ Traceback (most recent call last):
0 paddle::AnalysisPredictor::ZeroCopyRun(bool)
1 paddle::framework::NaiveExecutor::RunInterpreterCore(std::vector<std::string, std::allocator<std::string > > const&, bool, bool)
2 paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
3 paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
4 paddle::framework::PirInterpreter::TraceRunImpl()
5 paddle::framework::PirInterpreter::TraceRunInstructionList(std::vector<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> >, std::allocator<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> > > > const&)
6 paddle::framework::PirInterpreter::RunInstructionBase(paddle::framework::InstructionBase*)
7 paddle::framework::PhiKernelInstruction::Run()
8 phi::KernelImpl<void ()(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator
Error Message Summary:
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: *** Aborted at 1744634621 (unix time) try "date -d @1744634621" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x70100ac01260) received by PID 1422562 (TID 0x7018c5dcf080) from PID 180359776 ***]
Segmentation fault (core dumped)

After adding device to the code:
(myenv) szp@szp:~/PP-TableMagic$ python3 ppstructrue.py
/home/szp/PP-TableMagic/myenv/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)
C++ Traceback (most recent call last):
0 paddle::AnalysisPredictor::ZeroCopyRun(bool)
1 paddle::framework::NaiveExecutor::RunInterpreterCore(std::vector<std::string, std::allocator<std::string > > const&, bool, bool)
2 paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
3 paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool, bool)
4 paddle::framework::PirInterpreter::TraceRunImpl()
5 paddle::framework::PirInterpreter::TraceRunInstructionList(std::vector<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> >, std::allocator<std::unique_ptr<paddle::framework::InstructionBase, std::default_delete<paddle::framework::InstructionBase> > > > const&)
6 paddle::framework::PirInterpreter::RunInstructionBase(paddle::framework::InstructionBase*)
7 paddle::framework::PhiKernelInstruction::Run()
8 phi::KernelImpl<void ()(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, std::vector<int, std::allocator
Error Message Summary:
FatalError: Segmentation fault is detected by the operating system.
[TimeInfo: *** Aborted at 1744634798 (unix time) try "date -d @1744634798" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x76c0c5201260) received by PID 1429710 (TID 0x76c980679080) from PID 18446744072721797728 ***]
Segmentation fault (core dumped)
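
The C++ traceback above does not show which Python call (and therefore which sub-module of the pipeline) was running when the crash happened. Python's standard `faulthandler` module can dump the Python-level stack on SIGSEGV. The snippet below is a minimal sketch: `enable_crash_traceback` is a hypothetical helper name, and the idea is to call it at the very top of `ppstructrue.py`, before `paddlex` is imported.

```python
import faulthandler
import sys


def enable_crash_traceback(log_path=None):
    """Make fatal signals such as SIGSEGV dump the Python stack of all threads.

    Returns the open log file (keep a reference to it), or None when the
    traceback goes to stderr.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    # The file must stay open for as long as faulthandler is enabled.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Intended use at the top of the reproduction script (not executed here):
#   enable_crash_traceback()
#   from paddlex import create_pipeline
#   ...
```

The same effect is available without touching the script by running `python3 -X faulthandler ppstructrue.py`.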
Environment
- Please provide your PaddlePaddle, PaddleX, and Python versions.
  Python version: 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0]
  Python major version: 3.12.3
  /home/szp/PP-TableMagic/myenv/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:711: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
  PaddlePaddle version: 3.0.0-rc0
  PaddleX version: 3.0.0.rc0
- Please provide your operating system information, e.g. Linux/Windows/MacOS.
  No LSB modules are available.
  Distributor ID: Ubuntu
  Description: Ubuntu 24.04.1 LTS
  Release: 24.04
  Codename: noble
- Which CUDA/cuDNN version are you using? I am using the CPU version, so no CUDA/cuDNN acceleration libraries are needed.
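
One way to narrow down which part of the pipeline triggers the segmentation fault is to generate variants of the YAML config in which exactly one of the top-level `use_*` switches is turned off, then run the reproduction script against each variant. The following is a rough sketch only: `set_top_level_flag` and `one_disabled_variants` are hypothetical helpers that edit the config as text, and it is an assumption that setting a switch to `False` keeps this PaddleX version from loading or running the corresponding sub-pipeline.

```python
import re

# Top-level switches that appear in the YAML config of this report.
FLAGS = [
    "use_doc_preprocessor",
    "use_general_ocr",
    "use_seal_recognition",
    "use_table_recognition",
    "use_formula_recognition",
]


def set_top_level_flag(yaml_text: str, flag: str, value: bool) -> str:
    """Rewrite an unindented `flag: True/False` line; indented (nested) keys are untouched."""
    pattern = re.compile(rf"^{re.escape(flag)}:[ \t]*(True|False)\b", re.MULTILINE)
    return pattern.sub(f"{flag}: {value}", yaml_text)


def one_disabled_variants(yaml_text: str) -> dict[str, str]:
    """For each switch currently True, build a config text with only that switch off."""
    variants = {}
    for flag in FLAGS:
        if re.search(rf"^{re.escape(flag)}:[ \t]*True\b", yaml_text, re.MULTILINE):
            variants[flag] = set_top_level_flag(yaml_text, flag, False)
    return variants
```

Each returned text can be written to its own file and passed to `create_pipeline(pipeline=...)`; if the crash disappears for one variant, that sub-pipeline is the first place to look.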