
Error when running PP-StructureV3: document orientation classification fails

Open • tsaican opened this issue 3 months ago • 1 comment

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug

The error message is as follows:

Set use_doc_orientation_classify, but the model for doc orientation classify is not initialized.
Traceback (most recent call last):
  File "/HOME/sysucc_huiyanluo/sysucc_huiyanluo_1/HDD_POOL/yyx/embedding_reranker_qwen3_test copy.py", line 10, in <module>
    output = pipeline.predict(
  File "/HOME/sysucc_huiyanluo/sysucc_huiyanluo_1/HDD_POOL/miniconda3/envs/gj_title/lib/python3.10/site-packages/paddleocr/_pipelines/pp_structurev3.py", line 250, in predict
    return list(
  File "/HOME/sysucc_huiyanluo/sysucc_huiyanluo_1/HDD_POOL/miniconda3/envs/gj_title/lib/python3.10/site-packages/paddlex/inference/pipelines/_parallel.py", line 129, in predict
    yield from self._pipeline.predict(
  File "/HOME/sysucc_huiyanluo/sysucc_huiyanluo_1/HDD_POOL/miniconda3/envs/gj_title/lib/python3.10/site-packages/paddlex/inference/pipelines/layout_parsing/pipeline_v2.py", line 993, in predict
    doc_preprocessor_results = list(
  File "/HOME/sysucc_huiyanluo/sysucc_huiyanluo_1/HDD_POOL/miniconda3/envs/gj_title/lib/python3.10/site-packages/paddlex/inference/pipelines/_parallel.py", line 129, in predict
    yield from self._pipeline.predict(
  File "/HOME/sysucc_huiyanluo/sysucc_huiyanluo_1/HDD_POOL/miniconda3/envs/gj_title/lib/python3.10/site-packages/paddlex/inference/pipelines/doc_preprocessor/pipeline.py", line 162, in predict
    preds = list(self.doc_ori_classify_model(image_arrays))
AttributeError: '_DocPreprocessorPipeline' object has no attribute 'doc_ori_classify_model'
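The warning and the `AttributeError` above are consistent with a sub-model that is only created when its feature is enabled at construction time, and is then requested later at predict time. The toy class below is a hypothetical stand-in written for illustration; `ToyDocPreprocessor` and `try_predict` are invented names, and this is not the PaddleX implementation. It only shows how that pattern can produce the same kind of error.

```python
class ToyDocPreprocessor:
    """Hypothetical stand-in for illustration; NOT the PaddleX class."""

    def __init__(self, use_doc_orientation_classify=False):
        # The sub-model attribute is only created when the feature
        # is enabled at construction time.
        if use_doc_orientation_classify:
            self.doc_ori_classify_model = lambda images: [0 for _ in images]

    def predict(self, images, use_doc_orientation_classify=False):
        if use_doc_orientation_classify:
            # Raises AttributeError if the model was never created in __init__.
            return list(self.doc_ori_classify_model(images))
        return [None for _ in images]


def try_predict(pipeline):
    """Request orientation classification at predict time; return result or error text."""
    try:
        return pipeline.predict(["page"], use_doc_orientation_classify=True)
    except AttributeError as exc:
        return str(exc)


# Feature requested only at predict time: the attribute is missing.
late = try_predict(ToyDocPreprocessor())
# Feature enabled at construction time: the call succeeds.
early = try_predict(ToyDocPreprocessor(use_doc_orientation_classify=True))
print(late)
print(early)
```

In this toy version, only the instance built with the flag enabled can serve a predict call that asks for orientation classification.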

🏃‍♂️ Environment

  • Ubuntu 22.04
  • Python 3.10
  • Paddle 3.0.0

🌰 Minimal Reproducible Example

from paddleocr import PPStructureV3

input_file = "..."

pipeline = PPStructureV3()
output = pipeline.predict(
    input=input_file,
    use_doc_orientation_classify=True,
    use_doc_unwarping=True,
    doc_orientation_classify_model_name="PP-LCNet_x1_0_doc_ori",
)

markdown_list = []

for res in output:
    md_info = res.markdown
    markdown_list.append(md_info)

markdown_texts = pipeline.concatenate_markdown_pages(markdown_list)
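One thing that may be worth trying, given the "model ... is not initialized" warning, is to pass the orientation and unwarping options to the `PPStructureV3` constructor instead of to `predict()`, so that the sub-models are loaded when the pipeline is built. The sketch below assumes the constructor accepts the same keyword names used in the example above; the thread does not confirm this as a fix. `INIT_OPTIONS`, `build_pipeline`, and `collect_markdown` are hypothetical helper names.

```python
# Options moved from predict() to the constructor (an assumption, not a confirmed fix).
INIT_OPTIONS = {
    "use_doc_orientation_classify": True,
    "use_doc_unwarping": True,
    "doc_orientation_classify_model_name": "PP-LCNet_x1_0_doc_ori",
}


def build_pipeline(options=INIT_OPTIONS):
    """Create the pipeline with the document preprocessing options set at init."""
    # Imported lazily so this sketch can be loaded without paddleocr installed.
    from paddleocr import PPStructureV3

    return PPStructureV3(**options)


def collect_markdown(pipeline, input_file):
    """Run prediction and merge the per-page Markdown, as in the example above."""
    markdown_list = [res.markdown for res in pipeline.predict(input=input_file)]
    return pipeline.concatenate_markdown_pages(markdown_list)
```

Usage would be `collect_markdown(build_pipeline(), input_file)`, with `input_file` set as in the original example.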

tsaican • Sep 24 '25 10:09

Try the following code; it should work:

from paddleocr import PaddleOCRVL

pipeline = PaddleOCRVL(
    vl_rec_backend="vllm-server",  # send VL recognition requests to a vLLM server
    vl_rec_server_url="http://127.0.0.1:2024/v1",
    use_layout_detection=False,
    layout_detection_model_name="PP-DocLayoutV2",
    layout_detection_model_dir="/models/PaddleOCR-VL/PP-DocLayoutV2",
)

tao-xiaoxin • Nov 03 '25 09:11