PaddleOCR: paddleocr-vl parses content in the wrong order

🔎 Search before asking

[x] I have searched the PaddleOCR Docs and found no similar bug report.
[x] I have searched the PaddleOCR Issues and found no similar bug report.
[x] I have searched the PaddleOCR Discussions and found no similar bug report.

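🐛 Bug (Problem Description)

When parsing PDF files with paddleocr-vl, I see many ordering errors in the parsed results. The saved visualization of the layout ordering is as follows:

Which stage is going wrong?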

🏃‍♂️ Environment

python 3.12.3
paddleocr 3.3.0
paddlepaddle 3.2.0
paddlex 3.3.3

🌰 Minimal Reproducible Example

Python call:

from paddleocr import PaddleOCRVL
import glob

pipeline = PaddleOCRVL(
    layout_detection_model_dir="models/PaddleOCR-VL/PP-DocLayoutV2",
    vl_rec_backend="vllm-server",
    vl_rec_server_url="fake_url",
    use_chart_recognition=True,
)
image_files = glob.glob("sources/pdf_parse/*.pdf")
for file in image_files:
    output = pipeline.predict(file)
    for res in output:
        res.print()
        res.save_to_json(save_path="output")
        res.save_to_markdown(save_path="output")

Command-line call:

paddleocr doc_parser -i sources/pdf/1.pdf --vl_rec_backend vllm-server --vl_rec_server_url fake_url --save_path output

Oct 21 '25 09:10 Nancis1130

Could you post the original image?

Oct 21 '25 10:10 cuicheng01

@cuicheng01

Oct 21 '25 10:10 Nancis1130

In the official demo, the predicted order is correct.

Oct 22 '25 07:10 leo-q8

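> In the official demo, the predicted order is correct.

I ran into a similar problem. The layout results from the demo seem to differ from those of my self-deployed setup?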

Oct 22 '25 08:10 xioatian1

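@leo-q8 @xioatian1 After switching environments, the problem went away and the parsing results are now correct. I'm not sure whether the environment was the cause, but here are the before and after environments for reference.

Environment with incorrect parsing: cuda12.2 python3.12 torch2.6.0

Environment with correct parsing: cuda12.4 python3.10 torch2.7.0

All other dependencies are identical; see the first post.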

Oct 22 '25 08:10 Nancis1130

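> @leo-q8 @xioatian1 After switching environments, the problem went away and the parsing results are now correct. I'm not sure whether the environment was the cause, but here are the before and after environments for reference.
>
> Environment with incorrect parsing: cuda12.2 python3.12 torch2.6.0
>
> Environment with correct parsing: cuda12.4 python3.10 torch2.7.0
>
> All other dependencies are identical; see the first post.

I'm using the official Docker image, so it shouldn't be an environment problem.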

Oct 22 '25 08:10 xioatian1

+1

Oct 23 '25 07:10 wangsrGit119

+1

Also using the official image.

Oct 23 '25 07:10 wangsrGit119

+1, using the ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server image.

Oct 24 '25 04:10 lpdswing

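Please first check whether your ordering results match the official demo. If they do, the layout ordering model does have some corner cases that are hard to fix. If they don't, the inference environment is probably the cause.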
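One way to run this check is to compare the block order in the JSON saved locally with the block order from the demo. The following is only a sketch: the `parsing_res_list` key and `block_content` field are assumptions about the JSON layout written by `save_to_json`, and both helper names are hypothetical, so adjust them to what your files actually contain.

```python
from typing import Optional, Sequence


def block_texts(result: dict, key: str = "parsing_res_list",
                field: str = "block_content") -> list:
    # Assumption: the result holds a list of blocks under `key`,
    # each with its recognized text under `field`.
    # Whitespace is collapsed so formatting differences do not count as mismatches.
    return [" ".join(str(block.get(field, "")).split())
            for block in result.get(key, [])]


def first_order_mismatch(local: Sequence[str],
                         reference: Sequence[str]) -> Optional[int]:
    """Return the first index where the two block sequences differ, or None if they match."""
    for i, (a, b) in enumerate(zip(local, reference)):
        if a != b:
            return i
    if len(local) != len(reference):
        # One sequence is a strict prefix of the other.
        return min(len(local), len(reference))
    return None
```

A result of `None` means the local order lines up with the reference; any index points at the first block worth inspecting in the layout visualization.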

Oct 24 '25 07:10 leo-q8

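> Please first check whether your ordering results match the official demo. If they do, the layout ordering model does have some corner cases that are hard to fix. If they don't, the inference environment is probably the cause.

The original poster got different orderings just from different CUDA environments. On my side, running the layout model directly on CPU also produces a scrambled order, yet the same image uploaded to the official AI Studio demo parses fine. The demo code is just two simple lines: point it at the specified vllm, then call predict.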

Oct 24 '25 07:10 wangsrGit119

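Thanks for the feedback! I found that inference accuracy is abnormal on CPU. We will investigate this with high priority and reply as soon as possible.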

Oct 24 '25 08:10 TingquanGao

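@Nancis1130 Thanks for reporting this! Are you using the official Docker image for local inference, and what are your GPU model and CUDA version? This is very important for our investigation.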

Oct 24 '25 08:10 TingquanGao

I'm not using the official image. GPU: A800, CUDA 12.2.

Oct 25 '25 01:10 Nancis1130

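I rebuilt a GPU image on top of python:3.10.6 with CUDA version 13.0, and the parsing order is now normal. One more complaint: the offline experience is very poor. Models, fonts, and so on all have to be downloaded, and I suggest documenting this fully. The dependencies a client needs in order to call the VLM service are also too heavy.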

Oct 27 '25 06:10 lpdswing

Same here: the text content of parsed tables is scrambled. I deployed with vllm using the image provided on the official site.

Oct 27 '25 07:10 Fanzaijun

Following this thread; I hit the same problem with the official image. Environment: CUDA Version: 12.8, python: 3.11, GPU: A100

Oct 30 '25 14:10 wayne-z-zhang-rsp

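> I rebuilt a GPU image on top of python:3.10.6 with CUDA version 13.0, and the parsing order is now normal. One more complaint: the offline experience is very poor. Models, fonts, and so on all have to be downloaded, and I suggest documenting this fully. The dependencies a client needs in order to call the VLM service are also too heavy.

Hi, could you share more details about your environment? For example, the torch, paddlepaddle, paddleocr, and paddlex versions.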
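For anyone answering a request like this, a small script using only the Python standard library can gather the versions in one go. This is a sketch rather than an official diagnostic tool; the package list and the function names are illustrative choices, not something PaddleOCR provides.

```python
import platform
from importlib import metadata

# Distribution names that came up in this thread; extend as needed.
PACKAGES = ["paddlepaddle", "paddlepaddle-gpu", "paddleocr", "paddlex", "torch", "vllm"]


def collect_versions(packages=PACKAGES) -> dict:
    """Map 'python' and each package name to its installed version string."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def format_report(report: dict) -> str:
    # One "name: version" pair per line, ready to paste into a comment.
    return "\n".join(f"{name}: {version}" for name, version in report.items())


print(format_report(collect_versions()))
```

The GPU model and CUDA driver version are not covered by this script; they can be read from `nvidia-smi`.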

Nov 02 '25 14:11 marsh312

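> @leo-q8 @xioatian1 After switching environments, the problem went away and the parsing results are now correct. I'm not sure whether the environment was the cause, but here are the before and after environments for reference.
>
> Environment with incorrect parsing: cuda12.2 python3.12 torch2.6.0
>
> Environment with correct parsing: cuda12.4 python3.10 torch2.7.0
>
> All other dependencies are identical; see the first post.

Hi, I'd like to confirm: are you using the CPU version of paddle here?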

Nov 02 '25 14:11 marsh312

@marsh312 The working setup needs the GPU version.

Nov 03 '25 02:11 Nancis1130

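Thanks for all the feedback! Regarding ordering errors and local inference ordering that differs from the official demo, our testing so far shows:

GPU inference works correctly;
CPU inference with MKLDNN enabled is abnormal.

Because MKLDNN is enabled by default, you can turn it off with: device="cpu", enable_mkldnn=False. We are still investigating the abnormal results with MKLDNN enabled. If you run into other ordering errors, please keep commenting here; we will continue to follow up and fix them.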
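A minimal sketch of applying this workaround from Python follows. It assumes that `device` and `enable_mkldnn` are accepted as keyword arguments by the `PaddleOCRVL` constructor, which the comment above suggests but does not spell out; `build_pipeline_kwargs` and `make_pipeline` are hypothetical helper names.

```python
from typing import Any


def build_pipeline_kwargs(device: str = "cpu", **extra: Any) -> dict:
    """Return constructor kwargs, forcing MKLDNN off whenever the device is a CPU."""
    kwargs = {"device": device, **extra}
    if device.startswith("cpu"):
        # Workaround from this thread: MKLDNN is on by default and gives
        # abnormal results on CPU, so switch it off explicitly.
        kwargs["enable_mkldnn"] = False
    return kwargs


def make_pipeline(device: str = "cpu", **extra: Any):
    # Imported lazily so the helper above works without PaddleOCR installed.
    from paddleocr import PaddleOCRVL

    # Assumption: device / enable_mkldnn are constructor keyword arguments.
    return PaddleOCRVL(**build_pipeline_kwargs(device, **extra))
```

With this in place, `make_pipeline(device="cpu")` would build a CPU pipeline with MKLDNN disabled, while a GPU device string leaves the setting untouched.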

Nov 03 '25 07:11 TingquanGao

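> > I rebuilt a GPU image on top of python:3.10.6 with CUDA version 13.0, and the parsing order is now normal. One more complaint: the offline experience is very poor. Models, fonts, and so on all have to be downloaded, and I suggest documenting this fully. The dependencies a client needs in order to call the VLM service are also too heavy.
>
> Hi, could you share more details about your environment? For example, the torch, paddlepaddle, paddleocr, and paddlex versions.

python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install -U "paddleocr[doc-parser]"
python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl

Nothing special: I used the officially provided install packages. Apart from pinning the paddlepaddle version, everything else is the default version.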

Nov 04 '25 08:11 lpdswing

First of all, I would like to express my deep gratitude for creating such a small yet high-performing model.

I am experiencing the same issue.

I am using the vlm docker image, but it is getting the order of sentences wrong when reading them. I have attached an image of the layout results recognized by PaddleOCR. Even though it is a simple single-column document, it is not recognizing the sentence order correctly.

I set the options as follows:

outputs = pipeline.predict(
    input=image_file_paths,
    use_layout_detection=True,
    use_chart_recognition=True,
    use_queues=True,
    use_doc_orientation_classify=False,
)

When I set use_doc_orientation_classify to True, the following error occurs:

"Set use_doc_preprocessor, but the models for doc preprocessor are not initialized."

The non-VLM version (ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddleocr-vl:latest) does not have this error, and use_doc_orientation_classify can be set to True. Naturally, the sentence reading order is also correct. However, it is slow, so I want to use the VLM approach.

I would appreciate it if you could tell me how to resolve this.

Nov 10 '25 14:11 francisggum
