PaddleOCR: recognized block positions are incorrect after deploying with the official vLLM backend

🔎 Search before asking

[x] I have searched the PaddleOCR Docs and found no similar bug report.
[x] I have searched the PaddleOCR Issues and found no similar bug report.
[x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem description)

The Markdown generated from the recognized image, with blocks emitted in the order of their detected IDs, is shown below:

# 快速上手文档

在飞书文档里，可以 @同事、发表评论，也可以给文档点赞。

一、认识文档

飞书文档还支持主流 Markdown 功能和丰富 ☐ 键键，弹指间完成你想要的操作。

本文阅读时长：11分钟

飞书文档是可多人实时编辑的在线文档，也是丰富的创作工具。飞书文档让创作更自由，协作更高效。

飞书文档支持多人、多设备同时编辑一篇文档，内容自动保存在云端，无需来回发送文件或手动保存。

飞书文档支持插入图片、表格、视频、文件、画板、高亮块、代码块、投票等丰富内容，也可以嵌入西瓜视频、抖音、哔哩哔哩、Figma等网页。

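As the output above shows, the block positions (reading order) appear to be wrong. The same image gives the correct result through the official API, but not through the local deployment, and I do not know why.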
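To make the discrepancy easier to report, the two Markdown outputs can be compared line by line. The following is a minimal sketch using only the standard library. The function names and the labels `official_api` and `local_vllm` are illustrative assumptions, not part of PaddleOCR.

```python
import difflib


def nonblank_lines(markdown_text: str) -> list[str]:
    """Return the stripped, non-empty lines of a Markdown string."""
    return [line.strip() for line in markdown_text.splitlines() if line.strip()]


def order_diff(reference_md: str, candidate_md: str) -> list[str]:
    """Unified diff between the non-blank lines of two Markdown outputs."""
    return list(
        difflib.unified_diff(
            nonblank_lines(reference_md),
            nonblank_lines(candidate_md),
            fromfile="official_api",  # illustrative label
            tofile="local_vllm",      # illustrative label
            lineterm="",
        )
    )


def same_content_different_order(reference_md: str, candidate_md: str) -> bool:
    """True when both outputs contain the same lines but in a different order."""
    ref = nonblank_lines(reference_md)
    cand = nonblank_lines(candidate_md)
    return sorted(ref) == sorted(cand) and ref != cand
```

If `same_content_different_order` returns `True`, the text recognition itself matches and only the ordering of blocks differs, which narrows the problem to layout or reading-order handling rather than the recognition model.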

🏃‍♂️ Environment

Deployed on a cloud A800 server; the inference service starts and runs normally. Launch command:

paddlex_genai_server \
    --model_name PaddleOCR-VL-0.9B \
    --model_dir /renxiangganzhi/taokerui/tkr/PaddleOCR-VL \
    --host 0.0.0.0 \
    --port 8118 \
    --backend vllm

🌰 Minimal Reproducible Example

import os
import sys
from pathlib import Path

from paddleocr import PaddleOCRVL

def process_file(input_path: str, output_dir: str = "output"):
    input_path = Path(input_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    if not input_path.exists():
        raise FileNotFoundError(f"Input file does not exist: {input_path}")

    # Initialize the PaddleOCR-VL client and point it at the remote inference service
    pipeline = PaddleOCRVL(
        vl_rec_backend="vllm-server",
        vl_rec_server_url="http://192.168.31.102:8118/v1",
        format_block_content=True,
    )

    print(f"Processing file: {input_path}")
    results = pipeline.predict(input=str(input_path))

    # Process the result of each page
    markdown_pages = []
    all_markdown_images = []

    for idx, res in enumerate(results):
        print(f"\n--- Page {idx + 1} result preview ---")
        res.print()

        # Save the per-page JSON
        json_path = output_dir / f"page_{idx + 1}.json"
        res.save_to_json(save_path=str(json_path))
        print(f"✅ JSON saved: {json_path}")

        # Collect the Markdown content (used when merging PDF pages)
        md_info = res.markdown
        markdown_pages.append(md_info)
        all_markdown_images.append(md_info.get("markdown_images", {}))

    # For a PDF, merge all pages into a single Markdown file
    if input_path.suffix.lower() == ".pdf":
        full_markdown = pipeline.concatenate_markdown_pages(markdown_pages)
        md_file = output_dir / f"{input_path.stem}.md"
        with open(md_file, "w", encoding="utf-8") as f:
            f.write(full_markdown)
        print(f"✅ Merged Markdown saved: {md_file}")

        # Save the images referenced by the Markdown
        for img_dict in all_markdown_images:
            if img_dict:
                for rel_path, pil_img in img_dict.items():
                    img_save_path = output_dir / rel_path
                    img_save_path.parent.mkdir(parents=True, exist_ok=True)
                    pil_img.save(img_save_path)
                    print(f"✅ Markdown image saved: {img_save_path}")
    else:
        # For a single image, save the Markdown directly
        if markdown_pages:
            md_text = markdown_pages[0]["markdown_texts"]
            md_file = output_dir / f"{input_path.stem}.md"
            with open(md_file, "w", encoding="utf-8") as f:
                f.write(md_text)
            print(f"✅ Markdown saved: {md_file}")

            # Save the images, if any
            img_dict = all_markdown_images[0]
            for rel_path, pil_img in img_dict.items():
                img_save_path = output_dir / rel_path
                img_save_path.parent.mkdir(parents=True, exist_ok=True)
                pil_img.save(img_save_path)
                print(f"✅ Markdown image saved: {img_save_path}")

if __name__ == "__main__":
    # if len(sys.argv) < 2:
    #     print("Usage: python ocr_vl_client.py <image or PDF path> [output directory]")
    #     sys.exit(1)

    input_file = r"22234.png"
    output_folder = r"out"

    # try:
    process_file(input_file, output_folder)
    print("\n🎉 Processing complete!")
    # except Exception as e:
    #     print(f"❌ Processing failed: {e}")
    #     sys.exit(1)
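Since the script saves a JSON file per page, one way to check whether blocks are emitted out of order is to compare the emitted order with a simple top-to-bottom, left-to-right ordering of the bounding boxes. The sketch below assumes each block is a dict with a hypothetical `bbox` key holding `[x1, y1, x2, y2]`; the real field names in the saved JSON may differ and would need to be mapped first. The `row_height` bucket is also an assumption, and the check is only meaningful for single-column pages such as the one in this report.

```python
def geometric_order(blocks: list[dict], row_height: float = 20.0) -> list[int]:
    """Indices of blocks sorted top-to-bottom, then left-to-right.

    Blocks whose top edges fall in the same `row_height` bucket are treated
    as being on the same row. `bbox` is a hypothetical key: [x1, y1, x2, y2].
    """
    def sort_key(index: int) -> tuple[int, float]:
        x1, y1, _, _ = blocks[index]["bbox"]
        return (int(y1 // row_height), x1)

    return sorted(range(len(blocks)), key=sort_key)


def misplaced_blocks(blocks: list[dict], row_height: float = 20.0) -> list[tuple[int, int]]:
    """Pairs (position, block_index) where the emitted order differs from
    the geometric order. An empty list means the orders agree."""
    expected = geometric_order(blocks, row_height)
    return [(pos, idx) for pos, idx in enumerate(expected) if pos != idx]


if __name__ == "__main__":
    # Hypothetical blocks in emitted order: a body paragraph comes before the title.
    sample = [
        {"bbox": [50, 100, 500, 130], "text": "paragraph"},
        {"bbox": [50, 20, 500, 60], "text": "title"},
        {"bbox": [50, 200, 500, 230], "text": "section heading"},
    ]
    print(misplaced_blocks(sample))
```

Running the same check on the JSON produced via the official API and via the local deployment would show whether the bounding boxes themselves differ or only the order in which blocks are emitted.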