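What does the MinerU --method ocr parameter do, and in what scenarios should it be used? With this parameter, code snippets are recognized as a single line; without it, their original formatting is preserved.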

Description of the bug

Original content

Parsing result with --method ocr

Parsing result without --method ocr

How to reproduce the bug

magic-pdf pdf-command --pdf agents.pdf --inside_model true --method ocr

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.6.x

Device mode

cpu

Aug 01 '24 11:08 freedom1993

--method ocr uses Paddle-based OCR to extract text from the PDF, while --method text uses PyMuPDF to extract the text embedded in the PDF.

The difference is that the bounding boxes returned by PyMuPDF can extend irregularly in all directions and overlap the neighboring text boxes, which leads to errors in position calculations. Text bounding boxes obtained through OCR are comparatively reliable.
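To make the bounding-box explanation above concrete, here is a minimal pure-Python sketch that flags text boxes overlapping their neighbors. It is not MinerU's code: the function names, the `min_ratio` threshold, and the box coordinates are hypothetical, chosen only to show how a single inflated box ends up covering the lines around it.

```python
from itertools import combinations

# A box is (x0, y0, x1, y1) with the origin at the top-left of the page,
# the same corner ordering PyMuPDF uses for its Rect objects.
Box = tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of an axis-aligned box (0 for degenerate boxes)."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def overlapping_pairs(boxes: list[Box], min_ratio: float = 0.1) -> list[tuple[int, int]]:
    """Return index pairs whose intersection covers at least `min_ratio`
    of the smaller box. Well-separated text lines should yield no pairs."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        smaller = min(area(a), area(b))
        if smaller > 0 and overlap_area(a, b) / smaller >= min_ratio:
            pairs.append((i, j))
    return pairs


# Hypothetical boxes for three consecutive lines of a code snippet.
tight = [(72, 100, 300, 112), (72, 114, 280, 126), (72, 128, 310, 140)]
# Same lines, but the middle box has been inflated in every direction.
inflated = [(72, 100, 300, 112), (60, 98, 320, 142), (72, 128, 310, 140)]

print(overlapping_pairs(tight))     # → []
print(overlapping_pairs(inflated))  # → [(0, 1), (1, 2)]
```

With tight boxes, each line stays separate; once the middle box grows, it covers both neighbors, so any step that groups text by position can no longer tell the lines apart. A real check would feed in span boxes extracted from the PDF, for example the `bbox` entries that PyMuPDF's `page.get_text("dict")` returns.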

Aug 01 '24 11:08 drunkpig

@freedom1993 We will log the behavior you reported as a bug and investigate the root cause.

Aug 01 '24 11:08 drunkpig

@freedom1993 Could you share this PDF?

Aug 01 '24 11:08 drunkpig

Sure: A Behavior Language for Story-based Believable Agents.pdf

Aug 02 '24 03:08 freedom1993