MinerU

Text from the same paragraph is recognized as two paragraphs

Open freedom1993 opened this issue 1 year ago • 3 comments

Description of the bug

Original text: image

Parsed text: image

How to reproduce the bug

magic-pdf pdf-command --pdf agents.pdf --inside_model true --method ocr

Operating system

Windows

Python version

3.10

Software version (magic-pdf --version)

0.6.x

Device mode

cpu

freedom1993, Aug 01 '24 09:08

@freedom1993 Please upload your PDF file.

drunkpig, Aug 01 '24 10:08

The references section has the same problem.

Original text: image

Parsed text: image

freedom1993, Aug 01 '24 11:08

image image

Fixed in the latest version.

myhloli, Jan 05 '25 14:01