
[Bug]: Some PDF files lost the content.

Open YannSuper opened this issue 1 year ago • 3 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch name

main

Commit ID

c3b2a1

Other environment information

Some PDF files are missing content after parsing.

Actual behavior

Some PDF files are missing content after parsing.

Expected behavior

Some PDF files are missing content after parsing.

Steps to reproduce

Some PDF files are missing content after parsing.

Additional information

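Some PDF files are missing content after parsing. In 第五册 1-50页[4].pdf (Volume 5, pages 1-50), some pages are missing their titles, for example the 海蛇药酒 (sea snake medicinal wine) page. Also Qiongyu Gao, Jiangfan Wan: I used the Presentation method when parsing the file.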

YannSuper · Jun 04 '24 10:06

Another problem: after parsing, a lot of content is lost in the prescription (处方) sections. Could you please take a look?

YannSuper · Jun 04 '24 10:06

My suggestion is to adjust the task size and the chunk size.

KevinHuSh · Jun 05 '24 03:06
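To illustrate why tuning the chunk size can change what ends up together in a chunk, here is a minimal sketch of greedy, token-budgeted chunking. It is not RAGFlow's chunker: `chunk_by_tokens` is a hypothetical helper, tokens are approximated by whitespace splitting, and the sample page text is made up for illustration.

```python
def chunk_by_tokens(paragraphs: list[str], chunk_size: int) -> list[str]:
    """Greedily pack paragraphs into chunks of at most chunk_size tokens.

    Hypothetical helper for illustration only (not RAGFlow's code).
    Tokens are approximated by whitespace-separated words. A paragraph
    longer than chunk_size is kept whole as its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and current_len + n > chunk_size:
            chunks.append("\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    # Flush whatever is left over.
    if current:
        chunks.append("\n".join(current))
    return chunks


page = ["Title", "Prescription: a b c d", "Usage: e f"]
small = chunk_by_tokens(page, 4)    # heading ends up in its own chunk
large = chunk_by_tokens(page, 16)   # heading stays with its body
```

With a small budget the heading is split off from the text it introduces; with a larger budget the whole page stays in one chunk. Whether this explains the missing titles in this issue is not established here; it only shows how the parameter moves chunk boundaries.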

This file parses poorly with the General method, so I used Presentation, which works better, but there are still some individual errors.

YannSuper · Jun 07 '24 03:06