ragflow
[Bug]: Some PDF files lose content when parsed.
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch name
main
Commit ID
c3b2a1
Other environment information
Content is missing when some PDF files are parsed.
Actual behavior
Content is missing when some PDF files are parsed.
Expected behavior
Content is missing when some PDF files are parsed.
Steps to reproduce
Content is missing when some PDF files are parsed.
Additional information
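Content is missing when some PDF files are parsed. In 第五册 1-50页[4].pdf, some pages are missing their titles, for example the 海蛇药酒 page, as well as the Qiongyu Gao and Jiangfan Wan pages. I used the Presentation method when parsing the file.
There is another problem: after parsing, a lot of content is missing from the prescription (处方) sections. Please take a look.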
My suggestion is to adjust the task size and the chunk size.
This file parses poorly with the General method, so I used Presentation, which gives better results, but a few individual errors remain.