
[Question]: Can't parse PDF file

Open • xxz196801 opened this issue 1 year ago • 5 comments

Describe your problem

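[screenshot] As shown in the screenshot, an error occurs when I upload a PDF file that mixes text and images. This problem has always been there; I have had it ever since 8.0.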

xxz196801 • Oct 09 '24 12:10

We intend to create an international community, so we encourage using English for communication.

JinHai-CN • Oct 09 '24 12:10

Would you please attach this file? @xxz196801

JinHai-CN • Oct 09 '24 12:10

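安全评价 (曹庆贵) (Z-Library).pdf (Safety Evaluation, by Cao Qinggui). All my other PDF files are recognized correctly. Honestly, I have tested almost every knowledge-base tool, and this one is the best, which is why I have kept testing it since 8.0. Thank you for all your hard work!!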

xxz196801 • Oct 10 '24 00:10

We use pdfplumber to open PDF files; pdfplumber is an open-source library. You can use 'fitz.open(fnm)' to open this PDF instead. fitz is more robust, but its licence is not suitable for an open-source project. [screenshot]

KevinHuSh • Oct 10 '24 01:10
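The fallback suggested in the comment above (open with pdfplumber, and use fitz.open when that fails) can be sketched in Python roughly as follows. This is a minimal, hypothetical sketch, not RAGFlow's actual parser code: the names extract_pages, extract_with_pdfplumber and extract_with_fitz are illustrative, and it assumes the pdfplumber and PyMuPDF (imported as fitz) packages are installed. It only extracts plain page text; it does not reproduce RAGFlow's layout analysis.

```python
"""Hypothetical sketch: extract page text with pdfplumber, falling back to
PyMuPDF (imported as ``fitz``) when pdfplumber fails on a file.

Not RAGFlow's actual code; all function names here are illustrative.
"""
from typing import Callable, List, Sequence, Tuple

# An extractor takes a file path and returns one text string per page.
Extractor = Callable[[str], List[str]]


def extract_with_pdfplumber(path: str) -> List[str]:
    import pdfplumber  # imported lazily so this module loads without it

    with pdfplumber.open(path) as pdf:
        # Older pdfplumber versions may return None for pages without text.
        return [page.extract_text() or "" for page in pdf.pages]


def extract_with_fitz(path: str) -> List[str]:
    import fitz  # PyMuPDF; AGPL-licensed, so kept as an optional import

    doc = fitz.open(path)
    try:
        # Iterating a Document yields its pages in order.
        return [page.get_text() for page in doc]
    finally:
        doc.close()


def extract_pages(
    path: str,
    extractors: Sequence[Tuple[str, Extractor]] = (
        ("pdfplumber", extract_with_pdfplumber),
        ("fitz", extract_with_fitz),
    ),
) -> Tuple[str, List[str]]:
    """Try each extractor in order; return (name, pages) from the first success."""
    errors = []
    for name, extractor in extractors:
        try:
            return name, extractor(path)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError(f"all extractors failed for {path}: " + "; ".join(errors))
```

For example, extract_pages("安全评价 (曹庆贵) (Z-Library).pdf") would return ("fitz", pages) if pdfplumber raises on that file but PyMuPDF opens it. Importing fitz inside its own function keeps PyMuPDF an optional dependency, which matters given the licensing concern raised in this thread.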

fitz (PyMuPDF) is another open-source module, licensed under the AGPL.

JinHai-CN • Oct 10 '24 02:10