[Bug]: v0.15.1-46 ERROR Book chunk / Pdf object has no attribute page_chars
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
RAGFlow workspace code commit ID
e24af69e96304a0767129b98af3e929cf1c06930
RAGFlow image version
v0.15.1-46-g8674156d slim
Other environment information
Linux 5.15.167.4-microsoft-standard-WSL2
Actual behavior
Hi, when I try to import this PDF file with the Book chunking method, I get the following error.
My workaround for now is to convert the PDF to DOCX.
Kind regards, David.
09:19:51 Page(121~133): Done (0.45s)
09:19:51 Task has been received.
09:19:51 Page(133~145): OCR started
09:19:54 Page(133~145): [ERROR]Internal server error while chunking: Pdf object has no attribute page_chars
09:19:54 [ERROR][Exception]: 'Pdf' object has no attribute 'page_chars'
09:19:54 Task has been received.
09:19:54 Page(145~157): OCR started
09:20:00 Page(145~157): OCR finished (6.40s)
09:20:10 Page(145~157): Layout analysis (9.44s)
Expected behavior
No response
Steps to reproduce
Same as above: import the attached PDF using the Book chunking method.
Additional information
No response
Hi David, would you please attach the file RAGFlow failed to parse?
Possibly it failed to fetch the downloaded file. Could you try re-running the parsing?
Oh, I had uploaded it to a file-sharing service but forgot to add the link. You can download it here: https://filesender.renater.fr/?s=download&token=be91b215-c855-410b-8483-7cd25bb3c11b
Kind regards, David.
PS: I've run several tests with different embedding and LLM retrieval model configurations, scoring the quality of the responses from 0 to 5. I'll post my feedback in the question thread.
I have the same problem.
Please try to fix this problem.
Could you share the file with us? [email protected]