PDF Loader clipping ending of document

Open gweinz opened this issue 1 year ago • 1 comments

I am trying to test loading certain text-based PDFs, and for some single-page documents the data loader is not capturing the tail end of the PDF. Any suggestions on debugging this?

gweinz, Feb 09 '23 00:02

Do you have an example?

We also added a PDF loader (https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/pdf.html#using-pypdf). Does this one work?

hwchase17, Feb 11 '23 07:02
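
One way to build the reproducible example requested above is to check whether a known ending of the PDF survives loading. The sketch below is a minimal illustration, not part of LangChain: the helper names `normalize`, `tail_is_captured`, and `report_missing_tail` are hypothetical, and it assumes you can supply the per-page text a loader produced plus the last sentence you expect to see, copied by hand from the PDF.

```python
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Collapse every run of whitespace into one space so layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip()


def tail_is_captured(loaded_text: str, expected_tail: str) -> bool:
    """Return True if the expected ending appears somewhere in the loaded text."""
    return normalize(expected_tail) in normalize(loaded_text)


def report_missing_tail(page_texts: Sequence[str], expected_tail: str) -> str:
    """Summarize whether the expected ending survived loading.

    page_texts is the text of each loaded page, in order (hypothetical input;
    how you obtain it depends on the loader you are testing).
    """
    joined = " ".join(page_texts)
    if tail_is_captured(joined, expected_tail):
        return "ok: tail found"
    # Show where the loaded text actually stops, to help locate the cut-off point.
    last_chars = normalize(joined)[-80:]
    return f"missing tail; loaded text ends with: {last_chars!r}"
```

With the pypdf-based loader linked above, `page_texts` could likely be built as `[doc.page_content for doc in PyPDFLoader(path).load()]`, though the import path for `PyPDFLoader` varies between LangChain versions, so treat that line as an assumption. Running the same check against two loaders shows whether the clipping comes from one specific loader or from the PDF itself.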

Hi, @gweinz! I'm here to help the LangChain team manage their backlog and I wanted to let you know that we are marking this issue as stale.

From what I understand, you reported an issue with the PDF Loader in the repository where it is not capturing the end of certain single-page PDFs, leading to data loss. You mentioned that you were seeking advice on how to debug this issue. User hwchase17 has suggested using a different PDF loader and asked for an example to reproduce the problem.

Before we proceed, could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

dosubot[bot], Aug 19 '23 16:08