langchain
PDF Loader clipping ending of document
I am trying to test-load certain text-based PDFs, and for some single-page documents the data loader is not capturing the tail end of the PDF. Any suggestions on how to debug this?
Do you have an example?
We also added a PDF loader (https://langchain.readthedocs.io/en/latest/modules/document_loaders/examples/pdf.html#using-pypdf). Does this one work?
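One way to narrow this down might be to extract the text with pypdf directly and check whether the closing text you expect is present. That would show whether the clipping happens during PDF text extraction or later in the loader. This is only a rough sketch: the helper names `normalize`, `tail_is_captured`, and `extract_with_pypdf` are made up for illustration, and it assumes the `pypdf` package is installed and that you know a sentence that should appear at the end of the document.

```python
import re
from typing import List


def normalize(text: str) -> str:
    """Collapse whitespace runs so line-wrapping differences are ignored."""
    return re.sub(r"\s+", " ", text).strip()


def tail_is_captured(page_texts: List[str], expected_tail: str) -> bool:
    """Return True if the expected closing text appears in the extracted pages."""
    combined = normalize(" ".join(page_texts))
    return normalize(expected_tail) in combined


def extract_with_pypdf(path: str) -> List[str]:
    """Extract text page by page using pypdf directly.

    Assumption: pypdf is installed. It is imported lazily so the
    helpers above can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may yield nothing for pages without a text layer
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `tail_is_captured(extract_with_pypdf("sample.pdf"), "the last sentence of the PDF")`, where `sample.pdf` is a placeholder path. If this returns `False`, the tail is probably already missing at the extraction step; if it returns `True`, the text is more likely being dropped somewhere after extraction.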
Hi, @gweinz! I'm here to help the LangChain team manage their backlog and I wanted to let you know that we are marking this issue as stale.
From what I understand, you reported an issue with the PDF Loader in the repository where it is not capturing the end of certain single-page PDFs, leading to data loss. You mentioned that you were seeking advice on how to debug this issue. User hwchase17 has suggested using a different PDF loader and asked for an example to reproduce the problem.
Before we proceed, could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!