Loading Multiple PDF error

Open Kashif-Raza6 opened this issue 1 year ago • 11 comments

I am using DirectoryLoader to load all the PDFs in my data folder:

    from langchain.document_loaders import DirectoryLoader

    loader = DirectoryLoader("data", glob="**/*.pdf")
    documents = loader.load()
    print(documents)

This throws an error, whereas loading .txt files works fine.

Kashif-Raza6 avatar Apr 14 '23 01:04 Kashif-Raza6

What's the error?

digitake avatar Apr 14 '23 04:04 digitake

The error is raised at documents = loader.load():

    ImportError: cannot import name 'is_directory' from 'PIL._util' (/usr/local/lib/python3.9/dist-packages/PIL/_util.py)

Kashif-Raza6 avatar Apr 14 '23 10:04 Kashif-Raza6

Try restarting the runtime after installing the Pillow package. Let me know if that helps.

digitake avatar Apr 14 '23 10:04 digitake

I have restarted the runtime and reinstalled the Pillow package, but even that did not solve the problem.

Kashif-Raza6 avatar Apr 14 '23 10:04 Kashif-Raza6

It seems like a classic case of "it works on my machine."

I have tried your code and it works fine.

I'd suggest checking your LangChain and Pillow versions. You can also do a fresh install in a new virtualenv.

digitake avatar Apr 14 '23 11:04 digitake
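One way to follow the version-check suggestion above is a small script that prints what is actually installed in the active environment. This is a minimal sketch using only the standard library. The function name installed_versions is hypothetical, and including unstructured and pdf2image in the list is an assumption about which packages are involved in PDF loading, not something confirmed in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # "unstructured" and "pdf2image" are assumptions, see the note above.
    names = ["langchain", "Pillow", "unstructured", "pdf2image"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in both the failing and the working environment makes it easy to compare the two sets of versions side by side.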

Did you try with multiple PDF files in the directory? With a single PDF file it works on my end as well, but with multiple PDFs it raises the error.

Kashif-Raza6 avatar Apr 14 '23 11:04 Kashif-Raza6

Yes. I have three files.

digitake avatar Apr 14 '23 12:04 digitake
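To compare setups like the two above, it can help to confirm how many files the glob pattern from the original snippet matches. The sketch below uses only pathlib and does not call LangChain. The helper name matching_pdfs is hypothetical, and it assumes that pathlib's handling of "**/*.pdf" is close enough to what DirectoryLoader does with the same pattern.

```python
from pathlib import Path


def matching_pdfs(root, pattern="**/*.pdf"):
    """Return the sorted list of regular files under `root` matching `pattern`."""
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())


if __name__ == "__main__":
    # "data" is the folder name used in the original report.
    files = matching_pdfs("data")
    print(f"{len(files)} PDF file(s) matched")
    for path in files:
        print(" ", path)
```

If the count differs between machines, the directory contents are a more likely cause of differing behaviour than the loader itself.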

When I install chromadb in a virtual env, I get the following error:

error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

I get this even though I have installed the Microsoft C++ Build Tools on my system.

How can I resolve it?

Kashif-Raza6 avatar Apr 15 '23 08:04 Kashif-Raza6

I had this problem yesterday. It's a Pillow version mismatch. Try a different version; this one worked for me:

    pip uninstall Pillow
    pip install Pillow==9.1.0

nhtkid avatar Apr 15 '23 09:04 nhtkid

I'm getting the error below:

    PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

grv805 avatar May 10 '23 11:05 grv805

I'm running it on GCP Vertex AI Workbench. How can I overcome it?

grv805 avatar May 10 '23 11:05 grv805
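The PDFInfoNotInstalledError message asks whether Poppler is installed and on PATH, so a quick first check is whether Poppler's command-line tools are visible to the Python process. This is a minimal sketch using shutil.which from the standard library. The helper name find_poppler_tools is hypothetical, and treating pdfinfo and pdftoppm as the relevant executables is an assumption based on the wording of the error.

```python
import shutil


def find_poppler_tools(tools=("pdfinfo", "pdftoppm")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_poppler_tools().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

If the tools are reported as missing, installing Poppler through the system package manager is the usual fix. On Debian or Ubuntu based images the tools are typically provided by the poppler-utils package; whether that applies to a given Vertex AI Workbench image is something to verify.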

When I install chromadb in a virtual env, I get the following error:

error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/ I get this even though I have installed the Microsoft C++ Build Tools on my system.

How can I resolve it?

Try this solution, it worked for me. https://stackoverflow.com/questions/73969269/error-could-not-build-wheels-for-hnswlib-which-is-required-to-install-pyprojec

sistecno avatar May 28 '23 22:05 sistecno

Hi, @Kashif-Raza6! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you reported an issue related to the DirectoryLoader in LangChain. The error you encountered is ImportError: cannot import name 'is_directory' from 'PIL._util'. Some users have suggested restarting the runtime and reinstalling the Pillow package, but it seems that the issue still persists. Additionally, there is another issue reported by grv805 regarding a PDFInfoNotInstalledError when running the code on GCP Vertex AI Workbench.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. We appreciate your contribution to the LangChain project!

dosubot[bot] avatar Sep 21 '23 16:09 dosubot[bot]