langchain: Loading PDF files from a directory gives the following error

Status: Open. Kashif-Raza6 opened this issue 1 year ago.

System Info

langchain 0.0.160

Who can help?

No response

Information

[ ] The official example notebooks/scripts
[ ] My own modified scripts

Related Components

[ ] LLMs/Chat Models
[ ] Embedding Models
[ ] Prompts / Prompt Templates / Prompt Selectors
[ ] Output Parsers
[X] Document Loaders
[ ] Vector Stores / Retrievers
[ ] Memory
[ ] Agents / Agent Executors
[ ] Tools / Toolkits
[ ] Chains
[ ] Callbacks/Tracing
[ ] Async

Reproduction

from langchain.document_loaders import DirectoryLoader

# Load every *.pdf file under ./data, including subdirectories
loader = DirectoryLoader('data', glob="**/*.pdf")
docs = loader.load()
print(len(docs))

Error:

ImportError: cannot import name 'open_filename' from 'pdfminer.utils'
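The error says that `pdfminer.utils` has no `open_filename`. A quick way to see what is actually installed is the stdlib-only sketch below. It assumes (not confirmed in this thread) that the failure comes from the `pdfminer` module on the path not being the version `unstructured` expects; the distribution names it checks (`pdfminer.six`, `pdfminer`) are assumptions about what might be installed. It only inspects the environment and never opens a PDF.

```python
from importlib import import_module, metadata
from importlib.util import find_spec
from typing import Dict, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pdfminer_has_open_filename() -> bool:
    """True if `from pdfminer.utils import open_filename` would succeed."""
    if find_spec("pdfminer") is None:
        # No module named pdfminer on the path at all.
        return False
    try:
        utils = import_module("pdfminer.utils")
    except ImportError:
        return False
    return hasattr(utils, "open_filename")


def environment_report() -> Dict[str, Optional[str]]:
    # Assumption: these are the distributions relevant to this error.
    names = ("langchain", "unstructured", "pdfminer.six", "pdfminer")
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version or 'not installed'}")
    print("pdfminer.utils.open_filename available:", pdfminer_has_open_filename())
```

If both `pdfminer` and `pdfminer.six` show up as installed, that is worth mentioning when reporting the issue, since both provide a module named `pdfminer`.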

Expected behavior

The PDF files in the directory should load without errors.

May 06 '23 07:05 Kashif-Raza6
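Before digging into pdfminer, it can help to confirm that the glob matches any files at all. A minimal stdlib sketch, assuming that `DirectoryLoader('data', glob="**/*.pdf")` selects files with pathlib-style glob matching; `find_pdfs` is a hypothetical helper, not part of langchain.

```python
from pathlib import Path
from typing import List


def find_pdfs(directory: str, pattern: str = "**/*.pdf") -> List[Path]:
    """Return the files under `directory` matching `pattern`, sorted.

    Hypothetical helper: it only lists paths and never opens the PDFs.
    Matching is case-sensitive on POSIX systems, so "REPORT.PDF" is skipped there.
    """
    base = Path(directory)
    if not base.is_dir():
        # A missing or mistyped directory yields no matches rather than an error.
        return []
    # "**" matches the directory itself and every subdirectory below it.
    return sorted(p for p in base.glob(pattern) if p.is_file())


if __name__ == "__main__":
    pdfs = find_pdfs("data")
    print(f"{len(pdfs)} PDF file(s) found under data/")
```

A count of zero points at the path or pattern; a non-zero count with the same ImportError points back at the PDF parsing dependencies.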

I ran into the same issue. As a temporary workaround, I downgraded unstructured to 0.6.1.

May 07 '23 17:05 rajib76
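The downgrade described above is typically done with `pip install "unstructured==0.6.1"`. To check afterwards which version the active interpreter actually sees, a small sketch like the following could be used; `matches_pin` is a hypothetical helper and compares version strings exactly, with no normalization.

```python
from importlib import metadata
from typing import Optional

# Version from the workaround in the comment above.
WORKAROUND_UNSTRUCTURED = "0.6.1"


def matches_pin(dist_name: str, pinned: str) -> Optional[bool]:
    """True/False if `dist_name` is installed at/not at `pinned`; None if absent.

    Exact string comparison only: "0.6.1" and "0.6.1.post1" count as different.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return installed == pinned


if __name__ == "__main__":
    status = matches_pin("unstructured", WORKAROUND_UNSTRUCTURED)
    if status is None:
        print("unstructured is not installed in this environment")
    elif status:
        print(f"unstructured {WORKAROUND_UNSTRUCTURED} is installed")
    else:
        print("unstructured is installed, but not at the pinned version")
```

Running it with the same interpreter that runs the loader matters, since a pin applied in a different virtual environment would not take effect.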

Finally, I was able to solve this issue. pdfminer was last updated in 2019 and works on Python 3.7, 3.8, and 3.9. When I switched from Python 3.10 to 3.9, it started working.

May 07 '23 20:05 Kashif-Raza6
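The fix reported above was switching interpreters rather than packages. A sketch of a guard that warns when running outside the versions the commenter reported as working (3.7 to 3.9); that version set comes only from this thread and is not an official compatibility statement for pdfminer, unstructured, or langchain.

```python
import sys
import warnings
from typing import Sequence

# Versions reported as working in this thread (not an official support matrix).
REPORTED_WORKING = {(3, 7), (3, 8), (3, 9)}


def is_reported_working(version: Sequence[int]) -> bool:
    """True if the (major, minor) part of `version` is in REPORTED_WORKING."""
    return (version[0], version[1]) in REPORTED_WORKING


def warn_if_unreported() -> None:
    """Emit a RuntimeWarning when the current interpreter is outside that set."""
    if not is_reported_working(sys.version_info):
        warnings.warn(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} was not among the "
            "versions reported as working for PDF loading in this thread (3.7 to 3.9).",
            RuntimeWarning,
        )


if __name__ == "__main__":
    warn_if_unreported()
```

A warning rather than an exception keeps the check non-blocking, since newer package releases may well work on later Python versions.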
