langchain
Loading pdf files from directory gives the following error
System Info
langchain 0.0.160
Who can help?
No response
Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [X] Document Loaders
- [ ] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
from langchain.document_loaders import DirectoryLoader
loader = DirectoryLoader('data', glob="**/*.pdf")
docs = loader.load()
len(docs)
error:
cannot import name 'open_filename' from 'pdfminer.utils'
Expected behavior
Load the PDF files from the directory.
I also faced the same issue. For now I have downgraded unstructured to 0.6.1.
I was finally able to solve this issue. Pdfminer was last updated in 2019 and works on Python 3.7, 3.8, and 3.9. When I switched from Python 3.10 to 3.9, it started working.
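For anyone else hitting this, it may help to see what is installed before downgrading anything. Below is a minimal diagnostic sketch using only the standard library. It prints the Python version, the installed versions of the packages mentioned in this thread, and whether `pdfminer.utils` exposes `open_filename`. The helper names (`installed_version`, `has_open_filename`, `report`) are made up for illustration, and checking the `pdfminer.six` distribution is my own assumption; nobody in this thread has confirmed which distribution is involved.

```python
import importlib
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_open_filename():
    """Return True if pdfminer.utils can be imported and exposes open_filename."""
    try:
        utils = importlib.import_module("pdfminer.utils")
    except ImportError:
        # pdfminer is not importable at all in this environment.
        return False
    return hasattr(utils, "open_filename")


def report():
    """Collect one line per fact that is relevant to the import error."""
    lines = [f"python: {sys.version_info.major}.{sys.version_info.minor}"]
    # "pdfminer.six" is an assumed candidate; the thread only names "pdfminer".
    for name in ("langchain", "unstructured", "pdfminer", "pdfminer.six"):
        lines.append(f"{name}: {installed_version(name) or 'not installed'}")
    lines.append(f"pdfminer.utils.open_filename available: {has_open_filename()}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

If the last line reports `False`, the import error above is expected in that environment, and the version lines show which packages to compare against a working setup.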