
Loading PDF files from a directory gives the following error

Open · Kashif-Raza6 opened this issue 1 year ago

System Info

langchain 0.0.160

Who can help?

No response

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [ ] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [X] Document Loaders
  • [ ] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [ ] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

```python
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader('data', glob="**/*.pdf")
docs = loader.load()
len(docs)
```

Error: cannot import name 'open_filename' from 'pdfminer.utils'
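The error says the installed `pdfminer.utils` has no `open_filename`. One quick way to confirm what the installed package actually exposes is a small stdlib-only probe like the sketch below. `module_has_attr` is a hypothetical helper, not part of langchain, unstructured, or pdfminer.

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and defines `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or a parent package) is missing or fails to import.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # False here matches the condition behind the reported import error.
    print(module_has_attr("pdfminer.utils", "open_filename"))
```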

Expected behavior

Load the PDF files from the directory.
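To see which files a pattern like `glob="**/*.pdf"` selects under `data`, here is a rough stdlib approximation using `pathlib`. It assumes the pattern is applied pathlib-style relative to the directory; `matching_files` is a hypothetical helper, and it does not reproduce the loader's own parsing or error handling.

```python
from pathlib import Path
from typing import List


def matching_files(directory: str, pattern: str = "**/*.pdf") -> List[Path]:
    """List regular files under `directory` that match glob `pattern`, sorted."""
    root = Path(directory)
    # A leading "**" makes Path.glob match the root and all subdirectories.
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

For example, `matching_files("data")` lists every `.pdf` file under `data`, including nested ones, so an empty result points at the path or pattern rather than at the PDF parser.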

Kashif-Raza6 · May 06 '23 07:05

I ran into the same issue. As a workaround for now, I downgraded unstructured to 0.6.1.
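If you try the downgrade (for example `pip install "unstructured==0.6.1"`), it helps to confirm which versions actually ended up in the environment, since more than one PyPI distribution (`pdfminer` and `pdfminer.six`) installs a top-level `pdfminer` package. The sketch below uses `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # These are PyPI distribution names, not import names.
    for name, version in installed_versions(
        ["langchain", "unstructured", "pdfminer", "pdfminer.six"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```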

rajib76 · May 07 '23 17:05

Finally, I was able to solve this issue. Pdfminer was last updated in 2019 and works on Python 3.7, 3.8, and 3.9. When I switched from Python 3.10 to 3.9, it started working.
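Since the reported fix was moving from Python 3.10 to 3.9, a script could fail early with a clear message instead of the opaque import error. The sketch below encodes the 3.7 to 3.9 range from this comment as an assumption; it is not an official support matrix for pdfminer, unstructured, or langchain, and `REPORTED_WORKING` and `in_reported_range` are hypothetical names.

```python
import sys
from typing import Sequence, Tuple

# Assumption: the range reported as working in this thread, not an official matrix.
REPORTED_WORKING: Tuple[Tuple[int, int], Tuple[int, int]] = ((3, 7), (3, 9))


def in_reported_range(version: Sequence[int]) -> bool:
    """True if the (major, minor) part of `version` lies within REPORTED_WORKING."""
    low, high = REPORTED_WORKING
    major_minor = tuple(version[:2])
    return low <= major_minor <= high


if __name__ == "__main__":
    if not in_reported_range(sys.version_info):
        print(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is outside "
            "the 3.7-3.9 range reported to work for this PDF loading setup.",
            file=sys.stderr,
        )
```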

Kashif-Raza6 · May 07 '23 20:05