Errors with DirectoryLoader

Open tanayvarshney opened this issue 1 year ago • 1 comments

from langchain.document_loaders import DirectoryLoader
loader = DirectoryLoader('./server', glob="**/*.md")
data = loader.load()

Error

    from pdfminer.utils import open_filename
ImportError: cannot import name 'open_filename' from 'pdfminer.utils' (/usr/local/lib/python3.8/dist-packages/pdfminer/utils.py)

Langchain version: '0.0.152'

tanayvarshney avatar Apr 28 '23 18:04 tanayvarshney

Ran into the same issue this morning. I had to downgrade unstructured to version 0.6.1.

ysato avatar Apr 28 '23 22:04 ysato

I have the same problem when loading certain PDFs. Is there a way to turn on a trace/debug option while the loader is running, so I can see which file it fails on?
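One way to see which file fails, as a rough sketch rather than a built-in LangChain option: walk the directory yourself and load one file at a time inside a try/except, logging each path. The names `load_each` and `load_one` are hypothetical helpers made up for this example; `load_one` is whatever per-file loading function you want to test.

```python
import logging
from pathlib import Path
from typing import Callable, List, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("loader_debug")


def load_each(
    root: str,
    pattern: str,
    load_one: Callable[[Path], list],
) -> Tuple[list, List[Tuple[Path, Exception]]]:
    """Load files one at a time and record which ones raise.

    `load_one` is a hypothetical per-file loader supplied by the caller;
    it takes a path and returns a list of documents.
    """
    docs: list = []
    failures: List[Tuple[Path, Exception]] = []
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        log.info("loading %s", path)
        try:
            docs.extend(load_one(path))
        except Exception as exc:  # ImportError is included here
            log.exception("failed on %s", path)
            failures.append((path, exc))
    return docs, failures
```

With LangChain, `load_one` could be something like `lambda p: UnstructuredFileLoader(str(p)).load()`, assuming that class is importable in your installed version. The returned `failures` list then shows each path that raised, along with its exception.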

botchagalupe avatar Jul 13 '23 00:07 botchagalupe

from getpass import getpass
from langchain.document_loaders import GitHubIssuesLoader
ACCESS_TOKEN = getpass()
loader = GitHubIssuesLoader(
    repo="eminmtas/check-license",
    access_token=ACCESS_TOKEN,
    creator="eminmtas",
)

Error

from langchain.document_loaders import GitHubIssuesLoader
ImportError: cannot import name 'GitHubIssuesLoader' from 'langchain.document_loaders'

I am getting this error too. I have the required libraries installed in my venv. What should I do?

eminmtas avatar Jul 13 '23 13:07 eminmtas

Hi, @tanayvarshney. I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue titled "Errors with DirectoryLoader" is about an import error when using the DirectoryLoader from the langchain.document_loaders module. The error seems to be related to the open_filename function from the pdfminer.utils module. This issue was encountered with version 0.0.152 of LangChain. Another user mentioned that downgrading the unstructured library to version 0.6.1 resolved the issue. There is also a user who encountered a similar error with the GitHubIssuesLoader and is seeking guidance on how to resolve it.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project. Let us know if you have any further questions or concerns.

dosubot[bot] avatar Oct 12 '23 16:10 dosubot[bot]

ImportError: cannot import name 'open_filename' from 'pdfminer.utils'. I tried all the methods mentioned online but am still getting the same issue.

advaithgit avatar Oct 16 '23 11:10 advaithgit

@baskaryan Could you please help @advaithgit with the issue titled "Errors with DirectoryLoader"? They are encountering an import error when using the DirectoryLoader from the langchain.document_loaders module. The error seems to be related to the open_filename function from the pdfminer.utils module. They have tried the methods mentioned online but are still facing the same issue. Thank you!

dosubot[bot] avatar Oct 16 '23 11:10 dosubot[bot]

Yes, please help with this issue.

Amaresh078724 avatar Nov 28 '23 15:11 Amaresh078724

# Install package

pip install "unstructured[all-docs]"

This worked for me.

lakshman-11 avatar Dec 04 '23 11:12 lakshman-11

Just upgrade your pdfminer.six package. See here

mkmohangb avatar Dec 19 '23 12:12 mkmohangb
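Since the fixes suggested in this thread all come down to which versions of unstructured and pdfminer.six are installed, it may help to print them before changing anything. A minimal sketch using only the standard library (Python 3.8+); `installed_versions` is a hypothetical helper name, and the package names are the ones mentioned above:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages discussed in this thread.
print(installed_versions(["langchain", "unstructured", "pdfminer.six"]))
```

Run this in the same environment (venv) that raises the ImportError, so the versions reported are the ones actually being imported.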