
Read the Docs document loader documentation example raises warning

Open mullinmax opened this issue 2 years ago • 0 comments

The ReadTheDocsLoader example in the documentation raises a GuessedAtParserWarning from BeautifulSoup.

To replicate:

#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/
from langchain.document_loaders import ReadTheDocsLoader
loader = ReadTheDocsLoader("rtdocs")
docs = loader.load()
/config/miniconda3/envs/warn_test/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py:30: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 30 of the file /config/miniconda3/envs/warn_test/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

  _ = BeautifulSoup(

Adding the features argument can resolve this issue:

#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/
from langchain.document_loaders import ReadTheDocsLoader
loader = ReadTheDocsLoader("rtdocs", features='html.parser')
docs = loader.load()
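
The warning comes from BeautifulSoup itself, so it can be reproduced without the loader. The following is a minimal sketch, assuming bs4 is installed and exports GuessedAtParserWarning; the HTML string and the parser_warnings helper are made up for illustration and are not part of langchain.

```python
import warnings

from bs4 import BeautifulSoup, GuessedAtParserWarning

# Illustrative markup only; any HTML string would do.
HTML = "<html><body><p>hello</p></body></html>"


def parser_warnings(**soup_kwargs):
    """Return the GuessedAtParserWarning entries recorded while parsing HTML.

    Hypothetical helper: keyword arguments are forwarded to BeautifulSoup.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        BeautifulSoup(HTML, **soup_kwargs)
    return [w for w in caught if issubclass(w.category, GuessedAtParserWarning)]


# No parser named: BeautifulSoup picks one and warns about the guess.
print(len(parser_warnings()))

# Parser named explicitly: nothing is guessed, so no such warning is recorded.
print(len(parser_warnings(features="html.parser")))
```

Whether ReadTheDocsLoader forwards features to BeautifulSoup depends on the installed langchain version, so the loader call above should be read as the proposed behavior rather than a guaranteed one.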

mullinmax, Apr 23 '23 15:04