Read the Docs document loader documentation example raises warning
The ReadTheDocsLoader example in the documentation raises a GuessedAtParserWarning.
To replicate:
#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/
from langchain.document_loaders import ReadTheDocsLoader
loader = ReadTheDocsLoader("rtdocs")
docs = loader.load()
/config/miniconda3/envs/warn_test/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py:30: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 30 of the file /config/miniconda3/envs/warn_test/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.
_ = BeautifulSoup(
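Until the loader itself names a parser, one possible stopgap is to filter the warning by its message text. This is only a sketch: the helper name silence_parser_guess_warning is made up for illustration, and it relies on GuessedAtParserWarning being a UserWarning subclass whose message starts with the text quoted above. It hides the warning rather than fixing the cause.

```python
import warnings


def silence_parser_guess_warning():
    """Hypothetical helper: ignore the 'No parser was explicitly specified' warning.

    The message argument is a regex matched against the start of the warning
    text, so only warnings beginning with this sentence are affected.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"No parser was explicitly specified",
        category=UserWarning,
    )
```

Calling silence_parser_guess_warning() once before loader.load() should be enough, since the filter stays installed for the rest of the process.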
Adding a features argument to the loader, passed through to the BeautifulSoup constructor, can resolve this issue:
#!wget -r -A.html -P rtdocs https://langchain.readthedocs.io/en/latest/
from langchain.document_loaders import ReadTheDocsLoader
loader = ReadTheDocsLoader("rtdocs", features='html.parser')
docs = loader.load()
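For anyone wondering what the change inside the loader might look like, here is a rough, self-contained sketch of forwarding a features argument to the parser. StubSoup and SketchLoader are stand-ins invented for this example, not langchain or BeautifulSoup code; the stub only mimics the behaviour of warning when no parser is named.

```python
import warnings


class StubSoup:
    """Hypothetical stand-in for BeautifulSoup: warns when no parser is named."""

    def __init__(self, markup, features=None):
        if features is None:
            # Mimic the guess-and-warn behaviour described in the issue.
            warnings.warn("No parser was explicitly specified", UserWarning)
            features = "html.parser"
        self.markup = markup
        self.features = features


class SketchLoader:
    """Hypothetical loader that forwards `features` to the parser it builds."""

    def __init__(self, path, features=None):
        self.path = path
        self.features = features

    def parse(self, markup):
        # When the caller names a parser, the stand-in has nothing to guess
        # and emits no warning.
        return StubSoup(markup, features=self.features)
```

With this shape, SketchLoader("rtdocs", features="html.parser") parses quietly, while SketchLoader("rtdocs") still triggers the warning, which mirrors the two snippets above.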