chat-langchain
ingest, embedding, faiss error
$ ./ingest.sh
--2023-07-29 15:07:25-- https://langchain.readthedocs.io/en/latest/
Resolving langchain.readthedocs.io (langchain.readthedocs.io)... 104.17.32.82, 104.17.33.82, 2606:4700::6811:2152, ...
Connecting to langchain.readthedocs.io (langchain.readthedocs.io)|104.17.32.82|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://api.python.langchain.com/en/latest/ [following]
--2023-07-29 15:07:25-- https://api.python.langchain.com/en/latest/
Resolving api.python.langchain.com (api.python.langchain.com)... 104.17.32.82, 104.17.33.82, 2606:4700::6811:2152, ...
Connecting to api.python.langchain.com (api.python.langchain.com)|104.17.32.82|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘langchain.readthedocs.io/en/latest/index.html’
langchain.readthedocs.io/en/latest/in [ <=> ] 1.57K --.-KB/s in 0s
2023-07-29 15:07:25 (30.1 MB/s) - ‘langchain.readthedocs.io/en/latest/index.html’ saved [1612]
FINISHED --2023-07-29 15:07:25--
Total wall clock time: 0.3s
Downloaded: 1 files, 1.6K in 0s (30.1 MB/s)
/home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py:48: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 48 of the file /home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.
_ = BeautifulSoup(
/home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py:75: GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 75 of the file /home/lichen/environments/langchain_documentation_chatbot/venv/lib/python3.8/site-packages/langchain/document_loaders/readthedocs.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.
soup = BeautifulSoup(data, **self.bs_kwargs)
Embeddings: client=<class 'openai.api_resources.embedding.Embedding'> model='text-embedding-ada-002' deployment='text-embedding-ada-002' openai_api_version='' openai_api_base='' openai_api_type='' openai_proxy='' embedding_ctx_length=8191 openai_api_key='...apikeyhere...' openai_organization='' allowed_special=set() disallowed_special='all' chunk_size=1000 max_retries=6 request_timeout=None headers=None tiktoken_model_name=None show_progress_bar=False model_kwargs={}
Traceback (most recent call last):
File "ingest.py", line 34, in
+1
+1
me too
me three
The website is empty: https://langchain.readthedocs.io/en/latest/
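The wget output above reports only one downloaded file of about 1.6K, which fits the observation that the old docs URL no longer serves real content. A minimal sketch for checking this before running the ingest step, assuming the download lands in a local `langchain.readthedocs.io` directory as the log shows. The helper name `summarize_download` and the 2048-byte threshold are hypothetical, chosen only for illustration. Since the traceback is cut off, it is an assumption that the FAISS error comes from having too little text to embed.

```python
from pathlib import Path


def summarize_download(root, min_bytes=2048):
    """Return (number of .html files under root, list of those smaller than min_bytes).

    Hypothetical helper: min_bytes is an arbitrary threshold for
    "suspiciously small", not a value taken from the project.
    """
    files = sorted(Path(root).rglob("*.html"))
    small = [p for p in files if p.stat().st_size < min_bytes]
    return len(files), small
```

Calling `summarize_download("langchain.readthedocs.io")` after `./ingest.sh` has run the download shows how many pages were fetched. A count of 1 with that single file listed as small would suggest the scrape only picked up a stub page rather than the documentation.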