chat-langchain
fixes: "langchain.readthedocs.io" -> "python.langchain.com", else it only downloads a single index.html
The current ingest wget command only downloads a single index.html file. I noticed that "https://langchain.readthedocs.io/en/latest/" redirects to "https://python.langchain.com/en/latest/", and when I change the script to use the second URL, it downloads everything recursively as expected. Is the wget command being used incorrectly, or did the documentation link change and leave the script outdated?
Anyway, it now scrapes the docs correctly. Extra: added ingest.bat for us Windows scrubs.
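For reference, here is a minimal sketch of the download step with the new URL. The `-r -A.html` flags are an assumption about what the ingest script passes to wget, and `fetch_docs` is a hypothetical helper name, not something from the repo. A likely explanation for the single index.html (not confirmed here) is that wget's recursive mode stays on the starting host by default, so after the redirect the links on python.langchain.com count as a different host and are skipped.

```shell
#!/bin/sh
# Docs URL that the old readthedocs address redirects to (from the description above).
DOCS_URL="https://python.langchain.com/en/latest/"

# Hypothetical helper: recursively download the HTML pages under the given URL.
# The flags are assumed, not copied from the actual ingest script.
fetch_docs() {
    # -r       follow links recursively (same host only, by default)
    # -A.html  keep only files whose names end in .html
    wget -r -A.html "$1"
}
```

Calling `fetch_docs "$DOCS_URL"` should leave the pages under a `python.langchain.com/` directory, since wget names the output directory after the host it downloads from.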
This also doesn't load all the data properly anymore. Does anyone know how to scrape the docs properly?