langchain
Several confluence loader improvements
This PR makes several improvements:
- Previously it was not possible to load spaces of more than 100 pages. The `limit` was being used both as an overall page limit and as a per-request pagination limit. This, combined with the fact that Atlassian seems to enforce a server-side hard limit of 100 when page content is expanded, meant it wasn't possible to download more than 100 pages. Now `limit` is used only as a per-request pagination limit, and `max_pages` is introduced as the way to limit the total number of pages returned by the paginator.
- Document metadata now includes `source` (the source URL), making it compatible with `RetrievalQAWithSourcesChain`.
- It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to the confluence object for use cases that require it.
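
To illustrate the first point, here is a minimal sketch of separating the per-request `limit` from the overall `max_pages` cap. It is not the loader's actual code: `paginate`, `fake_fetch`, and `SERVER_HARD_LIMIT` are hypothetical names, and the fake backend only imitates the server-side cap of 100 described above.

```python
from typing import Callable, Dict, List

# Assumption: imitates the server-side cap of 100 results per request.
SERVER_HARD_LIMIT = 100


def paginate(
    fetch: Callable[[int, int], List[Dict]],
    limit: int = 50,
    max_pages: int = 1000,
) -> List[Dict]:
    """Request `limit` items per call until `max_pages` items are collected
    or the backend returns an empty batch."""
    results: List[Dict] = []
    start = 0
    while len(results) < max_pages:
        batch = fetch(start, limit)
        if not batch:
            break  # nothing left to fetch
        results.extend(batch)
        start += len(batch)  # advance by what was actually returned
    return results[:max_pages]


def fake_fetch(start: int, limit: int) -> List[Dict]:
    """Hypothetical backend holding 250 pages that never returns more than
    SERVER_HARD_LIMIT items, whatever `limit` the caller asks for."""
    total = 250
    size = min(limit, SERVER_HARD_LIMIT)
    return [{"id": i} for i in range(start, min(start + size, total))]


# Asking for 200 per request still works: the loop advances by the batch size
# actually returned and stops at the overall cap.
pages = paginate(fake_fetch, limit=200, max_pages=230)
```

Because the offset advances by the number of items actually returned rather than by `limit`, a server that silently caps each response does not cause pages to be skipped.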
thanks @mrharris!
Awesome MR - I actually just mentioned this issue here at the bottom, a couple minutes ago!
https://github.com/hwchase17/langchain/issues/2473