
Several confluence loader improvements

Open mrharris opened this issue 2 years ago • 1 comments

This PR addresses several improvements:

  • Previously it was not possible to load spaces of more than 100 pages. The limit parameter was being used both as an overall page limit and as a per-request pagination limit. Combined with the fact that Atlassian seems to enforce a server-side hard limit of 100 when page content is expanded, this meant it wasn't possible to download more than 100 pages. Now limit is used only as a per-request pagination limit, and max_pages is introduced as the way to limit the total number of pages returned by the paginator.
  • Document metadata now includes source (the source url), making it compatible with RetrievalQAWithSourcesChain.
  • It is now possible to include inline and footer comments.
  • It is now possible to pass verify_ssl=False and other parameters to the confluence object for use cases that require it.
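
To illustrate the first point, here is a minimal sketch of how a per-request limit and a separate max_pages cap can work together. It is not the loader's actual code: fake_fetch, paginate, SERVER_HARD_LIMIT and the 250-page space are hypothetical names and values made up for the example, and the 100-item server cap is assumed from the description above.

```python
from typing import Callable, List, Optional

# Assumed server-side cap per request (taken from the PR description, not verified here).
SERVER_HARD_LIMIT = 100


def fake_fetch(start: int, limit: int, total: int = 250) -> List[int]:
    """Hypothetical stand-in for one API request; returns page ids."""
    # The server silently clamps the requested batch size to its hard limit.
    effective = min(limit, SERVER_HARD_LIMIT)
    return list(range(start, min(start + effective, total)))


def paginate(
    fetch: Callable[[int, int], List[int]],
    limit: int = 50,
    max_pages: Optional[int] = 1000,
) -> List[int]:
    """Collect results batch by batch.

    limit is only the per-request batch size; max_pages caps the total.
    """
    docs: List[int] = []
    while max_pages is None or len(docs) < max_pages:
        # Use the number of items collected so far as the next start offset.
        batch = fetch(len(docs), limit)
        if not batch:
            break  # the server has nothing more to return
        docs.extend(batch)
    # The last batch may overshoot max_pages, so trim to the cap.
    return docs if max_pages is None else docs[:max_pages]


all_pages = paginate(fake_fetch, limit=100, max_pages=None)
capped_pages = paginate(fake_fetch, limit=50, max_pages=120)
```

With this split, a space larger than the server cap can be read in full by issuing several requests, while max_pages still lets the caller bound the total.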

mrharris, Apr 21 '23 13:04

thanks @mrharris!

dev2049, Apr 21 '23 16:04

Awesome MR - I actually just mentioned this issue here at the bottom, a couple minutes ago!

https://github.com/hwchase17/langchain/issues/2473

theauheral, Apr 23 '23 22:04