langchain
Update confluence.py to return spaces between elements
Update confluence.py so that the text of adjacent elements, such as headers and links, is returned separated by spaces instead of being run together.
Please see https://stackoverflow.com/questions/48913975/how-to-return-nicely-formatted-text-in-beautifulsoup4-when-html-text-is-across-m
Given:
<address>
183 Main St<br>East Copper<br>Massachusetts<br>U S A<br>
MA 01516-113
</address>
The document loader currently returns:
'183 Main StEast CopperMassachusettsU S A MA 01516-113'
After this change, the document loader will return:
'183 Main St East Copper Massachusetts U S A MA 01516-113'
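The idea can be sketched outside the loader. The snippet below is only an illustration, not the code in confluence.py: it uses the standard library's html.parser rather than BeautifulSoup, and the names TextCollector and extract_text are hypothetical. It strips each text node and joins the pieces with a separator, which is roughly what BeautifulSoup's get_text(" ", strip=True) does.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Strip each text node and skip the ones that are only whitespace.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html, separator=" "):
    """Join the stripped text nodes of `html` with `separator`."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return separator.join(parser.chunks)


html = """<address>
183 Main St<br>East Copper<br>Massachusetts<br>U S A<br>
MA 01516-113
</address>"""

print(extract_text(html))
# prints: 183 Main St East Copper Massachusetts U S A MA 01516-113
```

Passing the separator as a parameter also shows how the behavior could be made optional if that is preferred.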
@eyurtsev would you prefer this to be an option that can be passed in?
I have applied the black formatting in a second commit.
This looks good as is