
Update confluence.py to return spaces between elements

Open gardner opened this issue 2 years ago • 2 comments

Update confluence.py to return spaces between elements like headers and links.

Please see https://stackoverflow.com/questions/48913975/how-to-return-nicely-formatted-text-in-beautifulsoup4-when-html-text-is-across-m

Given:

<address>
        183 Main St<br>East Copper<br>Massachusetts<br>U S A<br>
        MA 01516-113
    </address>

The document loader currently returns:

'183 Main StEast CopperMassachusettsU S A        MA 01516-113'

After this change, the document loader will return:

183 Main St East Copper Massachusetts U S A MA 01516-113
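
The difference between the two outputs can be reproduced with a minimal sketch. It uses Python's standard-library `html.parser` in place of BeautifulSoup so that it runs without extra dependencies; `TextCollector` and `extract_text` are hypothetical names for illustration and are not part of `confluence.py`. The idea is the one from the linked Stack Overflow question: strip each text fragment and join the fragments with a separator. In BeautifulSoup, `get_text(" ", strip=True)` does the equivalent; whether the change uses exactly that call is an assumption here.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text fragments found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called once per run of text between two tags (e.g. between <br> tags).
        self.chunks.append(data)


def extract_text(html, separator=""):
    """Return the text of `html`; with a separator, strip and join fragments."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    if separator:
        # Strip each fragment, drop empty ones, then join with the separator.
        stripped = (chunk.strip() for chunk in parser.chunks)
        return separator.join(chunk for chunk in stripped if chunk)
    # No separator: fragments are concatenated as-is, so words run together.
    return "".join(parser.chunks)


HTML = """<address>
        183 Main St<br>East Copper<br>Massachusetts<br>U S A<br>
        MA 01516-113
    </address>"""

without_separator = extract_text(HTML)
with_separator = extract_text(HTML, separator=" ")

print(repr(without_separator))  # contains "183 Main StEast Copper..."
print(with_separator)  # 183 Main St East Copper Massachusetts U S A MA 01516-113
```

Stripping each fragment before joining is what removes the indentation in front of `MA 01516-113`; joining with a space alone would leave that run of whitespace in place.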

@eyurtsev would you prefer this to be an option that can be passed in?

gardner, May 29 '23 01:05

I have applied the black formatting in a second commit.

gardner, May 29 '23 17:05

This looks good as is.

eyurtsev, Jun 02 '23 02:06