python-o365

Better line-break parsing with msg.get_body_text()

Open johanovic opened this issue 5 years ago • 2 comments

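I regularly extract the text of HTML messages. The current parsing method (below) does not insert line breaks where one would expect them. soup.body.text concatenates all text nodes with no separator, so <br> tags and block elements such as <p>, <div> and <li> produce no line break unless the raw HTML happens to contain a newline at that spot. Is it possible to improve this? I could do this directly in lxml (with itertext), but it might be a useful enhancement for the library as a whole.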
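A rough sketch of the lxml route mentioned above, assuming lxml.html is installed. The function name html_to_text_lxml and the BLOCK_TAGS set are hypothetical and not part of python-o365. It appends a newline to the tail of every <br> and block-level element, then joins the text nodes with itertext:

```python
import re

import lxml.html

# Hypothetical, non-exhaustive set of elements that should end a line
BLOCK_TAGS = {"br", "p", "div", "li", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}


def html_to_text_lxml(html: str) -> str:
    """Return the text of an HTML string with line breaks preserved (sketch)."""
    root = lxml.html.fromstring(html)
    for el in root.iter():
        # Comments have a non-string .tag, so they never match BLOCK_TAGS
        if el.tag in BLOCK_TAGS:
            # Text after this element starts on a new line
            el.tail = "\n" + (el.tail or "")
    text = "".join(root.itertext())
    # Trim each line, then collapse runs of blank lines to one blank line
    lines = [line.strip() for line in text.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Note that itertext also returns the contents of <script> and <style> elements, which a fuller version would likely want to drop.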

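from bs4 import BeautifulSoup as bs

def get_body_text(self):
    """ Parses the body html and returns the body text using bs4

    :return: body as text
    :rtype: str
    """
    # Non-HTML (plain text) bodies are returned unchanged
    if self.body_type.upper() != 'HTML':
        return self.body

    try:
        soup = bs(self.body, 'html.parser')
    except RuntimeError:
        return self.body
    else:
        # .text joins every text node with no separator, so <br> and
        # block-level tags such as <p> yield no line breaks
        return soup.body.text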
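One possible bs4-only improvement, sketched as a standalone function that get_body_text could call on self.body. The name html_to_text and the BLOCK_TAGS list are hypothetical, and the choice of which tags count as line-breaking is an assumption. Each <br> is replaced by a newline and a newline is appended inside each block element before calling get_text():

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, non-exhaustive list of elements that should end a line
BLOCK_TAGS = ["p", "div", "li", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"]


def html_to_text(html: str) -> str:
    """Return the text of an HTML string with line breaks preserved (sketch)."""
    soup = BeautifulSoup(html, "html.parser")
    # Fragments parsed with html.parser may have no <body>; use the whole tree
    root = soup.body or soup

    # Every <br> becomes a literal newline
    for br in root.find_all("br"):
        br.replace_with("\n")

    # Block elements get a trailing newline so their text does not run together
    for tag in root.find_all(BLOCK_TAGS):
        tag.append("\n")

    text = root.get_text()
    # Trim each line, then collapse runs of blank lines to one blank line
    lines = [line.strip() for line in text.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

A simpler alternative is soup.get_text(separator="\n"), but that also breaks lines around inline tags such as <b> or <a>. Inserting newlines only at <br> and block elements avoids that.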

johanovic, Jul 15 '20 09:07