python-o365
                        Better linebreak parsing with msg.get_body_text()
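I regularly extract the text of HTML messages. The current parsing method (below) doesn't insert line breaks where one would expect them. soup.body.text simply concatenates the text nodes, so <br> tags and the boundaries between block elements such as <p> and <div> add no newline of their own; only whitespace already present in the HTML source survives. Is it possible to improve this? I could do this directly in lxml (with itertext), but it might be a good enhancement for the library as a whole.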
from bs4 import BeautifulSoup as bs  # module-level import the method relies on

def get_body_text(self):
    """ Parses the body HTML and returns the body text using bs4
    :return: body as text
    :rtype: str
    """
    if self.body_type.upper() != 'HTML':
        return self.body
    try:
        soup = bs(self.body, 'html.parser')
    except RuntimeError:
        return self.body
    else:
        return soup.body.text
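One way to get the expected line breaks is to treat <br> as an unconditional break and the start and end of block-level elements as "start a new line if the current one isn't empty". Below is a minimal sketch of that idea. It uses the standard library's html.parser.HTMLParser instead of bs4 so it runs without extra dependencies. The helper name html_to_text, the _TextExtractor class, and the BLOCK_TAGS/SKIP_TAGS sets are hypothetical choices for illustration, not part of python-o365.

```python
import re
from html.parser import HTMLParser

# Assumed set of elements that should start/end on their own line; extend as needed.
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "tr", "table", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}
# Elements whose text content is dropped entirely (assumption).
SKIP_TAGS = {"script", "style", "title"}


class _TextExtractor(HTMLParser):
    """Hypothetical parser that builds the text line by line."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.lines = [""]   # output lines built so far
        self._skip = 0      # nesting depth inside SKIP_TAGS

    def _block_break(self):
        # Start a new line only if the current line already has text,
        # so nested blocks (<ul><li>...) don't produce blank lines.
        if self.lines[-1].strip():
            self.lines.append("")

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "br":
            self.lines.append("")  # <br> always breaks the line
        elif tag in BLOCK_TAGS:
            self._block_break()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._block_break()

    def handle_data(self, data):
        if not self._skip:
            # Collapse source whitespace (including newlines) the way a browser would.
            self.lines[-1] += re.sub(r"\s+", " ", data)


def html_to_text(html):
    """Hypothetical helper: convert an HTML body to text with sensible line breaks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(line.strip() for line in parser.lines).strip()


body = ("<html><head><style>p {color: red}</style></head><body>"
        "<p>Line one<br/>Line two</p><ul><li>A</li><li>B</li></ul>"
        "</body></html>")
print(html_to_text(body))
```

For the sample body this yields four lines ("Line one", "Line two", "A", "B") instead of the run-together "Line oneLine twoAB" you get from concatenating text nodes. A quicker bs4-only alternative is soup.body.get_text(separator="\n"), but that inserts a newline between every text node, including around inline tags like <b> or <a>, which is why the sketch distinguishes block elements from inline ones.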