python-o365
get_body_text() returning different body text format since 20th May
I am using get_body_text() to get the body of an email. Once I have the body, I use regexes to parse it and extract dates and a few other entities.
Since 20th May, my regexes have been failing. When I looked at the body returned by get_body_text(), its format had changed (the email looks the same when viewed in Outlook, but the parsed body returned by the method is different).
I am attaching a file below which shows the before and after samples. Before-After.txt
From the text file you can see that the newlines which were present before are now gone.
Versions are below:
- beautifulsoup4==4.8.2
- o365==2.0.6
Nothing has changed in the get_body_text method.
I don't know what may be causing this.
I have the same issue. Everything is joined together.
I would suggest adding a few debug prints before and after the bs4 call in get_body_text(). There could be several reasons for this change, but unless you have modified your local copy, the function in the repo is still the same.
https://github.com/O365/python-o365/blob/2d094c745692492f12d8ca90896a28e007323f20/O365/message.py#L1011
^ You can add a few prints in that function to see the body before and after the bs4 parser runs. Also, if you have the message in Outlook, you can confirm whether what you see there matches what the prints show before, during, and after parsing.
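As a rough sketch of the kind of before/after print described above: the snippet below does not call O365 or bs4. It uses a stdlib-only stand-in extractor, and the helper names (`stdlib_extract`, `debug_extract`) and both sample bodies are hypothetical. It only illustrates one possible cause (the HTML source no longer containing newlines between tags), which is an assumption and has not been confirmed for this issue.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Stand-in extractor (stdlib only): keeps text nodes, drops tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def stdlib_extract(html):
    # Hypothetical helper, not part of O365 or bs4.
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def debug_extract(html, extract=stdlib_extract):
    """Print the body before and after text extraction, then return the text."""
    text = extract(html)
    # repr() shows "\n" and "\r\n" literally instead of rendering them
    print("BEFORE:", repr(html))
    print("AFTER: ", repr(text))
    print("newlines before/after:", html.count("\n"), text.count("\n"))
    return text


# Hypothetical bodies: same layout when rendered, different HTML source.
old_style = "<html><body><p>Start: 2020-05-18</p>\n<p>End: 2020-05-19</p>\n</body></html>"
new_style = "<html><body><p>Start: 2020-05-18</p><p>End: 2020-05-19</p></body></html>"

old_text = debug_extract(old_style)
new_text = debug_extract(new_style)
```

To check the real path, you could pass your own callable as `extract`, for example one that wraps the bs4 call you are debugging, and feed it the raw message body. If the raw HTML itself has lost its newlines, the change happened before the parser ever ran.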