Outlook converter reads incorrect fields from Outlook 365 (April 2025) files, misses HTML content - Fix included
An MSG file from Outlook 365 is not processed correctly. Fix attached (not production ready; for example, please decide whether you want to depend on the markdownify library).
TO REPRODUCE I'm passing a msg file saved from the browser version of Outlook 365. Unfortunately I can't share the file as I got it from another person, and I don't have direct access to O365 myself, but what failed was:
- It took the From field from a field that holds organizational info, not the email address (the From field was something like "/O=EXCHANGELABS/OU=EXCHANGE ADMINISTRATIVE GROUP...")
- It misses messages with an HTML-only body; that is, ## Content is empty if the message is HTML-only
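The first symptom can be detected with a simple check. This is only a minimal sketch, not the attached fix: `is_legacy_exchange_dn` and `pick_sender_address` are hypothetical helper names, and the assumption is that the converter can collect several candidate sender values from the MSG file and should prefer one that looks like an SMTP address.

```python
import re
from typing import Iterable, Optional

# Rough shape of an SMTP address; deliberately loose, for illustration only.
_SMTP_RE = re.compile(r"^[^@\s/]+@[^@\s/]+\.[^@\s/]+$")


def is_legacy_exchange_dn(value: str) -> bool:
    """True for values such as "/O=EXCHANGELABS/OU=EXCHANGE ADMINISTRATIVE GROUP..."."""
    return value.strip().upper().startswith("/O=")


def pick_sender_address(candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first candidate that looks like an SMTP address, else None.

    `candidates` is assumed to hold the sender values read from the MSG file,
    in whatever order the converter found them.
    """
    for value in candidates:
        if value and _SMTP_RE.match(value.strip()):
            return value.strip()
    return None
```

With a helper like this, an organizational value is skipped rather than shown as the sender, and the caller can decide what to do when nothing suitable is found.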
ELABORATION ON VERSIONS / TESTING I'm not sure whether the existing implementation works with older or other MSG files. However, I've attached a version of the implementation that works with messages exported from the Outlook 365 browser version as of today. Unfortunately I don't have access to Outlook myself, so I can't verify that this new version works with other versions, or test it thoroughly.
THE FIX Attached is an Outlook converter source code file with the following changes:
- take the From field correctly
- in case the text-only body is missing, try to find the HTML body:
  - it seems that, at least with my Python 3.12, decoding as UTF-16 succeeds without throwing even if the payload is actually ISO-8859-1 (the output is just malformed), so we attempt several different decodings and check whether the output looks like HTML
  - finally, run the HTML output through another library, markdownify, to turn the HTML into Markdown (I did not find an HTML converter within markitdown, hence the 3rd-party library)
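The decoding approach described in the list above could look roughly like this. It is a sketch under assumptions, not the code from the attached file: the function names, the candidate encoding list and its order, and the "looks like HTML" heuristic are all illustrative. It uses only the standard library; the markdownify step would follow on the returned string.

```python
import re
from typing import Optional

# Encodings to try, in order. The list and its order are assumptions.
# iso-8859-1 accepts every byte sequence, so it has to come last.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16", "cp1252", "iso-8859-1")

# Heuristic: the decoded text contains at least one common HTML tag.
_HTML_HINT = re.compile(r"<\s*(html|body|div|p|span|table|br)\b", re.IGNORECASE)


def looks_like_html(text: str) -> bool:
    return bool(_HTML_HINT.search(text))


def decode_html_body(payload: bytes) -> Optional[str]:
    """Decode an HTML body whose encoding is unknown.

    A decode that does not raise is not trusted on its own: single-byte
    data often decodes "successfully" as UTF-16 into garbage. The result
    is accepted only if it also looks like HTML.
    """
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = payload.decode(encoding)
        except UnicodeDecodeError:
            continue
        if looks_like_html(text):
            return text
    return None
```

Returning `None` when no decoding yields HTML lets the caller leave ## Content empty instead of emitting malformed text.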
While creating the fix, I found this web page helpful for working out the correct streams. I also checked the Microsoft documentation I could find, but it did not match what I had in the MSG file. :) https://www.devhut.net/retrieving-email-header-information-in-outlook-using-vba-part-2/
ATTACHED FIX _outlook_msg_converter_py.txt