email-outlook-message-perl
msgconvert: keep HTML variants of the email (skips multipart/mixed properties)
Forwarding https://bugs.debian.org/801189
Version: 0.918-1
File: /usr/bin/msgconvert
I attempted to convert a mail containing plain text and HTML variants but msgconvert only kept the plain text variant, discarding the HTML variant. It would be nice if it could keep both of them.
pabs@chianamo ~ $ msgconvert --verbose path/to/outlook.msg
Skipping DIR entry __nameid_version1 0 (Introductory stuff)
...
Skipping property 001F:8004 (UNKNOWN): multipart/mixed; boundary="_009_3C5F9D52E ...
...
Using property 001F:1000 (BODY_PLAIN): ...
...
@pabs3 thanks for your bug report. To implement this, it would be very helpful to have an example file available. Do you have one that you can share with me?
Unfortunately, the .msg I have cannot be shared publicly, and I do not have access to Outlook to generate such a message. In case you have access to Outlook and can convert an mbox to .msg format, I have attached a sample mbox that should match the .msg I found.
bye, pabs
http://bonedaddy.net/pabs3/
GitHub doesn't seem to support attaching files by email; hopefully it does without JavaScript.
@pabs3 thanks, I'll see what I can do.
I was also looking for this. Emails can have text/rtf, text/plain, and text/html versions.
Ping, any update on this one?
According to the log, the property that stores the multipart/mixed part has ID '8004', which is in the range reserved for user-defined named properties. It's surprising that there isn't also a property containing just the text/html part (ID '1013').
To be able to handle this different property, Email::Outlook::Message needs to support named properties.
I'm afraid I will also need to have some sample .msg file, since the logging doesn't currently include enough information to find the full name for the user-defined named property. Alternatively, the output of oledump when run on the msg file may be enough.
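As an illustration of the ID-range check described above, here is a minimal Python sketch that reads the TYPE:ID pair from a verbose log line and reports whether the ID falls in the named-property range. It assumes the log format shown in this report and the MAPI convention that named properties use IDs 0x8000 through 0xFFFE; `classify` and the constants are hypothetical names and are not part of Email::Outlook::Message.

```python
import re

# Assumption: MAPI named properties use IDs 0x8000-0xFFFE;
# lower IDs are predefined tags such as 0x1000 (plain body).
NAMED_ID_MIN = 0x8000
NAMED_ID_MAX = 0xFFFE

# Matches the "TYPE:ID" pair as printed in the verbose log, e.g. "001F:8004".
PROPERTY_RE = re.compile(r"property ([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})")


def classify(line):
    """Return (type, id, is_named) for a log line, or None if no property is found."""
    match = PROPERTY_RE.search(line)
    if match is None:
        return None
    prop_type = int(match.group(1), 16)
    prop_id = int(match.group(2), 16)
    is_named = NAMED_ID_MIN <= prop_id <= NAMED_ID_MAX
    return prop_type, prop_id, is_named
```

Running `classify` over each line of the verbose output would single out the properties that need a named-property lookup before their meaning can be determined.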
I found a test file in another GitHub repo which is hopefully suitable:
https://github.com/hrbrmstr/msgxtractr/blob/master/inst/extdata/unicode.msg
For this one, perl -Ilib script/msgconvert --verbose from current git master says:
Skipping property 001F:8003 (UNKNOWN): multipart/mixed; boundary="001a113392ecbd ...
I've looked at the example that @ojwb found and property 001F:8003 is just the content-type and does not contain the full message. That message contains bodies in plain text and RTF format, and the RTF part is RTF-encapsulated HTML. There's already issue #6 about that.
Additionally, I noticed that including RTF as one part of a multipart/alternative body makes it completely invisible, at least in my email reader (Thunderbird).
So, two things need to happen:
- Render RTF parts as real attachments
- Convert RTF-encapsulated HTML to HTML and use that instead
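The target MIME layout implied by these two points can be sketched in Python with the standard `email` package. This is only an illustration of the structure, not how msgconvert (which is written in Perl) does or will do it. The function names and the `body.rtf` filename are hypothetical, the `\fromhtml` check is a rough heuristic, and the actual de-encapsulation step is not shown: `html` is assumed to have been extracted already.

```python
from email.message import EmailMessage
from typing import Optional


def looks_like_encapsulated_html(rtf: bytes) -> bool:
    # Heuristic: RTF that wraps HTML carries the \fromhtml control word
    # near the start of the document.
    return b"\\fromhtml" in rtf[:1024]


def build_message(plain: str, html: Optional[str], rtf: Optional[bytes]) -> EmailMessage:
    msg = EmailMessage()
    # The plain text body is always present.
    msg.set_content(plain)
    if html is not None:
        # Plain text and HTML become siblings in multipart/alternative.
        msg.add_alternative(html, subtype="html")
    if rtf is not None:
        # RTF goes in as a real attachment, so it is never chosen as
        # the displayed alternative.
        msg.add_attachment(rtf, maintype="application", subtype="rtf",
                           filename="body.rtf")
    return msg
```

With all three inputs present, the result is a multipart/mixed message whose first part is the multipart/alternative pair and whose second part is the RTF attachment.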