email-outlook-message-perl

msgconvert: keep HTML variants of the email (skips multipart/mixed properties)

Open pabs3 opened this issue 9 years ago • 10 comments

Forwarding https://bugs.debian.org/801189

Version: 0.918-1
File: /usr/bin/msgconvert

I attempted to convert a mail containing plain text and HTML variants but msgconvert only kept the plain text variant, discarding the HTML variant. It would be nice if it could keep both of them.

pabs@chianamo ~ $ msgconvert --verbose path/to/outlook.msg 
Skipping DIR entry __nameid_version1 0 (Introductory stuff)
...
Skipping property 001F:8004 (UNKNOWN): multipart/mixed; boundary="_009_3C5F9D52E ...
...
Using    property 001F:1000 (BODY_PLAIN): ...
...

pabs3 avatar Apr 24 '16 03:04 pabs3

@pabs3 thanks for your bug report. To implement this, it would be very helpful to have an example file available. Do you have one that you can share with me?

mvz avatar Apr 24 '16 05:04 mvz

Unfortunately, the .msg I have cannot be shared publicly, and I do not have access to Outlook to generate such a message. In case you have access to Outlook and can convert an mbox to .msg format, I have attached a sample mbox that should match the .msg I found.

bye, pabs

http://bonedaddy.net/pabs3/

pabs3 avatar Apr 24 '16 10:04 pabs3

GitHub doesn't seem to support attaching files by email; hopefully it does without JavaScript.

pabs3 avatar Apr 24 '16 11:04 pabs3

Sigh, seems to need JavaScript and doesn't support mbox files. Uploaded:

test.mbox.zip

pabs3 avatar Apr 24 '16 11:04 pabs3

@pabs3 thanks, I'll see what I can do.

mvz avatar Apr 24 '16 11:04 mvz

I was also looking for this. Emails can have text/rtf, text/plain, and text/html versions.

jpadilla avatar Jun 16 '16 13:06 jpadilla
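
For readers unfamiliar with how several body variants coexist in one mail, here is a minimal sketch using Python's standard email package. It is not msgconvert's code (msgconvert is Perl); the function name build_alternative and the sample bodies are made up for illustration. It only shows the multipart/alternative layout that the request is asking msgconvert to produce.

```python
from email.message import EmailMessage


def build_alternative(plain: str, html: str) -> EmailMessage:
    """Build a mail carrying both a plain text and an HTML body.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    msg["Subject"] = "example"
    # The first body becomes the text/plain part.
    msg.set_content(plain)
    # Adding an alternative turns the message into multipart/alternative
    # and appends the HTML variant after the plain text one.
    msg.add_alternative(html, subtype="html")
    return msg


msg = build_alternative("Hello", "<p>Hello</p>")
part_types = [part.get_content_type() for part in msg.iter_parts()]
```

Mail readers normally pick the last alternative they can display, which is why the richer variant goes last.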

Ping, any update on this one?

thctlo avatar May 23 '17 08:05 thctlo

According to the log, the property that stores the multipart/mixed part has ID '8004', which is in the range reserved for user-defined named properties. It's surprising that there isn't also a property containing just the text/html part (ID '1013').

To be able to handle this different property, Email::Outlook::Message needs to support named properties.

I'm afraid I will also need to have some sample .msg file, since the logging doesn't currently include enough information to find the full name for the user-defined named property. Alternatively, the output of oledump when run on the msg file may be enough.

mvz avatar Aug 30 '20 15:08 mvz
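
The ID range check described above can be sketched as follows. This is an illustration in Python, not code from Email::Outlook::Message; it assumes the verbose log prints tags as TYPE:ID in hexadecimal (as in "001F:8004") and relies on the MAPI convention that property IDs from 0x8000 upward are named properties. The names parse_tag, TAG_RE and NAMED_PROPERTY_MIN are hypothetical.

```python
import re

# In MAPI, property IDs from 0x8000 upward are named properties whose
# meaning is defined per message by the named property mapping.
NAMED_PROPERTY_MIN = 0x8000

# Matches a tag such as "001F:8004": 4 hex digits of type, 4 of ID.
TAG_RE = re.compile(r"\b([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\b")


def parse_tag(log_line):
    """Return (type, id, is_named) for the first tag in a log line, or None."""
    match = TAG_RE.search(log_line)
    if match is None:
        return None
    prop_type = int(match.group(1), 16)
    prop_id = int(match.group(2), 16)
    return prop_type, prop_id, prop_id >= NAMED_PROPERTY_MIN


skipped = parse_tag("Skipping property 001F:8004 (UNKNOWN): multipart/mixed; ...")
used = parse_tag("Using    property 001F:1000 (BODY_PLAIN): ...")
```

Here skipped is flagged as a named property while used (the plain text body) is not, which matches the diagnosis in the comment above.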

I found a test file in another github repo which hopefully is suitable:

https://github.com/hrbrmstr/msgxtractr/blob/master/inst/extdata/unicode.msg

For this one, perl -Ilib script/msgconvert --verbose from current git master says:

Skipping property 001F:8003 (UNKNOWN): multipart/mixed; boundary="001a113392ecbd ...

ojwb avatar Aug 30 '20 22:08 ojwb

I've looked at the example that @ojwb found and property 001F:8003 is just the content-type and does not contain the full message. That message contains bodies in plain text and RTF format, and the RTF part is RTF-encapsulated HTML. There's already issue #6 about that.

Additionally, I noticed that including RTF as one part of a multipart/alternative makes it completely invisible, at least in my email reader (Thunderbird).

So, two things need to happen:

  • Render RTF parts as real attachments
  • Convert RTF-encapsulated HTML to HTML and use that instead

mvz avatar Sep 01 '20 07:09 mvz
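
The two steps listed above can be sketched roughly as below. This is an illustration in Python, not the project's Perl implementation. It assumes that RTF-encapsulated HTML can be recognised by the \fromhtml1 control word near the start of the RTF header (scanning the first 1024 bytes is a heuristic chosen here), and it only shows detection plus attaching the RTF as a real attachment; the actual conversion of encapsulated RTF back to HTML is left out. The names is_encapsulated_html and build_message and the filename body.rtf are hypothetical.

```python
from email.message import EmailMessage


def is_encapsulated_html(rtf: bytes) -> bool:
    """Heuristic: does this RTF body wrap an original HTML body?"""
    head = rtf.lstrip()[:1024]
    # Encapsulated HTML carries the \fromhtml1 control word in its header.
    return head.startswith(b"{\\rtf") and b"\\fromhtml1" in head


def build_message(plain: str, rtf: bytes) -> EmailMessage:
    """Keep the plain body and ship the RTF as a real attachment."""
    msg = EmailMessage()
    msg.set_content(plain)
    # Attaching (rather than adding an alternative) yields multipart/mixed,
    # so the RTF stays visible as a file in the mail reader.
    msg.add_attachment(rtf, maintype="application", subtype="rtf",
                       filename="body.rtf")
    return msg


wrapped = b"{\\rtf1\\ansi\\fromhtml1 \\deff0 ...}"
plain_rtf = b"{\\rtf1\\ansi\\deff0 Hello}"
message = build_message("Hello", plain_rtf)
```

A converter following this outline would call is_encapsulated_html first, extract the HTML when it returns true, and fall back to the attachment path otherwise.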