
MSG (Outlook Emails) to PDF

Status: Open • callumgarven opened this issue 1 year ago • 7 comments

Hi,

We have a recurring requirement from many of our clients to generate previews of MSG emails. The general use case is that they save their emails from Microsoft Outlook as MSG files and upload them to their systems. This is a proprietary Microsoft format, but many recent improvements in the https://github.com/TeamMsgExtractor/msg-extractor project seem to make understanding and converting these files very simple.

Internally, when the --pdf argument is specified, this msg-extractor utility uses wkhtmltopdf, an abandoned HTML-to-PDF tool. It isn't very good at all: it has serious security flaws and handles fonts badly (though the font issue is probably fixable with the --wk-options argument?).

However, specifying the --html and --prepared-html options makes the utility generate a folder named after the ingested MSG file, containing all image attachments and an HTML file with relative links to the image files, and it looks great in Chrome. This would be absolutely perfect to pass to Gotenberg.

Setup:

  • Download latest release of TeamMsgExtractor/msg-extractor
  • Run it with: python -m extract_msg './test.msg' --html --prepared-html

I have tested this on some extremely large email chains, and it seems very reliable considering the nature of the format; the project also seems very active. Even if it weren't, the format has hardly changed over the years, and I'd argue there is still a significant case for rendering these files.

To be clear, this isn't about converting .eml files, which are much simpler; it's about the proprietary Outlook Message Item format. The README of the extractor project contains extensive documentation about the format.

Happy to provide MSG files privately on request if you can't easily get access to them.

Would be interested to hear if anyone else has tackled this kind of feature in their web applications to preview these awkward file types. Sadly, changing the system to force users to convert to PDF first isn't suitable for most of our clients, as they have decades of MSG files in their systems.

callumgarven, Oct 15 '24 19:10

Hey @callumgarven 👋

To summarize, the idea is to have a new route /forms/chromium/convert/msg that takes an MSG file and internally uses TeamMsgExtractor/msg-extractor to create an HTML file, which is then converted to PDF with Chromium.

Is that correct? 😄

gulien, Oct 16 '24 06:10

Hi @gulien ,

Yes, this is the general premise. Input MSG, output PDF - on a chromium route.

Regarding other parameters relevant to the implementation, the only other one of note appears to be -s, --stdin ("Read file from stdin (only works with one file at a time)"), which could affect how the MSG file is passed to the utility.

It doesn't look like there is any way to send the HTML output to stdout, so it may have to go to a temporary file instead. The folder is specified with --out OUTPATH ("Set the folder to use for the program output. (Default: Current directory)") and the filename with --out-name OUTNAME ("Name to be used with saving the file output. Cannot be used if you are saving more than one file.").

I'm not sure we have any interest in converting more than one MSG at once using the utility itself, as then we lose control: if one MSG fails, we can't get the successful ones to PDF without another call. My guess is that it's better to run it multiple times, once per file.

Cheers.

callumgarven, Oct 16 '24 09:10

Thanks for the feedback @callumgarven!

Currently, the Chromium module outputs one PDF file per route, and adding the possibility to output many files in a zip, or a merge option, would complicate the existing code too much IMO 👍

That being said, I've already done it in the LibreOffice module, so why not 🤷‍♂️

Quoting @callumgarven: "I'm not sure we have any interest in converting more than one MSG at once using the utility itself, as then we lose control: if one MSG fails, we can't get the successful ones to PDF without another call. My guess is that it's better to run it multiple times, once per file."

Yep, however one could always send one MSG file at a time 😄

gulien, Oct 16 '24 09:10

Related feature #846.

gulien, Oct 16 '24 09:10

Quoting @gulien: "That being said, I've already done it in the LibreOffice module, so why not 🤷‍♂️"

We have greatly enjoyed this particular feature, and your work is more than appreciated, particularly in cases where we talk to Gotenberg synchronously and want to get the job done in a single request: the fewer requests, the better.

callumgarven, Oct 16 '24 10:10

Are there any plans to add support for this feature to Gotenberg in the near future?

nipungoel25, Jan 06 '25 13:01

UP ❤

And what about EML files?

benoit-waldmann, Mar 14 '25 15:03