
bugfix: stop ignoring first line of imported email

Open skrobul opened this issue 1 year ago • 1 comments

I have stumbled upon a bug where almost all of the restored emails were corrupted. The affected emails had almost the same text as the originals, but the formatting was all over the place, some of the words were badly mangled, and HTML tables were broken.

Upon investigation I noticed that the actual body of the email in the original .eml file and in the one downloaded via Google's "Download message" was practically identical, with the exception of a few headers. One of those headers was Content-Transfer-Encoding, which happened to be the very first line of the original file for each corrupted email.

Example diff:

$ diff docker_meetup_email_after_gyb_restore.eml original_docker_email.eml
1,11c1
< Authentication-Results: mx.google.com;
<        dkim=neutral (body hash did not verify) [email protected] header.s=s1 header.b="Ivpq/sFe"
< X-Google-Smtp-Source: AGHT+IGEy4ty3doaMTjFqiOkSsSCpk9NLEy/NCs28XMnDJUGPy5CZ54yo2foi5usb9P4cI1hNo4Fqzyh56Lj5OK1xw==
< Received: from 777146845227
<       named unknown
<       by gmailapi.google.com
<       with HTTPREST;
<       Fri, 1 Nov 2024 20:03:11 +0000
---
> Content-Transfer-Encoding: quoted-printable
46a37
> X-Google-Smtp-Source: AAOMgpen7PSnhPReh9WOrpUPOxq9IhkBBjd6pokoxWeGNf9xtEIQtwHrvjIF7wax5u3067qhdJYI
$
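
To illustrate why losing that single header is enough to mangle a message, here is a minimal sketch using Python's standard email package. The sample message is made up for the demonstration and decoded_body is a hypothetical helper, not something taken from got-your-back. Without the Content-Transfer-Encoding header, the quoted-printable escapes are left in the body as literal text.

```python
from email import message_from_bytes

# Made-up sample message; the first line is the header that gets lost.
raw = (
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"Content-Type: text/html; charset=utf-8\n"
    b"Subject: demo\n"
    b"\n"
    b'<td style=3D"color:red">caf=C3=A9</td>\n'
)


def decoded_body(data: bytes) -> str:
    """Hypothetical helper: parse a message and return its decoded body."""
    msg = message_from_bytes(data)
    # decode=True applies the Content-Transfer-Encoding, if one is declared.
    payload = msg.get_payload(decode=True)
    return payload.decode(msg.get_content_charset() or "utf-8")


# Body decoded with all headers present.
intact = decoded_body(raw)

# Same message with its first line dropped, as happens during the import.
stripped = decoded_body(raw.split(b"\n", 1)[1])

print(intact)
print(stripped)
```

In the second case the "=3D" and "=C3=A9" sequences survive undecoded, which would match the broken HTML attributes and mangled words described above.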

After looking into the source code of fmbox.py, I noticed that the constructor of class fmbox() advances self._file when initialising _last_from_line but does not rewind it, which effectively produces a message that is stripped of its first line. Presumably this is not a problem when a message starts with a From header, but it is when it starts with anything else.
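
The behaviour can be reproduced in isolation with a small stand-in class. This is only a sketch of the pattern described above, not the actual fmbox code: FirstLineEatingReader, read_message and the sample inputs are all hypothetical.

```python
import io


class FirstLineEatingReader:
    """Hypothetical stand-in for the described pattern, NOT the real fmbox class."""

    def __init__(self, file):
        self._file = file
        # Reads (and therefore consumes) the first line, with no seek back.
        self._last_from_line = self._file.readline()

    def read_message(self) -> bytes:
        # Returns whatever is left after the constructor's readline().
        return self._file.read()


# Made-up inputs: one starts with an mbox "From " line, one with a header.
mbox_style = b"From sender@example.com Fri Nov  1 20:03:11 2024\nSubject: hi\n\nbody\n"
eml_style = b"Content-Transfer-Encoding: quoted-printable\nSubject: hi\n\nbody\n"

mbox_message = FirstLineEatingReader(io.BytesIO(mbox_style)).read_message()
eml_message = FirstLineEatingReader(io.BytesIO(eml_style)).read_message()

print(mbox_message)  # the "From " line is gone, which is harmless
print(eml_message)   # the Content-Transfer-Encoding header is gone too
```

Both inputs lose their first line, but only the second one loses information the message needs.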

At this point I am not sure whether this is provider-specific, but for some context, my .eml files were created by the Proton Mail export tool. The same emails were imported from Google Takeout to Proton a few years earlier, if that matters.

I have tested the fix by importing about 500 messages and they all display correctly.

This is also likely related to the problem @infovations has seen in https://github.com/GAM-team/got-your-back/issues/148 as well as #157

skrobul, Nov 01 '24 21:11

Hmmm...

So this is a difference between mbox files where the first line is the From delimiter (see https://en.wikipedia.org/wiki/Mbox) and .eml files where the first line is an email header.

The proper fix here would be to examine that first line and remove it only if it actually starts with "From " (notice no :).
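
A minimal sketch of that check, assuming a seekable binary file object. The function name skip_mbox_delimiter is hypothetical and this is not the code that was merged into got-your-back.

```python
import io
from typing import BinaryIO


def skip_mbox_delimiter(file: BinaryIO) -> bytes:
    """Consume the first line only if it is an mbox "From " delimiter.

    Returns the delimiter line, or b"" if the file starts with a header,
    in which case the file position is restored.
    """
    start = file.tell()
    first_line = file.readline()
    if first_line.startswith(b"From "):  # note: "From " with a space, no colon
        return first_line
    file.seek(start)  # rewind so the header line stays part of the message
    return b""


# Made-up inputs for illustration.
mbox_file = io.BytesIO(b"From sender@example.com Fri Nov  1 20:03:11 2024\nSubject: hi\n\nbody\n")
eml_file = io.BytesIO(b"Content-Transfer-Encoding: quoted-printable\nSubject: hi\n\nbody\n")

mbox_delimiter = skip_mbox_delimiter(mbox_file)
eml_delimiter = skip_mbox_delimiter(eml_file)
mbox_rest = mbox_file.read()
eml_rest = eml_file.read()
```

A message whose first line is the header "From: someone" is left untouched as well, because the check requires the trailing space rather than a colon.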

jay0lee, Nov 14 '24 14:11