imap-backup
Import support from a local storage format (mbox, maildir, Thunderbird .sbd, ...)
Hi @joeyates, and thanks for your awesome tool, which provides me with a valuable service for backing up my data.
I have gathered around 25 years of email in my local Thunderbird archive. Now I want to copy it (including hundreds of folders) onto an IMAP server for online accessibility.
Any hints, or maybe a related tool, to achieve this?
I just tried import-export-tools-ng in Thunderbird and was expecting an import to IMAP option. But import-export-tools-ng only supports import into local folders. And according to my understanding, imap-backup only offers export to local Thunderbird Archive.
Any hint as to which Swiss Army knife for mail might help me close the gap?
Hi @bentolor
That's an interesting one! If the thunderbird gem had a mailbox message iterator, the rest would just be a bit of glue and deciding on the import and export paths :)
...I'll have a look
Thanks @joeyates for your quick feedback and help.
Meanwhile, I found the small Python script https://github.com/rgladwell/imap-upload/ which, after some fiddling, allowed me to upload a local mbox export. So my immediate problem is solved, and now I realize the challenges of having a self-hosted, web/mobile, full-text-searchable mail archive.
I still think that, for symmetry, an imap-backup utils import-from-thunderbird FOLDER command would be a great addition.
Along the same lines, I was also missing an imap-backup remote accounts command lately ;-).
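For anyone landing here with the same immediate problem, the upload step can be sketched with the Python standard library alone. This is not how imap-upload or imap-backup does it; it is a minimal sketch, and the names iter_mbox_messages and upload_messages are made up for illustration. It assumes a plain mbox export and an already logged-in imaplib connection, and it lets the server assign the internal date instead of preserving the original one.

```python
import mailbox


def iter_mbox_messages(mbox_path):
    """Yield each message of a local mbox file as raw bytes.

    Like any separator-based reader, mailbox.mbox trusts lines that start
    with "From " to mark the beginning of a message.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # as_bytes() leaves out the mbox "From " separator line.
            yield message.as_bytes()
    finally:
        box.close()


def upload_messages(raw_messages, conn, folder):
    """Append raw messages to an IMAP folder and return how many were accepted.

    `conn` is anything with an imaplib-style append() method, for example
    a logged-in imaplib.IMAP4_SSL instance (an assumption of this sketch).
    """
    accepted = 0
    for raw in raw_messages:
        # No flags and no explicit date: the server uses the time of upload.
        status, _ = conn.append(folder, None, None, raw)
        if status == "OK":
            accepted += 1
    return accepted
```

With a real server this would be used roughly as upload_messages(iter_mbox_messages("Inbox.mbox"), conn, "Archive"), where the file name and folder name are placeholders. The target folder has to exist first, and hundreds of local folders would need a loop that maps each one onto a server folder.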
With Thunderbird, it's not sufficient to read the mailbox file itself to get the messages.
This is for two reasons.
Firstly, the mailbox may contain messages that have been deleted.
Secondly, there is an edge case regarding plain-text emails in finding message boundaries. The following is a note explaining this second problem.
Each message starts with a 'From' line e.g.:
From - Sun Jan 14 09:39:37 2024
To find the following message, it is not sufficient to search for lines with that format, as lines in the email body itself may match.
If the email message is multipart (text+html) the boundary markers can be used to skip past the body, e.g.:
From - Sun Jan 14 09:36:08 2024
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Date: Sun, 14 Jan 2024 09:35:57 +0100
Message-ID: <CAD0bxQFFegRpHErN1rAKEp1tqwxVBhb=e2UcoBRurSSYMF+Bew@mail.gmail.com>
Subject: Blah
From: Me <[email protected]>
To: You <[email protected]>
Content-Type: multipart/alternative; boundary="00000000000043120b060ee3c968"
--00000000000043120b060ee3c968
Content-Type: text/plain; charset="UTF-8"
From - Sun Jan 14 09:34:15 2024
The previous line is part of the email body!
--00000000000043120b060ee3c968
Content-Type: text/html; charset="UTF-8"
<div dir="ltr"><div>From - Sun Jan 14 09:34:15 2024</div><div>The previous line is part of the email body!</div><div><br></div></div>
--00000000000043120b060ee3c968--
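The multipart case can be sketched as follows. This is only an illustration of the idea, not code from the thunderbird gem: end_of_multipart is a made-up name, and the sketch assumes the mailbox has already been split into lines without trailing newlines and that headers and body are separated by a blank line.

```python
from email.parser import HeaderParser


def end_of_multipart(lines, start):
    """Return the index of the closing boundary line of the message whose
    'From ' separator is at lines[start].

    Returns None when the message is not multipart or when the closing
    marker cannot be found.
    """
    # Headers run from the line after the separator to the first blank line.
    header_end = start + 1
    while header_end < len(lines) and lines[header_end] != "":
        header_end += 1
    header_text = "\n".join(lines[start + 1:header_end]) + "\n\n"
    boundary = HeaderParser().parsestr(header_text).get_boundary()
    if boundary is None:
        return None
    closing = "--" + boundary + "--"
    for i in range(header_end, len(lines)):
        if lines[i] == closing:
            return i
    return None
```

Any 'From ' lookalike between the separator and the returned index can be ignored, and the search for the next real separator can resume after that index.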
This is not possible for text-only emails, which don't have boundary markers, e.g.:
From - Sun Jan 14 09:39:37 2024
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Date: Sun, 14 Jan 2024 09:39:34 +0100
Message-ID: <CAD0bxQEb=3gAWk6Gys3FUhOURsTOns9sREQhVJVrD-Quq=gTQg@mail.gmail.com>
Subject: Blah
From: Me <[email protected]>
To: You <[email protected]>
Content-Type: text/plain; charset="UTF-8"
From - Sun Jan 14 09:34:15 2024
The previous line is part of the email body!
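To make the failure concrete, here is a small sketch of a separator-only scan run against a cut-down version of the text-only message above. The regular expression is my own approximation of the 'From - ' line format, not something taken from Thunderbird's source.

```python
import re

# Approximates separator lines such as "From - Sun Jan 14 09:39:37 2024".
FROM_LINE = re.compile(r"^From - \w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")


def naive_message_starts(mailbox_text):
    """Return the line indexes a separator-only scan treats as message starts."""
    return [
        index
        for index, line in enumerate(mailbox_text.splitlines())
        if FROM_LINE.match(line)
    ]


SAMPLE = "\n".join([
    "From - Sun Jan 14 09:39:37 2024",
    "X-Mozilla-Status: 0001",
    "Subject: Blah",
    'Content-Type: text/plain; charset="UTF-8"',
    "",
    "From - Sun Jan 14 09:34:15 2024",
    "The previous line is part of the email body!",
])

print(naive_message_starts(SAMPLE))  # [0, 5]
```

The scan reports two message starts for what is a single message, because nothing in the text itself distinguishes the line at index 5 from a real separator.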
So, I believe that, to correctly identify the message boundaries in Thunderbird mailboxes, it is necessary to parse the associated *.msf index file.
These files contain indicators for the position and length of current messages in the mailbox (msgOffset, offlineMsgSize).
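Once those two values are available, slicing the mailbox is the easy part. The sketch below assumes that msgOffset is a byte offset to the start of a message and that offlineMsgSize is its length in bytes. I have not verified either assumption, nor whether the size includes the 'From ' line. The function name and the (offset, size) pairs are hypothetical; getting them out of the .msf file is the hard part and is not shown.

```python
import io


def read_indexed_messages(mailbox_file, entries):
    """Yield raw messages from a binary mailbox file.

    `entries` is an iterable of (offset, size) pairs, assumed to come from
    the msgOffset and offlineMsgSize values in the .msf index.
    """
    for offset, size in entries:
        mailbox_file.seek(offset)
        raw = mailbox_file.read(size)
        if len(raw) < size:
            raise ValueError(f"index entry at offset {offset} runs past end of file")
        yield raw


# Demo with an in-memory mailbox; a real one would be opened with open(path, "rb").
first = b"From - Sun Jan 14 09:39:37 2024\nSubject: one\n\nFrom - Sun Jan 14 09:34:15 2024\nbody\n"
second = b"From - Sun Jan 14 09:40:00 2024\nSubject: two\n\nhello\n"
demo = io.BytesIO(first + second)
messages = list(read_indexed_messages(demo, [(0, len(first)), (len(first), len(second))]))
```

Because the slices come from the index rather than from scanning the text, the 'From ' lookalike in the first body does not split the message, and messages with no index entry are never read.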
Unfortunately, Thunderbird still uses the dreaded Mork file format for these files.
I'll leave this open in the hope that an easier solution comes to light. Otherwise, I may just write a Mork parser!
Thanks for your research and friendly feedback!
Mork is called out on Wikipedia as follows:
He has lambasted the ostensibly "textual" format on the grounds that it is "not human-readable",[3] bemoaned the impossibility of writing a correct parser for the format,[4] and referred to it as "...the single most braindamaged file format that I have ever seen in my nineteen year career".[4]
I'm not sure I'd recommend writing a Mork parser, for the sake of your sanity ;-)
I understood (and handled) the .msf files as throwaway files, especially when my full-text index got corrupted. But I also have a few corrupted emails where I'm not aware of the source of the corruption.
Reliably storing emails – how hard can it be?!?
.mbox file format family: Hold my beer!
I've added a contrib script with an example of import from Thunderbird.