fix: retrieve body structure and process parts on mailbox sync
Adjusted mail retrieval logic to pull body structure without contents on partial sync so that attachments (like iMip Messages) can be found.
Part 1 of iMip Messaging Refactoring
The last time I tried, pulling structure increased the data transferred significantly.
Well, you are correct, sort of. But the data transfer increase is not caused by adding BODYSTRUCTURE to the command itself. Let me explain.
I tested this with my main account, which has about 12K messages; the difference was about 200MB.
Currently
- The sync fetches all basic information for all the messages (it does this for every mailbox and stops).
With BODYSTRUCTURE
- First, the sync fetches all basic information for all the messages, including BODYSTRUCTURE.
- Then, this somehow triggers a follow-up pull on each individual message for BODY.PEEK[Mime Part Number], which pulls the message text body.
So the issue is the follow-up fetch and not the BODYSTRUCTURE.
I will look into fixing this.
Fixed. The BODYSTRUCTURE no longer triggers another fetch of any body parts.
The file size difference on 12K messages is minimal.
Please update the pull request description to explain what problem this fixes and how to reproduce the problem.
@kesselb
You can use the same steps as this and run a sync on a mailbox twice.
https://github.com/nextcloud/mail/pull/10046#pullrequestreview-2260232699
The issue is that the partial sync does not pull the message BODYSTRUCTURE, which contains information about attachments like iMip event attachments. This code changes the logic to pull the BODYSTRUCTURE on sync instead of in a followup background job.
You should see something like this being pulled for every message in the mail log.
(BODYSTRUCTURE (((("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1") NIL NIL "QUOTED-PRINTABLE" 833 30 NIL NIL NIL)("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL "QUOTED-PRINTABLE" 3412 62 NIL ("INLINE" NIL) NIL) "ALTERNATIVE" ("BOUNDARY" "2__=fgrths") NIL NIL)("IMAGE" "GIF" ("NAME" "485039.gif") "<2__=lgkfjr>" NIL "BASE64" 64 NIL ("INLINE" ("FILENAME" "485039.gif")) NIL) "RELATED" ("BOUNDARY" "1__=fgrths") NIL NIL)("APPLICATION" "PDF" ("NAME" "title.pdf") "<1__=lgkfjr>" NIL "BASE64" 333980 NIL ("ATTACHMENT" ("FILENAME" "title.pdf")) NIL) "MIXED" ("BOUNDARY" "0__=fgrths") NIL NIL))
The purpose of this is to detect event invitations as soon as the message is seen, so that the iMip service can update the calendar events sooner.
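To illustrate how an invitation can be spotted from the structure alone, here is a minimal Python sketch (not the code in this PR). It parses a BODYSTRUCTURE response into nested lists and walks the parts looking for text/calendar with a "method" parameter. The names parse_list, iter_parts and find_imip_methods are hypothetical, the sample is a shortened version of a real response, and IMAP literals ({n} syntax) are not handled.

```python
import re

# Tokens: "(", ")", a quoted string, or a bare atom (NIL, numbers, ...).
TOKEN = re.compile(r'\(|\)|"((?:[^"\\]|\\.)*)"|([^\s()"]+)')


def parse_list(text):
    """Parse an IMAP parenthesized list into nested Python lists."""
    stack = [[]]
    for match in TOKEN.finditer(text):
        tok = match.group(0)
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        elif tok.startswith('"'):
            stack[-1].append(re.sub(r"\\(.)", r"\1", match.group(1)))
        elif tok.upper() == "NIL":
            stack[-1].append(None)
        elif tok.isdigit():
            stack[-1].append(int(tok))
        else:
            stack[-1].append(tok)
    return stack[0][0]


def iter_parts(node):
    """Yield (type, subtype, params) for every non-multipart part."""
    if isinstance(node[0], list):  # multipart: child parts come first
        for child in node:
            if not isinstance(child, list):
                break  # reached the multipart subtype string
            yield from iter_parts(child)
        return
    mtype, subtype = node[0].lower(), node[1].lower()
    raw = node[2] or []
    params = {raw[i].lower(): raw[i + 1] for i in range(0, len(raw), 2)}
    yield mtype, subtype, params
    if (mtype, subtype) == ("message", "rfc822"):
        yield from iter_parts(node[8])  # body structure of the embedded message


def find_imip_methods(structure):
    """Return the method of every text/calendar part that declares one."""
    return [
        params["method"].upper()
        for mtype, subtype, params in iter_parts(structure)
        if (mtype, subtype) == ("text", "calendar") and "method" in params
    ]


SAMPLE = (
    '((("text" "plain" ("charset" "utf-8") NIL NIL "quoted-printable" 202 13 NIL NIL NIL NIL)'
    '("text" "html" ("charset" "utf-8") NIL NIL "quoted-printable" 13912 569 NIL NIL NIL NIL)'
    '("text" "calendar" ("charset" "utf-8" "method" "CANCEL") NIL NIL "quoted-printable" 876 28 NIL NIL NIL NIL)'
    ' "alternative" ("boundary" "b1") NIL NIL NIL)'
    '("application" "ics" ("charset" "utf-8") NIL NIL "base64" 1158 NIL'
    ' ("attachment" ("filename" "invite.ics")) NIL NIL)'
    ' "mixed" ("boundary" "b2") NIL NIL NIL)'
)

print(find_imip_methods(parse_list(SAMPLE)))
```

No message content is needed for this check; only the structure returned by the sync fetch is inspected.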
Does this PR have a functional impact on the Mail app or is this just refactoring/preparing for another change?
I did some testing by sending invites from another Nextcloud server and they are still synced with imip_message=0. I can send you the exact invitation email in case you want to debug this.
Hey @st3iny,
This PR is part of a series of improvements for iMip messages, to detect and process them as soon as they arrive.
I am not sure why your tests didn't show the imip_message as 1.
I just retested the PR: I deleted all the messages in "mail_messages", reset the "sync_new_token" on the mailbox to blank, and let the automated UI sync kick in. The results were as expected: the iMip messages are detected on sync.
And the BODYSTRUCTURE is properly pulled.
Nope, still the same. I did a full cache reset (+ initial sync) and not a single email has imip_message=1 despite dozens of invitation emails in the inbox.
If I'm not mistaken this could be fixed after https://github.com/nextcloud/mail/pull/11016?
That PR is unrelated, the point of this PR was to retrieve the structure of the messages (without contents) during the sync phase vs doing it in a separate job an hour later.
Okay
The branch needs a rebase as it is 659 commits behind.
We merged two pull requests in the meantime that touch on the same topic. Here's a short summary of what they did:
- https://github.com/nextcloud/mail/pull/10661: The preview enhancer background job did not flag iMIP messages as such because we were not traversing the messages properly. ImapMessageFetcher.getPart traverses properly but might still benefit from using partIterator over getParts, but that's a different story.
- https://github.com/nextcloud/mail/pull/11016: There's another background job to process iMIP messages. It also depends on the imip_message flag set by the preview enhancer. The use case is that users who didn't open the email in Nextcloud Mail still get that event added to their calendar. That job wasn't running properly due to the wrong SQL condition.
If we merge this PR, the message is flagged faster/earlier as an iMIP message, and the background job can add it to the calendar earlier. If the data usage is okay, then it seems reasonable.
For now, we have to keep the preview enhancer logic because it (1) also sets the has attachment flag, preview text, iMIP, is encrypted, and mentions me, and (2) because we decided to make our check for iMIP yes/no also consider MIME parts without a method attribute (to address the issue when invitations went through ProtonMail https://github.com/nextcloud/mail/issues/11009) and therefore need to fetch the contents.
Seeing the code to detect iMIP messages here in getPart as well reminds me that it would be nice (as a follow-up, of course) to split that mail parsing/processing logic into reusable pieces. It's so easy to lose track.
I assume the situation is already much better today (as the flagging by the preview enhancer is improved and the background job actually looks at the right messages) compared to August 2024. Thus, it's a fair question whether we still want to proceed with this or keep it as is for now and monitor the situation.
FYI, if we pull the body structure we don't need the preview enhancer: all the information about every attachment and MIME part is available WITHOUT pulling the contents (this is how desktop clients do it).
All the information about the message can be retrieved, without the contents, on the initial listing of messages, instead of individually pulling each message.
If it's the best way to fetch everything in one go then change it to that please @SebastianKrupinski. Just be aware that performance regressions are off the table. This is only acceptable if it makes the app faster.
From my reading of the preview enhancer, all the information that it extracts can be gathered from pulling the BODYSTRUCTURE.
What the preview enhancer looks for...
What we can get with a single command from IMAP...
C: 4 UID FETCH 169:174 (BODYSTRUCTURE BODY.PEEK[HEADER])
S: * 126 FETCH (UID 169 BODYSTRUCTURE ("text" "plain" ("charset" "utf-8" "format" "flowed") NIL NIL "quoted-printable" 256 8 NIL NIL NIL NIL) BODY[HEADER] {1147}
S: [LITERAL DATA: 1147 bytes]
S: )
S: * 127 FETCH (UID 170 BODYSTRUCTURE (("text" "plain" ("charset" "us-ascii") NIL "Notification" "7bit" 771 21 NIL NIL NIL NIL)("message" "delivery-status" NIL NIL "Delivery report" "7bit" 425 NIL NIL NIL NIL)("message" "rfc822" NIL NIL "Undelivered Message" "7bit" 17847 ("Wed, 23 Apr 2025 19:32:34 -0400" "Cancelled: Live Event Test @ Sat, May 3 2025 10:00 floating" (("User 1" NIL "user1" "nextdev.app")) (("User 1" NIL "user1" "nextdev.app")) (("User 1" NIL "user1" "nextdev.app")) ((NIL NIL "user2" "nextdev.app")) NIL NIL NIL "<[email protected]>") ((("text" "plain" ("charset" "utf-8") NIL NIL "quoted-printable" 202 13 NIL NIL NIL NIL)("text" "html" ("charset" "utf-8") NIL NIL "quoted-printable" 13912 569 NIL NIL NIL NIL)("text" "calendar" ("charset" "utf-8" "method" "CANCEL") NIL NIL "quoted-printable" 876 28 NIL NIL NIL NIL) "alternative" ("boundary" "17454511547.bc8a7EDB8.10203") NIL NIL NIL)("application" "ics" ("charset" "utf-8") "<[email protected]>" NIL "base64" 1158 NIL ("attachment" ("filename" "invite.ics")) NIL NIL) "mixed" ("boundary" "17454511548.aAe42aC3F.10203") NIL NIL NIL) 679 NIL NIL NIL NIL) "report" ("report-type" "delivery-status" "boundary" "313C22CC006B.1745451154/mailuser.phl.internal") NIL NIL NIL) BODY[HEADER] {1575}
S: [LITERAL DATA: 1575 bytes]
S: )
S: * 128 FETCH (UID 171 BODYSTRUCTURE (("text" "plain" ("charset" "us-ascii") NIL "Notification" "7bit" 771 21 NIL NIL NIL NIL)("message" "delivery-status" NIL NIL "Delivery report" "7bit" 425 NIL NIL NIL NIL)("message" "rfc822" NIL NIL "Undelivered Message" "7bit" 17825 ("Wed, 23 Apr 2025 19:32:34 -0400" "Cancelled: Live Event Test @ Sat, May 3 2025 10:00 floating" (("User 1" NIL "user1" "nextdev.app")) (("User 1" NIL "user1" "nextdev.app")) (("User 1" NIL "user1" "nextdev.app")) ((NIL NIL "user1" "nextdev.app")) NIL NIL NIL "<[email protected]>") ((("text" "plain" ("charset" "utf-8") NIL NIL "quoted-printable" 202 13 NIL NIL NIL NIL)("text" "html" ("charset" "utf-8") NIL NIL "quoted-printable" 13912 569 NIL NIL NIL NIL)("text" "calendar" ("charset" "utf-8" "method" "CANCEL") NIL NIL "quoted-printable" 876 28 NIL NIL NIL NIL) "alternative" ("boundary" "17454511540.31DF2C5.10203") NIL NIL NIL)("application" "ics" ("charset" "utf-8") "<[email protected]>" NIL "base64" 1158 NIL ("attachment" ("filename" "invite.ics")) NIL NIL) "mixed" ("boundary" "17454511541.5aEAf91.10203") NIL NIL NIL) 679 NIL NIL NIL NIL) "report" ("report-type" "delivery-status" "boundary" "275D42CC006A.1745451154/mailuser.phl.internal") NIL NIL NIL) BODY[HEADER] {1575}
S: [LITERAL DATA: 1575 bytes]
S: )
Good!
How do you generate a preview text without fetching the body? ;)
We've talked about the Proton issue in our team call: An invitation from Google sent to a Proton user results in Proton dropping the "method" parameter for the text/calendar part, so we no longer flag it as an iMIP message. The outcome of our brainstorming was to change the logic as follows: If the content type is text/calendar but there's no method parameter, we look at the attachment and check if it's an iMIP invitation, meaning we fetch the part.
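The changed logic could be sketched roughly like this in Python. This is an illustration under assumptions, not the Mail app's implementation: is_imip_part and fetch_body are hypothetical names, and treating "the part contains an iCalendar METHOD property" as the test for an iMIP invitation is my assumption about what the content check would look for.

```python
import re

# An iCalendar object used for scheduling carries a METHOD property line.
METHOD_LINE = re.compile(r"^METHOD:(\w+)\s*$", re.MULTILINE | re.IGNORECASE)


def is_imip_part(content_type, params, fetch_body):
    """Decide whether a MIME part should be flagged as iMIP.

    content_type: e.g. "text/calendar"
    params:       content-type parameters from the body structure
    fetch_body:   callable returning the decoded part text; it is only
                  called when the structure alone is not conclusive
    """
    if content_type.lower() != "text/calendar":
        return False
    if any(key.lower() == "method" for key in params):
        return True  # cheap path: the structure alone is enough
    # Fallback for parts whose method parameter was dropped in transit:
    # fetch the part and look inside it.
    return METHOD_LINE.search(fetch_body()) is not None


ics = "BEGIN:VCALENDAR\r\nMETHOD:REQUEST\r\nBEGIN:VEVENT\r\nEND:VEVENT\r\nEND:VCALENDAR\r\n"
print(is_imip_part("text/calendar", {"charset": "utf-8"}, lambda: ics))
```

With this shape the extra fetch only happens for text/calendar parts that lack the parameter, so ordinary messages stay on the structure-only path.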
Moving hasAttachments and isEncrypted should be fine.
You're right, my comment should have said "all the contents". We would need to pull at least some of the contents with something like "FETCH 2 (ENVELOPE BODYSTRUCTURE BODY[]<0.1000>)", which would pull the first 1000 bytes of the message.
So originally the point of this PR was to discover iMip attachments earlier, so that the iMip service can process them faster.
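As a rough sketch of how a preview could be derived from such a partial fetch, the Python below feeds the first bytes of a message to the standard library email parser and collapses the first text/plain part into a short string. preview_from_partial is a hypothetical name and this is not how the preview enhancer works today. Note that BODY[]<0.1000> counts from the start of the message, headers included, so a message with long headers may yield little or no body text within the first 1000 bytes.

```python
from email import message_from_bytes


def preview_from_partial(raw, limit=100):
    """Build a short preview from the first bytes of a raw message.

    raw is assumed to be the data returned for BODY[]<0.N>, i.e. the
    header block followed by however much of the body fit in N bytes.
    """
    msg = message_from_bytes(raw)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label, fall back to UTF-8
            text = payload.decode("utf-8", errors="replace")
        return " ".join(text.split())[:limit]  # collapse whitespace
    return ""


raw = (
    b"From: user1@example.com\r\n"
    b"Subject: Test Message\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello   there,\r\nthis is the body.\r\n"
)
print(preview_from_partial(raw))
```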
But since we are willing to entertain the idea of a more efficient sync, here are my thoughts.
These are the resulting connections and commands produced by moving 5 messages into an empty inbox (simulating arriving messages) while the mail app is open.
>> Connection to: imap://mail.example.com:993/
>> Server connection took 0.367 seconds.
C1: 1 AUTHENTICATE PLAIN [INITIAL CLIENT RESPONSE (username: [email protected])]
C1: 2 ENABLE QRESYNC
C1: 3 STATUS INBOX (MESSAGES UIDNEXT UIDVALIDITY HIGHESTMODSEQ)
C1: 4 EXAMINE INBOX (QRESYNC (1733797867 512 399:402))
C1: 5 UID SEARCH RETURN (ALL COUNT) UID 403:408
C1: 6 UID FETCH 403:407 (ENVELOPE FLAGS INTERNALDATE BODY.PEEK[HEADER])
S1: * 5 FETCH (UID 403 FLAGS (\Seen \Draft) INTERNALDATE "02-May-2025 09:57:36 -0400" ENVELOPE ("Fri, 02 May 2025 13:57:36 +0000" "Re: Threading test" ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user2" "nextdev.app")) NIL NIL NIL "<20250502135736.Horde.ToLVZRDH27pWa6YOIv9J23y@localhost>") BODY[HEADER] {330}
S1: [LITERAL DATA: 330 bytes]
S1: )
S1: * 6 FETCH (UID 404 FLAGS (\Seen \Draft) INTERNALDATE "02-May-2025 12:39:40 -0400" ENVELOPE ("Fri, 02 May 2025 16:39:40 +0000" "Re: Threading test" ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user2" "nextdev.app")) NIL NIL NIL "<20250502163940.Horde.mUHQaxTBnCru1zaMgNVCFbw@localhost>") BODY[HEADER] {330}
S1: [LITERAL DATA: 330 bytes]
S1: )
S1: * 7 FETCH (UID 405 FLAGS (\Seen) INTERNALDATE "04-May-2025 15:11:05 -0400" ENVELOPE ("Sun, 04 May 2025 19:11:02 +0000" "Invitation: Singleton Test" ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user1" "nextdev.app")) ((NIL NIL "user1" "nextdev.app")) (("User 2" NIL "user2" "nextdev.app")) NIL NIL NIL "<20250504191102.Horde.DfmqrGPN3mFaWxCkMIsk1by@localhost>") BODY[HEADER] {375}
S1: [LITERAL DATA: 375 bytes]
S1: )
S1: * 8 FETCH (UID 406 FLAGS (\Seen) INTERNALDATE "09-May-2025 14:27:43 -0400" ENVELOPE ("Fri, 09 May 2025 18:27:40 +0000" "Test Message" (("Sebastian" NIL "user1" "nextdev.app")) (("Sebastian" NIL "user1" "nextdev.app")) (("Sebastian" NIL "user1" "nextdev.app")) (("Sebastian" NIL "user1" "nextdev.app")) NIL NIL NIL "<[email protected]>") BODY[HEADER] {1147}
S1: [LITERAL DATA: 1147 bytes]
S1: )
S1: * 9 FETCH (UID 407 FLAGS (\Seen) INTERNALDATE "08-May-2025 15:39:36 -0400" ENVELOPE ("Thu, 08 May 2025 19:39:33 +0000" "Invitation: Testing Removing User" (("User 1" NIL "user1" "nextdev.app")) (("User 1" NIL "user1" "nextdev.app")) (("User 1" NIL "user1" "nextdev.app")) (("User 2" NIL "user2" "nextdev.app")) NIL NIL NIL "<20250508193933.Horde.v2xwV-UNF7Dv5nnNzTo4Skw@localhost>") BODY[HEADER] {391}
S1: [LITERAL DATA: 391 bytes]
S1: )
>> Connection to: imap://mail.example.com:993/
>> Server connection took 0.3185 seconds.
C2: 1 AUTHENTICATE PLAIN [INITIAL CLIENT RESPONSE (username: [email protected])]
C2: 2 ENABLE QRESYNC
C2: 3 EXAMINE INBOX (QRESYNC (1733797867 512 399:407))
C2: 4 UID FETCH 403 (ENVELOPE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 5 UID FETCH 403 (ENVELOPE BODYSTRUCTURE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 6 UID FETCH 403 (BODY.PEEK[1] BODY.PEEK[1.MIME])
C2: 7 UID FETCH 404 (ENVELOPE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 8 UID FETCH 404 (ENVELOPE BODYSTRUCTURE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 9 UID FETCH 404 (BODY.PEEK[1] BODY.PEEK[1.MIME])
C2: 10 UID FETCH 405 (ENVELOPE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 11 UID FETCH 405 (ENVELOPE BODYSTRUCTURE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 12 UID FETCH 405 (BODY.PEEK[1.1] BODY.PEEK[1.1.MIME])
C2: 13 UID FETCH 405 (BODY.PEEK[1.2] BODY.PEEK[1.2.MIME])
C2: 14 UID FETCH 405 (BODY.PEEK[2] BODY.PEEK[2.MIME])
C2: 15 UID FETCH 406 (ENVELOPE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 16 UID FETCH 406 (ENVELOPE BODYSTRUCTURE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 17 UID FETCH 406 (BODY.PEEK[1] BODY.PEEK[1.MIME])
C2: 18 UID FETCH 407 (ENVELOPE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 19 UID FETCH 407 (ENVELOPE BODYSTRUCTURE FLAGS INTERNALDATE BODY.PEEK[HEADER])
C2: 20 UID FETCH 407 (BODY.PEEK[1.1] BODY.PEEK[1.1.MIME])
C2: 21 UID FETCH 407 (BODY.PEEK[1.2] BODY.PEEK[1.2.MIME])
C2: 22 UID FETCH 407 (BODY.PEEK[2] BODY.PEEK[2.MIME])
C2: 23 LOGOUT
C1: 7 SELECT INBOX (QRESYNC (1733797867 512 399:402))
C1: 8 SEARCH RETURN (MIN COUNT) UNSEEN
C1: 9 LOGOUT
This generated 26 commands (login/logout omitted) on two simultaneous connections. This is without other features like AI summaries and classifications turned on or configured, which would have caused additional connections and commands. This can be cut down to 3.
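For reference, the count can be reproduced from the log with a few lines of Python. One assumption on my side: ENABLE QRESYNC is treated as part of the login sequence, since that is what makes the tally match 26.

```python
# Commands on connection C1 (tags 3-8); AUTHENTICATE, ENABLE and LOGOUT left out.
c1 = ["STATUS", "EXAMINE", "UID SEARCH", "UID FETCH", "SELECT", "SEARCH"]

# On connection C2 each message costs two header fetches (with and without
# BODYSTRUCTURE) plus one fetch per body part that was pulled.
parts_fetched = {403: 1, 404: 1, 405: 3, 406: 1, 407: 3}
c2 = 1 + sum(2 + parts for parts in parts_fetched.values())  # the 1 is EXAMINE

total = len(c1) + c2
print(len(c1), c2, total)  # 6 20 26
```

Of the 26 commands, 19 are the per-message fetches on the second connection.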
Here are some ideas on improving this.
Currently
- We connect to the server
- Find the messages we are interested in
- Pull basic information for those messages
- We then trigger the preview enhancement
- preview enhancement then starts a new connection
- preview enhancement executes a single command for every body part we are interested in (3/4 per message)
- We then dispatch events that trigger other functions like AI Summaries, Classifications, etc
- These events then create their own connection and re-pull the same messages again
This is highly inefficient, slow and resource intensive on both the NC server and the mail server. Also, as the events are not multi-threaded, this all happens one by one during the sync process.
Option 1
- We connect to the server
- Find the messages we are interested in
- Pull all required basic information and a partial body text for the preview "FETCH 403:407 (ENVELOPE FLAGS INTERNALDATE BODY.PEEK[HEADER] BODYSTRUCTURE BODY[]<0.1000>)".
- We then dispatch events that trigger other functions like AI Summaries, Classifications, etc
- These events then create their own connection and re-pull the same messages again
The pros of this option are that it gives us the basic information about the message, information about attachments and enough text to generate a preview. The cons are that the events would still need to create a connection and re-pull the contents of the message.
Option 2
- We connect to the server
- Find the messages we are interested in
- Pull all required basic information and the body text "FETCH 403:407 (ENVELOPE FLAGS INTERNALDATE BODYSTRUCTURE BODY[TEXT])".
- We then dispatch events that trigger other functions like AI Summaries, Classifications, etc
I believe this is the best option, as it would give us everything we require in one connection and only 3 commands. The pros here are that we get everything we need with one fetch: envelope information, body structure and full body text. Interestingly, BODY[TEXT] returns the text/plain, text/html, and text/calendar parts (tested on a Dovecot server). Because we have the full body text, it can then be passed to the events, which no longer need to reconnect to the server to download the messages again.
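A minimal sketch of how the single batched command for Option 2 could be assembled, in Python for illustration only. to_sequence_set and build_sync_fetch are hypothetical helpers, not part of the Mail app or Horde. Two deliberate differences from the command quoted above: the sketch uses UID FETCH, as the sync in the log does, and BODY.PEEK[TEXT] rather than BODY[TEXT], because fetching BODY[...] without PEEK sets the \Seen flag on the message.

```python
def to_sequence_set(uids):
    """Collapse UIDs into an IMAP sequence set, e.g. [1, 2, 3, 7] -> "1:3,7"."""
    runs = []
    for uid in sorted(set(uids)):
        if runs and uid == runs[-1][1] + 1:
            runs[-1][1] = uid  # extend the current run
        else:
            runs.append([uid, uid])  # start a new run
    return ",".join(str(a) if a == b else f"{a}:{b}" for a, b in runs)


# Data items for a one-shot sync; the list itself is an assumption based
# on the Option 2 command in this comment.
SYNC_ITEMS = ("ENVELOPE", "FLAGS", "INTERNALDATE", "BODYSTRUCTURE", "BODY.PEEK[TEXT]")


def build_sync_fetch(uids, items=SYNC_ITEMS):
    """Return one UID FETCH command covering all given messages."""
    if not uids:
        raise ValueError("at least one UID is required")
    return f"UID FETCH {to_sequence_set(uids)} ({' '.join(items)})"


print(build_sync_fetch([403, 404, 405, 406, 407]))
```

For the five messages in the log this produces a single command where the current flow issues 19 per-message fetches on the second connection.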
:+1: for your suggestion if it doesn't come with any drawbacks
Will this fix allow users to handle calendar appointments from Office 365 properly? See #10788
Adjusting label status to PR status