How to handle literals in FETCH response

Open lllama opened this issue 4 years ago • 5 comments

Thanks for this library - I've had some good success with it so far.

However, if I have an email with a double quote in the subject, then the FETCH response for the message is split over multiple lines, and uses a literal line for the subject.

Is there a recommended way to handle this? If I do a FETCH for all messages, then the situation seems even trickier to handle, as there seems to be no easy way to tell which line belongs to which message.

The standard lib imaplib seems to bundle the responses into tuples of envelopes and data but we only get the response lines (unless I've missed something).

lllama avatar Nov 25 '21 11:11 lllama

OK, thanks for this issue and the link. I'll have a look at this soon.

bamthomas avatar Dec 30 '21 16:12 bamthomas

As with #71, the API is not the same as imaplib's: aioimaplib does less formatting of the responses from the IMAP server. Still, the result is structured with uniform types (a list containing bytes strings for the IMAP protocol parts and bytearrays for the data).

This is not related to whether there is a double quote or not.

For example, I tried with a double quote in the subject. The first command below is what an app would use to list the latest messages without their bodies, here with only the UID, flags and subject.

Then, when the user asks for a specific mail, the body is fetched with the second command.

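# List UID, flags and subject for the latest messages (no body)
result, lines = await imap_client.uid('fetch', '1950:*', '(UID FLAGS BODY.PEEK[HEADER.FIELDS (SUBJECT)])')
print(lines)
# Fetch the full content of one message by UID
result, lines = await imap_client.uid('fetch', '1984', 'BODY.PEEK[]')
print(lines)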

This will display (I separated each list item for clarity):

[
b'1544 FETCH (UID 1950 FLAGS (NonJunk) BODY[HEADER.FIELDS (SUBJECT)] {68}', 
bytearray(b'Subject: =?utf-8?Q?=5bSlack=5d_Sender_sent_you_a_message?=\r\n\r\n'), 
b')', 
b'1545 FETCH (UID 1951 FLAGS (NonJunk) BODY[HEADER.FIELDS (SUBJECT)] {167}', 
bytearray(b'Subject: [adherents] =?utf-8?Q?AGIT_-_Ne_manquez_pas_vos_procha?=\r\n\t=?utf-8?Q?ins_=C3=A9v=C3=A9nements_sur_la_Responsabi?=\r\n\t=?utf-8?Q?lit=C3=A9_Num=C3=A9rique_!?=\r\n\r\n'),
b')'
...
]

So we see that the lines come in groups of three:

  • first is the FETCH response,
  • second is the data
  • third is the closing parenthesis
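A minimal sketch of how that grouping could be done on the consumer side. It assumes, as in the output above, that literals arrive as bytearray and protocol lines as bytes; the helper name group_fetch_lines and the rule "a bytes line right after a literal continues the same message" are my own assumptions, not part of aioimaplib.

```python
import re

# An untagged FETCH line as shown in the output above: b'<seq> FETCH (...'
FETCH_START = re.compile(rb'^\d+ FETCH \(')


def group_fetch_lines(lines):
    """Group a flat list of response items into one list per message.

    Hypothetical helper: bytearray items are treated as literal data,
    and a bytes item directly after a literal is treated as the rest of
    the same message. Other bytes lines (such as the final completion
    line) are ignored.
    """
    groups = []
    current = None
    previous_was_literal = False
    for item in lines:
        if isinstance(item, bytearray):
            if current is not None:
                current.append(item)
            previous_was_literal = True
            continue
        if previous_was_literal and current is not None:
            current.append(item)        # continuation after a literal
        elif FETCH_START.match(item):
            current = [item]            # a new message starts here
            groups.append(current)
        else:
            current = None              # status line or unrelated data
        previous_was_literal = False
    return groups


example = [
    b'1544 FETCH (UID 1950 FLAGS (NonJunk) BODY[HEADER.FIELDS (SUBJECT)] {13}',
    bytearray(b'Subject: a\r\n\r\n'),
    b')',
    b'1545 FETCH (UID 1951 FLAGS (NonJunk))',
    b'Fetch completed (0.001 + 0.000 secs).',
]
for group in group_fetch_lines(example):
    print(group)
```

A message without any literal ends up as a group with a single item, so the groups are not always three items long.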

The BODY.PEEK[] command will then display:

[
b'1577 FETCH (UID 1984 BODY[] {3339}', 
bytearray(b'Return-Path: <[email protected]>\r\nDelivered-To: [email protected]\r\nX-Envelope-To: [email protected]\r\nReceived: (...message headers) ------GWONCOFAFSCBGQO9559AI5OF8JL2BA\r\nContent-Type: text/plain;\r\n charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nComment c\'est "double quotes" ?\r\n\r\n------GWONCOFAFSCBGQO9559AI5OF8JL2BA\r\nContent-Type: text/html;\r\n charset=utf-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\nComment c\'est "double quotes" ?\r\n------GWONCOFAFSCBGQO9559AI5OF8JL2BA--\r\n'), 
b')', 
b'Fetch completed (0.001 + 0.000 secs).'
]

Here it is much the same: the first item is the IMAP response, the mail content is always at index 1, and it can be passed directly to Python's email API:

import email

msg = email.message_from_bytes(lines[1])
print(msg['subject'])
# will print 'Test "guillemets" '
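
For the header-only fetch, the same idea can be applied per message. The following is only a sketch: the helper name uid_and_subject and the UID regex are assumptions of mine, and it expects the FETCH line and its literal in the shape shown in the first output. Encoded-word subjects are decoded with the standard library's email.header functions.

```python
import email
import re
from email.header import decode_header, make_header

UID_RE = re.compile(rb'\bUID (\d+)')


def uid_and_subject(fetch_line, literal):
    """Return (uid, decoded subject) for one FETCH line and its literal.

    Hypothetical helper; either value is None when it cannot be found.
    """
    match = UID_RE.search(fetch_line)
    uid = int(match.group(1)) if match else None
    msg = email.message_from_bytes(bytes(literal))
    raw_subject = msg['subject']
    if raw_subject is None:
        return uid, None
    # Decode RFC 2047 encoded words such as =?utf-8?Q?...?=
    return uid, str(make_header(decode_header(raw_subject)))


fetch_line = b'1544 FETCH (UID 1950 FLAGS (NonJunk) BODY[HEADER.FIELDS (SUBJECT)] {68}'
literal = bytearray(b'Subject: =?utf-8?Q?=5bSlack=5d_Sender_sent_you_a_message?=\r\n\r\n')
print(uid_and_subject(fetch_line, literal))
```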

bamthomas avatar Dec 31 '21 10:12 bamthomas

I think it would be nice if aioimaplib provided a higher-level API for FETCH (similar to imaplib). While the current format has some structure, relying on it seems a bit hacky, especially if one wants to read out multiple data items (e.g. the UID in addition to the body). Concrete problems I see:

  • I don't think there is any guarantee on the order of data items. When fetching multiple multiline strings, an access like lines[1] is not sufficient; one has to extract the preceding "key" to know which data item a literal belongs to. Also, one must be aware that the closing parenthesis might be preceded by additional data items (e.g. b'FLAGS (\\Seen))').
  • The server might send additional data items that weren't requested. I have only seen this for FLAGS, but I was unable to find anything in RFC3501 restricting this (though the formal syntax differentiates msg-att-dynamic and msg-att-static). This would be a problem if an additional literal is included in the response.
  • While any server will most likely always send the email body as a literal, I think it would also be allowed to use a quoted string, which then would not appear on a separate line (I suppose).
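
To illustrate the first point, here is a rough sketch of extracting the "key" that precedes each literal instead of relying on a fixed index. The names literal_key and literals_by_key and the regular expression are assumptions of mine; the pattern only covers top-level items such as BODY[...] or RFC822 and would not identify a literal nested inside an ENVELOPE.

```python
import re

# Data-item name directly before a literal marker, e.g.
# b'... BODY[HEADER.FIELDS (SUBJECT)] {68}'
LITERAL_KEY = re.compile(rb'([A-Z0-9.]+(?:\[[^\]]*\])?(?:<\d+>)?) \{\d+\}$')


def literal_key(line):
    """Return the data-item name announcing a literal, or None."""
    match = LITERAL_KEY.search(line)
    return match.group(1) if match else None


def literals_by_key(items):
    """Map data-item name -> literal bytes for one message's items.

    Assumes literals are bytearray and protocol parts are bytes.
    """
    found = {}
    previous = None
    for item in items:
        if isinstance(item, bytearray) and isinstance(previous, bytes):
            key = literal_key(previous)
            if key is not None:
                found[key] = bytes(item)
        previous = item
    return found


message_items = [
    b'7 FETCH (UID 42 BODY[HEADER.FIELDS (SUBJECT)] {15}',
    bytearray(b'Subject: hi\r\n\r\n'),
    b' BODY[TEXT] {5}',
    bytearray(b'hello'),
    b' FLAGS (\\Seen))',
]
print(literals_by_key(message_items))
```

Looking items up by name this way does not depend on the order in which the server sends them, but it still does not cover a body sent as a quoted string.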

To be sure to handle all these corner cases, I think the consumer currently has to parse the response based on the RFC. I did this here using an implementation of the grammar with pyparsing. However, this still does not seem to work with Office365 IMAP (jgosmann/dmarc-metrics-exporter#17). It also has some rough edges:

  • I need to stitch the "structured" response back together, so that I can parse it myself.
  • The last line of the response is annoying because it only states FETCH completed, which is not standardized. Only the <TAG> OK prefix is, but that is already stripped away by aioimaplib. Thus, it is hard to tell whether something is just the completing line of the command or some invalid response (or a response not supported by the parser).
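
The stitching from the first point could look roughly like this. It is a sketch only: the function name stitch_response is hypothetical, and it assumes that every bytes item was a line terminated by CRLF on the wire while every bytearray item is a literal that was followed directly by the rest of its line. It does not restore the leading "* " or the tag and status of the final line.

```python
def stitch_response(lines):
    """Rebuild an approximate wire-format byte string from response items."""
    out = bytearray()
    for item in lines:
        out += item
        if not isinstance(item, bytearray):
            # Protocol line (possibly ending in a {n} literal marker):
            # it was terminated by CRLF on the wire.
            out += b'\r\n'
    return bytes(out)


items = [
    b'1 FETCH (BODY[] {5}',
    bytearray(b'hello'),
    b')',
]
print(stitch_response(items))
```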

jgosmann avatar Jan 10 '22 18:01 jgosmann

I agree with the above. In my app, I'm making a FETCH request for the ENVELOPE of each mail in my mailbox. response.lines gives me a single line per mail, unless a subject line contains a double quote ("). That causes the subject to be returned as a literal (I'm using Dovecot in this case), which is included as a bytearray, and the rest of the envelope is then returned as a bytes string in the next list element, i.e. I get three list elements instead of one.

My understanding is that imaplib will group these three lines into a list of their own, and then the entire response is returned as a list of lists.

I believe that's similar to what @jgosmann is talking about.

lllama avatar Jan 10 '22 21:01 lllama

My statements about the unknown order of data items and potential additional data items included were correct. RFC3501 actually references RFC2683 with recommendations for implementors and clearly states these points.

jgosmann avatar Jan 11 '22 19:01 jgosmann