php-imap

Inconsistent parsing of message_id and in_reply_to

Open nielspeen opened this issue 3 years ago • 4 comments

Describe the bug

Header::parse removes < and > from message_id, but not from in_reply_to.

A message with Message-ID: <[email protected]> will have $message->message_id="[email protected]"

A message with In-Reply-To: <[email protected]> will have $message->in_reply_to="<[email protected]>"

As a result the two cannot be compared or used in database queries without first removing < and > from in_reply_to.
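Until the parsing is made consistent, one possible workaround is to normalize both values before comparing them. The sketch below is only an illustration: `normalizeMessageId` is a hypothetical helper, not part of php-imap, and it assumes both values have already been obtained as plain strings. The addresses are placeholders.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of php-imap): trim whitespace and remove
 * one pair of enclosing angle brackets, if present.
 */
function normalizeMessageId(string $value): string
{
    $value = trim($value);
    if (strlen($value) >= 2 && $value[0] === '<' && substr($value, -1) === '>') {
        return substr($value, 1, -1);
    }
    return $value;
}

// Placeholder values mirroring the two formats described above.
$messageId = 'abc123@example.com';   // message_id: brackets already removed
$inReplyTo = '<abc123@example.com>'; // in_reply_to: brackets still present

// After normalization the two values can be compared directly.
var_dump(normalizeMessageId($inReplyTo) === normalizeMessageId($messageId));
```

The helper leaves values without brackets untouched, so it can be applied to both fields without knowing which format each one currently uses.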

Expected behavior

message_id and in_reply_to should use the same format/parsing, so they can be used in queries and comparisons.

nielspeen avatar Feb 03 '22 15:02 nielspeen

Hi @nielspeen, many thanks for your report.

I just checked Header.php and you are absolutely right: https://github.com/Webklex/php-imap/blob/ed93fe43ac3e71ffcc4e9a61d3016f91173d617a/src/Header.php#L212-L214

The message_id gets a special treatment whereas in_reply_to doesn't.

Best regards,

Webklex avatar Feb 03 '22 16:02 Webklex

It seems like the same thing is happening with references.

Is there a particular reason for this, or could we remove the angle brackets from in_reply_to and references as well?

PS: Awesome project :)

HelloSebastian avatar Feb 16 '22 09:02 HelloSebastian

Maybe it would be easier to simply not strip the angle brackets from the message id?

I already tried to find out why message ids are wrapped in these characters at all. Unfortunately, I haven't found anything about it in the RFCs so far.

HelloSebastian avatar Feb 20 '22 14:02 HelloSebastian

After a long search to find out whether the angle brackets belong to the message id, I finally found something in the RFCs.

According to RFC 2822 (page 25):

Semantically, the angle bracket characters are not part of the msg-id; the msg-id is what is contained between the two angle bracket characters.

Other sources: RFC 822 Chapter 3.4.6

Angle brackets ("<" and ">") are generally used to indicate the presence of a one machine-usable reference (e.g., delimiting mailboxes), possibly including source-routing to the machine.

https://stackoverflow.com/a/34811337/10599992

The maximum line length per the RFC you cite is 998 characters. That would include the "Message-ID:" field name, but you can do line folding between the field name and the field body. The line containing the actual Message-ID would then contain a space (the folding whitespace), "<", Message-ID, and ">". Semantically, the angle brackets are not part of the Message-ID. Therefore you end up with a maximum of 998 - 3 = 995 characters.

My reading is therefore that the angle brackets should be removed from all message ids, since they only delimit the identifier and are not part of it. We need to handle the In-Reply-To and References headers separately.
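One reason these two headers need their own handling is that RFC 2822 defines both In-Reply-To and References as one or more msg-ids, so stripping a single leading "<" and trailing ">" is not enough. A rough sketch of how the ids could be extracted follows; `extractMessageIds` is a hypothetical helper, not an existing php-imap function, and the header value is made up.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of php-imap): return every bracketed
 * message id found in a header body, without the angle brackets.
 *
 * @return string[]
 */
function extractMessageIds(string $headerBody): array
{
    // Each "<...>" group is one msg-id; the capture group excludes the brackets.
    preg_match_all('/<([^<>\s]+)>/', $headerBody, $matches);
    return $matches[1];
}

// A folded References header body with three placeholder ids.
$references = "<first@example.com> <second@example.com>\r\n <third@example.com>";

print_r(extractMessageIds($references));
```

Returning a list in both cases would let in_reply_to and references share one code path, with message_id being the single-element case.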

What do you think about it, @Webklex?

Should we perhaps write a separate class for the three headers, similar to the address class? That class could then also expose the message ids with their angle brackets.
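To make the idea more concrete, here is a minimal sketch of what such a class might look like. Everything in it is an assumption for discussion: the class name `MessageId` and its methods are hypothetical and do not exist in php-imap.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical value object; the name and methods are illustrative only
 * and are not part of php-imap.
 */
final class MessageId
{
    private string $id;

    public function __construct(string $raw)
    {
        // Store the bare id; accept input with or without angle brackets.
        $raw = trim($raw);
        if (strlen($raw) >= 2 && $raw[0] === '<' && substr($raw, -1) === '>') {
            $raw = substr($raw, 1, -1);
        }
        $this->id = $raw;
    }

    /** The id without angle brackets, e.g. for comparisons and queries. */
    public function bare(): string
    {
        return $this->id;
    }

    /** The id wrapped in angle brackets, as it appears in a header. */
    public function bracketed(): string
    {
        return '<' . $this->id . '>';
    }

    public function equals(MessageId $other): bool
    {
        return $this->id === $other->id;
    }

    public function __toString(): string
    {
        return $this->id;
    }
}

// Placeholder ids: both spellings end up as the same value.
$a = new MessageId('<abc123@example.com>');
$b = new MessageId('abc123@example.com');

var_dump($a->equals($b));
echo $a->bracketed(), PHP_EOL;
```

Keeping the bare id as the canonical form matches the RFC wording quoted above, while `bracketed()` still serves callers who need the header representation.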

I look forward to hearing your thoughts on this!

HelloSebastian avatar Mar 12 '22 09:03 HelloSebastian