php-imap
Inconsistent parsing of message_id and in_reply_to
Describe the bug
Header::parse removes < and > from message_id, but not from in_reply_to.
A message with Message-ID: <[email protected]> will have $message->message_id="[email protected]"
A message with In-Reply-To: <[email protected]> will have $message->in_reply_to="<[email protected]>"
As a result, the two cannot be compared or used in database queries without first stripping < and > from in_reply_to.
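Until the parsing is unified, one possible caller-side workaround is to normalize both values before comparing them. This is a minimal sketch that assumes both values are available as plain strings. `normalize_msg_id()` is a hypothetical helper and is not part of php-imap, and the ids are made-up examples:

```php
<?php
// Hypothetical helper (not part of php-imap): trims whitespace and removes one
// pair of surrounding angle brackets, so "<id>" and "id" compare as equal.
function normalize_msg_id(string $id): string
{
    $id = trim($id);
    if (strlen($id) >= 2 && $id[0] === '<' && $id[strlen($id) - 1] === '>') {
        $id = substr($id, 1, strlen($id) - 2);
    }
    return $id;
}

// Made-up values standing in for $message->message_id and $message->in_reply_to.
$messageId = 'abc123@example.com';
$inReplyTo = '<abc123@example.com>';

var_dump(normalize_msg_id($inReplyTo) === normalize_msg_id($messageId)); // bool(true)
```

This only papers over the inconsistency on the caller's side. It does not handle headers that carry more than one id.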
Expected behavior
message_id and in_reply_to should use the same format/parsing, so they can be used in queries and comparisons.
Hi @nielspeen, many thanks for your report.
I just checked the Header.php and you are absolutely right:
https://github.com/Webklex/php-imap/blob/ed93fe43ac3e71ffcc4e9a61d3016f91173d617a/src/Header.php#L212-L214
The message_id gets special treatment, whereas in_reply_to doesn't.
Best regards,
It seems the same thing is happening with references: the angle brackets aren't removed there either.
Is there a particular reason for this, or could we remove the less-than/greater-than signs from In-Reply-To and References?
PS: Awesome project :)
Alternatively, maybe it would be easier to stop removing the signs from the message id?
I already tried to find out why the message ids contain these characters at all. Unfortunately, I haven't found anything about it in the RFCs so far.
After a long search into whether the angle brackets belong to the message id, I finally found something in the RFC.
According to RFC 2822 (page 25):
Semantically, the angle bracket characters are not part of the msg-id; the msg-id is what is contained between the two angle bracket characters.
Other sources: RFC 822, section 3.4.6:
Angle brackets ("<" and ">") are generally used to indicate the presence of a one machine-usable reference (e.g., delimiting mailboxes), possibly including source-routing to the machine.
https://stackoverflow.com/a/34811337/10599992
The maximum line length per the RFC you cite is 998 characters. That would include the "Message-ID:" field name, but you can do line folding between the field name and the field body. The line containing the actual Message-ID would then contain a space (the folding whitespace), "<", Message-ID, and ">". Semantically, the angle brackets are not part of the Message-ID. Therefore you end up with a maximum of 998 - 3 = 995 characters.
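Spelled out, the count in the quoted answer is as follows. RFC 2822 section 2.1.1 limits a line to 998 characters excluding the trailing CRLF. The folded line then spends one character on the folding whitespace and two on the brackets:

```latex
\underbrace{998}_{\text{max line length}} \;-\; \underbrace{1}_{\text{folding space}} \;-\; \underbrace{2}_{\texttt{<}\ \text{and}\ \texttt{>}} \;=\; 995
```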
I therefore interpret this to mean that all message ids should have the angle brackets removed, since the brackets only delimit the id in question. We also need to handle the In-Reply-To and References headers separately, because each of them can contain one or more message ids.
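A rough sketch of that separate handling: RFC 2822 defines both In-Reply-To and References as one or more msg-ids. They can therefore be split into a list of bare ids instead of being treated as a single string. `parse_msg_id_list()` is a hypothetical helper, not php-imap code. The regex is a simplification that ignores comments (CFWS) and obsolete syntax:

```php
<?php
// Hypothetical helper (not part of php-imap): extracts every "<...>" msg-id
// from an In-Reply-To or References value and returns the ids without brackets.
function parse_msg_id_list(string $headerValue): array
{
    // Simplification: match text between "<" and the next ">" with no nested
    // brackets; folded whitespace between ids is simply skipped over.
    preg_match_all('/<([^<>]+)>/', $headerValue, $matches);
    return array_map('trim', $matches[1]);
}

// Made-up, folded References value containing three ids.
$references = "<first@example.com>\r\n <second@example.com> <third@example.com>";
print_r(parse_msg_id_list($references));
```

Returning a list keeps References usable for thread lookups (e.g. a `WHERE message_id IN (...)` query), which a single bracket-stripped string would not be.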
What do you think about it, @Webklex?
Should we write a separate class for these three headers, similar to the Address class? It could then also provide the message ids with the angle brackets.
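A minimal sketch of what such a class might look like, assuming PHP 7.4+. The class name `MessageId` and its methods are hypothetical and do not exist in php-imap. It stores the bare id, as RFC 2822 suggests, and can still return the bracketed form on request:

```php
<?php
// Hypothetical class (not part of php-imap) for Message-ID, In-Reply-To and
// References values. Stores the bare msg-id and can re-add the brackets.
final class MessageId
{
    private string $id;

    public function __construct(string $raw)
    {
        $raw = trim($raw);
        // Brackets are delimiters, not part of the msg-id, so drop one pair.
        if (strlen($raw) >= 2 && $raw[0] === '<' && $raw[strlen($raw) - 1] === '>') {
            $raw = substr($raw, 1, strlen($raw) - 2);
        }
        $this->id = $raw;
    }

    // Bare form, suitable for comparisons and database queries.
    public function get(): string
    {
        return $this->id;
    }

    // Bracketed form, as it appears in a header.
    public function withBrackets(): string
    {
        return '<' . $this->id . '>';
    }

    public function __toString(): string
    {
        return $this->id;
    }

    /**
     * Splits an In-Reply-To or References value into MessageId objects.
     * Simplification: ignores comments and obsolete syntax.
     * @return MessageId[]
     */
    public static function parseList(string $headerValue): array
    {
        preg_match_all('/<[^<>]+>/', $headerValue, $matches);
        return array_map(fn (string $m) => new self($m), $matches[0]);
    }
}

$refs = MessageId::parseList('<a@example.com> <b@example.com>');
echo $refs[1]->get(), PHP_EOL;          // b@example.com
echo $refs[1]->withBrackets(), PHP_EOL; // <b@example.com>
```

With this design, message_id, in_reply_to and references would all share one representation, so comparisons work without string cleanup, while the bracketed form stays available.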
I look forward to hearing your thoughts on this!