
email.utils.make_msgid return ids that break email messages with related content

Open ostefano opened this issue 3 years ago • 14 comments

Bug report

I have been trying to replicate the examples listed here: https://docs.python.org/3/library/email.examples.html

For some reason, the one about "creating an HTML message with an alternative plain text version" produces an email message that Thunderbird (and other email readers) do not display correctly: the images are not shown and are marked as broken.

The example uses make_msgid() to generate content ids.

Python 3.10.9 (main, Dec  7 2022, 03:14:04) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from email import utils
>>> utils.make_msgid()
'<167119948916.50921.14529814791249370642@1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa>'
>>>

It turns out that the string is too long: if I either remove the domain part or deliberately shorten it, e.g., make_msgid(domain="0.0.0.ip6.arpa"), everything works again and the resulting email is displayed correctly in Thunderbird/Outlook.

Your environment

  • CPython versions tested on: 3.10.9
  • Operating system and architecture: OSX/M1

Linked PRs

  • gh-100856

ostefano, Dec 16 '22
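The comparison described in the report can be reproduced in miniature with the standard library alone. This is a sketch: the default call takes its domain from socket.getfqdn(), so its length depends on the host it runs on, while an explicit domain keeps it short. The domain "example.com" is an arbitrary placeholder, not a value from the report.

```python
import socket
from email.utils import make_msgid

# With no arguments, the part after "@" comes from socket.getfqdn(); on the
# reporter's Mac that was a long ip6.arpa reverse-lookup name.
fqdn = socket.getfqdn()
default_id = make_msgid()
print(len(fqdn), fqdn)
print(len(default_id), default_id)

# Passing the domain explicitly keeps the id short regardless of the host.
# "example.com" is only a placeholder domain.
short_id = make_msgid(domain="example.com")
print(len(short_id), short_id)
```

On a host whose FQDN is short, both ids will be of similar length; the difference only shows up on machines like the reporter's.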

make_msgid by default uses socket.getfqdn() to get the domain part. On my machine it is short enough. So, you have two options:

  1. Change your hostname
  2. Use explicit domain name

I don't think that there's anything we can do from our side.

sobolevn, Jan 08 '23
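Applied to the documentation's "HTML message with an alternative plain text version" example, option 2 might look like the sketch below. It follows the shape of that example (set_content, add_alternative, then add_related on the HTML part), but the image bytes are a placeholder instead of a file read from disk, and "example.com" is an arbitrary domain; both are assumptions for illustration.

```python
from email.message import EmailMessage
from email.utils import make_msgid

msg = EmailMessage()
msg["Subject"] = "Asparagus for lunch"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.set_content("Plain-text version for clients that do not render HTML.")

# Option 2: an explicit, short domain instead of the socket.getfqdn() default.
image_cid = make_msgid(domain="example.com")

# The HTML references the image by Content-ID, minus the angle brackets.
html = f"""\
<html>
  <body>
    <p>Salut!</p>
    <img src="cid:{image_cid[1:-1]}">
  </body>
</html>
"""
msg.add_alternative(html, subtype="html")

# Placeholder bytes stand in for the JPEG the docs example reads from disk.
fake_jpeg = b"\xff\xd8\xff\xe0 placeholder"
msg.get_payload()[1].add_related(fake_jpeg, "image", "jpeg", cid=image_cid)

text = msg.as_string()
```

With a short domain the whole "Content-ID: <...>" line stays under 78 characters, so it is written out unchanged.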

@sobolevn I am perfectly fine implementing that workaround in my code. The problem is that this issue is not documented at all, and users reading the official documentation here https://docs.python.org/3/library/email.examples.html might try to implement the example and find themselves completely stumped.

I think we should at least add what you say in your reply to the documentation page linked above. What do you think?

ostefano, Jan 08 '23

Looks like it is documented here: https://docs.python.org/3/library/email.utils.html?highlight=make_msgid#email.utils.make_msgid

I don't think that adding implementation details of make_msgid to the multi-alternatives example is a good idea.

However, making docs better is always a good thing, so - if you have some specific suggestions, please feel free to post them! :)

sobolevn, Jan 08 '23

What about something like: "Note that modern email clients might not correctly display emails containing resources with a message-id longer than XX characters"?

ostefano, Jan 08 '23

While Thunderbird doesn't display messages with a long msgid correctly, Apple Mail does. Which other email clients are not working?

dtrodrigues, Jan 08 '23

Outlook 365 (latest on the stable channel)

ostefano, Jan 08 '23

Something like "Note that some email clients might not correctly display emails containing resources with long Message-Id, which usually happens due to the long domain part" sounds like a reasonable note to add! 👍

sobolevn, Jan 08 '23

@sobolevn 👍 If you point me to the right documentation file, I'd be happy to create the PR.

ostefano, Jan 08 '23

Here you go! https://github.com/python/cpython/blob/main/Doc/library/email.utils.rst

sobolevn, Jan 08 '23

FWIW, the Thunderbird bug report is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1612465

The longer domain causes Python to encode the Content-ID value so that it can be split across multiple lines, but Thunderbird doesn't seem to support that part of the spec.

dtrodrigues, Jan 08 '23
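To check whether a particular message is affected, one can look at how each Content-ID header comes out after serialization. The two helpers below are hypothetical and not part of the email package: content_ids_intact reports whether every Content-ID value appears verbatim on a single line of the generated text. Exactly how a too-long value gets rewritten (encoded words, continuation lines) may differ between Python versions, so the sketch only detects that the header was changed rather than asserting a particular form.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def content_ids_intact(msg):
    """Hypothetical helper: True if every Content-ID header is written out
    verbatim on a single line (not folded, not turned into encoded words)."""
    lines = [line.lower() for line in msg.as_string().splitlines()]
    for part in msg.walk():
        cid = part["Content-ID"]
        if cid is not None and f"content-id: {cid}".lower() not in lines:
            return False
    return True


def build_message(domain):
    """Hypothetical helper: a text part plus one related image part."""
    msg = EmailMessage()
    msg.set_content("plain text")
    msg.add_related(b"placeholder", "image", "png", cid=make_msgid(domain=domain))
    return msg


# A domain shaped like the one in the report (1.0.0. ... .ip6.arpa).
long_domain = "1." + "0." * 31 + "ip6.arpa"

print(content_ids_intact(build_message("example.com")))  # True
# The result for the long domain depends on how this Python version folds it.
print(content_ids_intact(build_message(long_domain)))
```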

@sobolevn done 👍

ostefano, Jan 08 '23

At the risk of muddying the waters, I think this is actually a bug. I don't believe message-id headers are technically allowed to be encoded using encoded words. The spec is pretty clear that addr-specs are not to be RFC 2047 encoded, and a message-id is composed of addr-spec-like things. More directly on point, it is a structured field and its contents are not a phrase. The email package should really probably default to not doing encoding except where it is permitted; instead, I went with preventing it on demand (encode_as_ew = False, but the default is True). I believe I did that because X-headers can contain encoded words, and I wanted doing such encoding of X-headers to be the default. I now think that was an incorrect design decision, as it has resulted in several bug reports like this one, including one, if I recall correctly, that was about an X-header.

Now, I could be wrong about encoding of message-id headers. After all, I was much more cognizant of the RFCs when I was writing the code than I am now, years later ;)

If I'm right, this raises the question of how you comply with the RFC line length requirements while also not using encoded words. The answer, I think, is that you don't. Long lines are handled correctly by far more mail clients than encoding-where-it-doesn't-belong is.

bitdancer, Jan 10 '23
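The "leave the line long" approach described above can be approximated from user code by serializing with a policy that disables header wrapping. This is only a sketch of that trade-off, not a fix the thread agreed on: max_line_length=None is a documented policy setting meaning no line wrapping, but it applies to every header, so other long headers will also exceed the 78-character line length that RFC 5322 recommends (its hard limit is 998).

```python
from email import policy
from email.message import EmailMessage
from email.utils import make_msgid

# A domain shaped like the one in the report (a long ip6.arpa name).
long_domain = "1." + "0." * 31 + "ip6.arpa"

msg = EmailMessage()
msg.set_content("plain text")
cid = make_msgid(domain=long_domain)
msg.add_related(b"placeholder image bytes", "image", "png", cid=cid)

# Clone the default policy with header wrapping turned off for this output only.
no_wrap = policy.default.clone(max_line_length=None)
text = msg.as_string(policy=no_wrap)

# Collect the serialized Content-ID line(s) to inspect them.
cid_lines = [line for line in text.splitlines()
             if line.lower().startswith("content-id:")]
print(cid_lines)
```

The message object itself keeps the default policy; only this serialization skips wrapping.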

@sobolevn @bitdancer what is the consensus here? Shall we merge the PR in the meantime?

ostefano, Feb 16 '23

Also there is a related doc PR https://github.com/python/cpython/pull/100856

blaisep, May 20 '24