Work around broken encodings in received messages
Context: mu+mu4e 1.12.4 running on Emacs 29.3, both installed via Portage (net-mail/mu and app-editors/emacs) on Gentoo.
i recently received an email whose raw subject line is:
Subject: =?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?=
This subject line is correctly displayed in mu4e:view mode, as:
We’ve reconnected – and next steps
but in mu4e:headers mode, it's displayed as:
We\222ve reconnected \226 and next steps
i.e. the byte sequence:
We<0x92>ve reconnected <0x96> and next steps
This appears to be the result of the rfc2047-decode-region function, or some equivalent, not being run on the text; running that function on it results in correct display.
Checklist
- [x] you are running either an 1.10.x/1.12.x release or master (otherwise please upgrade)
- [x] you can reproduce the problem without 3rd party extensions (including Doom/Evil, various extensions etc.)
- [x] you have read all of the above
Can you attach a message file (anonymized as needed) where this happens? Thanks.
The only such email i have is the one i received today, which contains personal health information. i've redacted the bodies (i.e. the two MIME parts) basically in their entirety, and also redacted various bits of header content in a minimal way, hopefully still leaving it usable.
Emacs is just showing what it gets from the mu-server, it doesn't decode anything in the headers buffer.
Looking in a message (where it's shown as expected), with M-x describe-char I get:
character: ’ (displayed as ’) (codepoint 8217, #o20031, #x2019)
charset: windows-1252 (WINDOWS-1252 (Latin I))
so the problem seems to be that the original message uses the windows-1252 charset, but claims it is ISO-8859-1:
Subject: =?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?=
You can see this if you change the subject to
Subject: =?WINDOWS-1252?Q?We=92ve_reconnected_=96_and_next_steps?=
it will show correctly (after re-indexing etc.).
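To make the mismatch concrete, here is a small Python sketch (not what mu or GMime actually do) that decodes the same header bytes under both charsets. The helper `octal_escapes` is a hypothetical name introduced only for illustration; it mimics how the C1 control characters end up shown as backslash-octal escapes in the headers buffer.

```python
# The Q-encoded payload from the Subject header, with =92 and =96 unescaped
# and "_" turned back into spaces.
payload = b"We\x92ve reconnected \x96 and next steps"

# Decoded as the header claims (ISO-8859-1): bytes map 1:1 to code points,
# so 0x92 and 0x96 become the C1 control characters U+0092 and U+0096.
as_declared = payload.decode("iso-8859-1")

# Decoded as what the sender evidently used (windows-1252):
# 0x92 is U+2019 (right single quote), 0x96 is U+2013 (en dash).
as_intended = payload.decode("windows-1252")


def octal_escapes(text: str) -> str:
    """Render C1 control characters (U+0080..U+009F) as backslash-octal escapes.

    Hypothetical helper, for illustration only.
    """
    return "".join(
        "\\%03o" % ord(ch) if 0x80 <= ord(ch) <= 0x9F else ch for ch in text
    )


print(octal_escapes(as_declared))  # We\222ve reconnected \226 and next steps
print(as_intended)                 # We’ve reconnected – and next steps
```

The octal values 222 and 226 are just 0x92 and 0x96 written in base 8, which matches what the headers view shows.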
Now, while it's the message's sender that's misbehaving, that won't help us very much.
mu can't easily do what Gnus does (we're bound by GMime), but I'll turn this into an RFE ticket and see if we can find a work-around.
Ah, great analysis, thank you. i think i had indeed run describe-char and noticed that 1252 was mentioned, but it didn't click that this was not what the "Subject" header was claiming ....
Thanks for converting this to an RFE. 👍 i'm going to try emailing postmaster@salesforce about this, which is probably unlikely to result in any change, but at least i'll have tried. 😛
It's just come to my attention that the HTML5 spec says that "ISO-8859-1" is to be interpreted as Windows-1252. So presumably what's happening in this email is that it's assumed it will be read in a Web-based client - which, to be fair, is more than likely the case - such that the HTML5 spec is applicable. Which, fwiw, feels incorrect to me: even if the email body contains only a text/html MIME part, the headers are certainly not HTML.
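For what it's worth, the "treat ISO-8859-1 as Windows-1252" rule is easy to sketch outside of mu. The following Python is only an illustration of that idea using the standard library's `email.header.decode_header`; it says nothing about how GMime or mu would implement it. The names `LATIN1_LABELS` and `decode_header_lenient` are made up for this example, and the label set is a deliberately small subset.

```python
from email.header import decode_header

# Assumption: a small subset of the labels that web clients treat as
# windows-1252; the real list is longer.
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1"}


def decode_header_lenient(raw: str) -> str:
    """Decode an RFC 2047 header, reading ISO-8859-1 as windows-1252.

    Hypothetical helper, for illustration only.
    """
    parts = []
    for payload, charset in decode_header(raw):
        if isinstance(payload, str):
            # Header contained no encoded words at all.
            parts.append(payload)
            continue
        label = (charset or "ascii").lower()
        if label in LATIN1_LABELS:
            try:
                parts.append(payload.decode("cp1252"))
            except UnicodeDecodeError:
                # Python's cp1252 leaves a few bytes (e.g. 0x81) undefined;
                # fall back to the declared charset for those.
                parts.append(payload.decode("latin-1"))
        else:
            try:
                parts.append(payload.decode(label, errors="replace"))
            except LookupError:
                # Unknown charset label: keep the bytes visible rather than fail.
                parts.append(payload.decode("latin-1"))
    return "".join(parts)


subject = "=?ISO-8859-1?Q?We=92ve_reconnected_=96_and_next_steps?="
print(decode_header_lenient(subject))
```

Since windows-1252 agrees with ISO-8859-1 everywhere except the 0x80..0x9F range, which ISO-8859-1 assigns only to rarely used control characters, such an override is unlikely to break correctly labelled messages.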
I'm moving this to the IDEAS.org file and will close it here shortly... it would probably best be solved at the GMime level.