mu
[mu4e bug] encoding/decoding errors for non utf-8 messages
The bug
I live and work in France, so most of my email contains accents, and a lot of people who mail me use encodings other than UTF-8 (windows-1252, for instance). This results in garbled text in mu4e. If I open the same email in Firefox or in our webmail system, the text is correct.
To Reproduce
An email with this in the header:
--------------2B90AD83561B9846F2ECD871
Content-Type: multipart/alternative;
boundary="------------69C3D6A5B38E7F0BD6DDD2D3"
--------------69C3D6A5B38E7F0BD6DDD2D3
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Yields
Environment
OS: Ubuntu 21.04
Emacs: This is GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.25, cairo version 1.16.0) of 2021-10-05
Mu: 1.6.5-60-g11d41bd1
Checklist
- [x] you are running vanilla emacs (i.e. without Doom, Evil, Spacemacs etc.) (otherwise, please try to reproduce without those)
- [x] you are running mu4e without any third-party extensions (otherwise, please try to reproduce without those)
- [x] you are running either the latest 1.6.x release, or a 1.7.x development release (otherwise, please upgrade).
Please attach a full message file where this happens (remove any personal info as needed), thanks.
It is not the same message as above, but one that has the same problem. I renamed the file so GitHub accepts it.
Hmm, when I check that, I see, e.g., what I think should be "Ingénieur de Recherche", but I get "Ing�nieur de Recherche", where "�" is literally the UTF-8 replacement character (bytes: ef bf bd).
This looks different from the errors in the screenshot, which look typical of messages with the wrong character set. So, is there some other example, or did something get lost in translation?
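The two kinds of garbling mentioned here can be reproduced in a few lines. This is a minimal sketch in Python, chosen only for illustration; it says nothing about how mu or mu4e decode messages. Text written as windows-1252 but read as UTF-8 yields U+FFFD replacement characters (which become the bytes ef bf bd if the text is saved again as UTF-8), while UTF-8 read as windows-1252 yields the familiar "Ã©" style of wrong-charset garbling.

```python
word = "Ingénieur"

# 1. Written as windows-1252, read as UTF-8.
#    "é" is the single byte 0xE9 in windows-1252, which is not valid UTF-8
#    here, so a lenient decoder substitutes U+FFFD REPLACEMENT CHARACTER.
cp1252_bytes = word.encode("cp1252")                      # b"Ing\xe9nieur"
garbled = cp1252_bytes.decode("utf-8", errors="replace")  # "Ing\ufffdnieur", shown as Ing�nieur

# Saving that text again as UTF-8 turns U+FFFD into the bytes EF BF BD.
saved = garbled.encode("utf-8")
print(saved.hex(" "))  # 49 6e 67 ef bf bd 6e 69 65 75 72

# 2. Written as UTF-8, read as windows-1252.
#    The two UTF-8 bytes of "é" (C3 A9) become two separate characters.
utf8_bytes = word.encode("utf-8")                         # b"Ing\xc3\xa9nieur"
mojibake = utf8_bytes.decode("cp1252")                    # "IngÃ©nieur"
```

The first case matches what the attached file contains; the second is one common form of wrong-charset display.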
I bet Firefox etc. do a better job of second-guessing a message that specifies an incorrect character set.
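A minimal sketch of the kind of second-guessing a more lenient reader might do, assuming a simple strategy: prefer strict UTF-8 when the bytes happen to be valid UTF-8, otherwise the declared charset, otherwise windows-1252 with replacement. This is an illustration only, not how Firefox or any particular webmail actually decides; `decode_body` is a hypothetical helper.

```python
from typing import Optional, Tuple


def decode_body(raw: bytes, declared: Optional[str]) -> Tuple[str, str]:
    """Decode a text/plain body, second-guessing the declared charset.

    Hypothetical helper; returns (text, charset actually used).
    """
    # 1. Non-ASCII text that is valid UTF-8 is rarely an accident, so try
    #    strict UTF-8 first, even if the header claims something else.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 2. Otherwise honor the declared charset, if Python knows it and it fits.
    if declared:
        try:
            return raw.decode(declared), declared
        except (LookupError, UnicodeDecodeError):
            pass
    # 3. Last resort: windows-1252, replacing undefined bytes with U+FFFD.
    return raw.decode("cp1252", errors="replace"), "cp1252"


# Declared windows-1252 and really windows-1252: used as declared.
honest = decode_body(b"G\xe9rard", "windows-1252")              # ("Gérard", "windows-1252")
# Declared windows-1252 but actually UTF-8: UTF-8 wins.
mislabeled = decode_body("Gérard".encode("utf-8"), "windows-1252")  # ("Gérard", "utf-8")
```

No heuristic of this kind can repair the attached file, though: the bytes ef bf bd are themselves valid UTF-8, so they decode to U+FFFD and the original letter is simply not there anymore.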
@joukeHijlkema Are you sure the message is correctly encoded with windows-1252 (it is a single-byte encoding, unlike the message you attached)?
I'm not sure of anything. I'm pretty sure the encoding announced in the header is not the one used in the text. The problem is that I have no influence on the software the sender uses ...
Yeah, as much as we try, there are still some holdouts that do not use mu4e yet.
But, the message you attached contains literal "�" and those are rendered "correctly". However, that looks very different from the screenshot; are you sure you attached a raw file, as present in your Maildir?
I'm not sure how to modify the file and save it keeping the exact same encoding. I might have done something wrong.
Open the message, M-x mu4e-copy-message-path. Then open this file:
C-x C-f C-y RET
Edit as needed, and save:
C-x C-w ~/my-message.txt RET
and attach.
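A complementary way to check what is actually on disk, without an editor possibly re-encoding the file when saving, is to read the Maildir file as raw bytes and inspect each text part. A minimal sketch using Python's standard `email` package; `report_parts` is a hypothetical helper, and the file path is whatever mu4e-copy-message-path copied.

```python
from __future__ import annotations

import email
import sys

# U+FFFD REPLACEMENT CHARACTER as it appears in a UTF-8 byte stream.
REPLACEMENT_UTF8 = b"\xef\xbf\xbd"


def report_parts(raw: bytes) -> list[dict]:
    """Summarize every text/* part of a raw message (hypothetical helper)."""
    msg = email.message_from_bytes(raw)
    report = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text parts
        # Undo the Content-Transfer-Encoding; for 8bit this is the raw bytes.
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        try:
            payload.decode(charset)
            decodes = True
        except (LookupError, UnicodeDecodeError):
            decodes = False
        report.append({
            "type": part.get_content_type(),
            "charset": charset,
            "decodes_as_declared": decodes,
            "ef_bf_bd_count": payload.count(REPLACEMENT_UTF8),
        })
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python report_parts.py /path/copied/by/mu4e-copy-message-path
    with open(sys.argv[1], "rb") as f:
        for entry in report_parts(f.read()):
            print(entry)
```

Counting ef bf bd separately matters because a part declared as windows-1252 that contains those bytes still decodes "successfully" as windows-1252 (the three bytes are the valid characters "ï¿½"), so no decoder would flag it as an error.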
There you go. It still has the funny "�". I've attached a screenshot from Emacs just before I saved the message path. Hope this helps.
Jouke
Return-Path: @.>
Received: from deliver ([unix socket]) by cottos (Cyrus v2.4.6) with LMTPA; Wed, 22 Sep 2021 13:12:56 +0200
X-Sieve: CMU Sieve 2.4
Received: from onera.onera.fr (amon.onera [1.1.1.5]) by cottos.onecert.fr (Postfix) with ESMTP id 49B9124428E for @.>; Wed, 22 Sep 2021 13:12:56 +0200 (CEST)
Received: from toto.onera.net (toto.onera.net [1.1.1.1]) by toto.onera.fr with ESMTP id 18MBCu6F004932; Wed, 22 Sep 2021 13:12:56 +0200
Received: from onera.onera.fr (toto.onera [125.1.1.55]) by toto.onera.net (Postfix) with ESMTPS id 2BA60749AE6; Wed, 22 Sep 2021 13:12:56 +0200 (CEST)
Received: from nereetoto.onecert.fr (toto.onecert.fr [134.212.194.12]) by toto.onera.fr with ESMTP id 18MBCtQJ004930; Wed, 22 Sep 2021 13:12:56 +0200
Received: from [10.25.1.156] (vpn-toto.onera.net [10.25.1.156]) by toto.onecert.fr (8.14.3/8.14.3/ONERA-SRI) with ESMTP id 18MBCtt7016895; Wed, 22 Sep 2021 13:12:55 +0200
Subject: =?UTF-8?Q?Re=3a_Fwd=3a_PERSEUS=3a_s=c3=a9minaire_de_rentr=c3=a9e_le?= =?UTF-8?Q?23_Septembre=c3=a0_partir_de_9h00_en_webex?=
To: toto toto DTP/ES @.>
Cc: toto toto @.>, toto toto @.>, toto toto @.>
References: @.> @.>
From: =?UTF-8?B?SsOpcsO0bWUgQU5USE9JTkU=?= @.>
Organization: ONERA
Message-ID: @.>
Date: Wed, 22 Sep 2021 13:12:55 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0
MIME-Version: 1.0
In-Reply-To: @.***>
Content-Type: multipart/alternative; boundary="------------237AFE8D6B473E3EE1279763"
Content-Language: fr
This is a multi-part message in MIME format.
--------------237AFE8D6B473E3EE1279763
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Bonjour G�rard, Jouke est en cong�. Je ne sais pas si Yves Mauriot pourra se joindre � la r�union. Dans le cas contraire, merci de nous faire un retour quand tu pourras. Bien cordialement, J�r�me
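Unlike the body, the headers of this message are pure ASCII on the wire: the Subject uses RFC 2047 encoded-words with charset UTF-8, while the text/plain part declares windows-1252. A short sketch with Python's standard `email.header` module (an illustrative choice, not what mu uses) shows that the Subject at least still decodes to intact accented text.

```python
from email.header import decode_header, make_header

# Subject header copied verbatim from the attached message.
subject = (
    "=?UTF-8?Q?Re=3a_Fwd=3a_PERSEUS=3a_s=c3=a9minaire_de_rentr=c3=a9e_le?= "
    "=?UTF-8?Q?23_Septembre=c3=a0_partir_de_9h00_en_webex?="
)

# decode_header() splits the header into (bytes, charset) chunks, dropping
# the whitespace between adjacent encoded-words as RFC 2047 requires;
# make_header() reassembles the chunks into one Unicode string.
decoded = str(make_header(decode_header(subject)))
```

Because encoded-words are plain ASCII, 8-bit mangling along the way cannot damage them the way it can damage an 8bit body.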
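Below is how message.txt displays for me: [image: image] https://user-images.githubusercontent.com/1255665/140959316-008d8317-d3a5-4dac-9c31-b184a5b5c324.png
The � characters appear where the message is interpreted as UTF-8. I don't understand the actual encoding of the file.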
I agree that the error is in the encoding of the message and not in mu. However, our webmail interprets this correctly.
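Whether the webmail's correct rendering tells us much depends on which bytes it reads. Mojibake (UTF-8 decoded as windows-1252) can be reversed because no byte was lost, but once a byte has been replaced by U+FFFD the original character cannot be recovered: different accented letters collapse into the same replacement. So if the local Maildir copy really contains ef bf bd, a webmail that shows "é" is presumably reading different bytes (the server's copy), which would point at some step between the server and the local Maildir. That is an inference, not something established in this thread. A minimal Python sketch; `undo_mojibake` is a hypothetical helper.

```python
def undo_mojibake(text: str) -> str:
    """Reverse UTF-8 text that was wrongly decoded as windows-1252 (hypothetical helper)."""
    return text.encode("cp1252").decode("utf-8")


# Mojibake keeps every original byte, so it can be undone.
repaired = undo_mojibake("IngÃ©nieur")  # "Ingénieur"

# U+FFFD cannot: "é" and "è" in windows-1252 (0xE9, 0xE8) are both invalid
# UTF-8 here and collapse to the very same replacement character.
from_e_acute = b"G\xe9rard".decode("utf-8", errors="replace")
from_e_grave = b"G\xe8rard".decode("utf-8", errors="replace")
same = from_e_acute == from_e_grave  # True: the original byte is gone
```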
@joukeHijlkema What software do you use for webmail? How do you get the messages from your mail server (isync, offlineimap)?
I've no idea what is running behind our webmail. It's on a company server. I use offlineimap to get the mail.
@joukeHijlkema If (say, using the web interface) you can send to yourself emails with such a problem, would you mind sending me one (address on the profile page)?
I'm not sure I understand what you mean. The problematic emails come from other people; I'm sure the problem lies with them. I could forward a problematic email to you.
I'm trying to understand where in your infrastructure the problem is. If emails you send to yourself using your webmail arrive correctly encoded, how is it that emails from other people in your company turn out badly encoded? Is the webmail choosing a bad encoding? If so, which one? There are so many unknowns here that it is difficult to pinpoint what is going on. Forwarding a problematic email may not give many clues, because it has already been processed by all of your company's systems.
I'm afraid we can't do much about this in mu/mu4e. I still don't fully understand the problem, but I guess we've spent enough time on it. If this is still seen, please attach a problematic email (as a file; copy-pasting the text into GitHub is not so useful), thanks!