djcb/mu, issue #2154

[mu4e bug] encoding/decoding errors for non utf-8 messages

Open: joukeHijlkema opened this issue 3 years ago (16 comments).

The bug

I live and work in France, so most of my email contains accented characters, and many of the people who mail me use encodings other than UTF-8 (windows-1252, for instance). This results in garbled text in mu4e. If I open the same email in Firefox or in our webmail system, the text is displayed correctly.

To reproduce

An email whose MIME structure contains these headers:

--------------2B90AD83561B9846F2ECD871
Content-Type: multipart/alternative;
 boundary="------------69C3D6A5B38E7F0BD6DDD2D3"


--------------69C3D6A5B38E7F0BD6DDD2D3
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit

yields garbled text in mu4e (screenshot not reproduced here).
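For reference, here is a minimal sketch of how a standards-following parser treats such a part. It uses Python's standard `email` package, not mu's own code: the bytes of the 8bit body are decoded with the declared `windows-1252` charset. The sample body is hypothetical and, unlike the problem messages, correctly labelled.

```python
from email import message_from_bytes, policy

# Hypothetical sample: a single text/plain part that declares windows-1252
# and really is windows-1252 (byte 0xE9 is "é" in that code page).
RAW = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=windows-1252; format=flowed\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Ing\xe9nieur de Recherche\n"
)


def decode_plain_body(raw: bytes) -> str:
    """Return the text/plain body decoded with the charset it declares."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    # get_content() undoes the transfer encoding (a no-op for 8bit) and then
    # decodes the resulting bytes using the part's charset parameter.
    return part.get_content()


if __name__ == "__main__":
    print(decode_plain_body(RAW))
```

When the label matches the bytes, this produces "Ingénieur de Recherche". The rest of the thread revolves around messages where the label and the bytes disagree.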

Environment

  • OS: Ubuntu 21.04
  • Emacs: GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.25, cairo version 1.16.0) of 2021-10-05
  • mu: 1.6.5-60-g11d41bd1

Checklist

  • [x] you are running vanilla emacs (i.e. without Doom, Evil, Spacemacs etc.) (otherwise, please try to reproduce without those)
  • [x] you are running mu4e without any third-party extensions (otherwise, please try to reproduce without those)
  • [x] you are running either the latest 1.6.x release, or a 1.7.x development release (otherwise, please upgrade).

joukeHijlkema, Oct 05 '21

Please attach a full message file where this happens (remove any personal info as needed), thanks.

djcb, Oct 06 '21

It is not the same message as above, but one that has the same problem. I renamed the file so GitHub would accept it.

Attachment: message.txt

joukeHijlkema, Oct 07 '21

Hmm, when I check that, I see for example what I think should be "Ingénieur de Recherche", but I get "Ing�nieur de Recherche", where the "�" is literally the UTF-8 encoding of the Unicode replacement character U+FFFD (bytes: ef bf bd).
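A short Python sketch, offered as an illustration and not as mu's code, of one way the bytes ef bf bd can end up in a file. Somewhere along the way, windows-1252 text is read as UTF-8 by a lenient decoder, and the result is written back out as UTF-8. The thread never establishes which step in the real delivery chain did this.

```python
from __future__ import annotations


def replacement_round_trip(text: str) -> tuple[bytes, str, bytes]:
    """Show what happens when windows-1252 bytes are read as UTF-8."""
    # Encode with windows-1252: "é" becomes the single byte 0xE9.
    cp1252_bytes = text.encode("cp1252")
    # 0xE9 followed by an ASCII letter is not valid UTF-8, so a lenient
    # UTF-8 decoder substitutes U+FFFD REPLACEMENT CHARACTER for it.
    as_utf8_text = cp1252_bytes.decode("utf-8", errors="replace")
    # Writing that text back out as UTF-8 stores U+FFFD as EF BF BD; the
    # original byte, and therefore the original letter, is gone for good.
    saved = as_utf8_text.encode("utf-8")
    return cp1252_bytes, as_utf8_text, saved


if __name__ == "__main__":
    for value in replacement_round_trip("Ingénieur"):
        print(repr(value))
```

The last step is the key point: once a file contains ef bf bd in place of a letter, no reader can tell whether it was "é", "è" or "à".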

This looks different from the errors in the screenshot, which look typical of messages that declare the wrong character set. So, is there some other example, or did something get lost in translation?

I bet Firefox and similar programs do a better job of second-guessing a message that declares an incorrect character set.
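The second-guessing mentioned here could look roughly like the sketch below. This is a hypothetical heuristic for illustration only and not what Firefox, the webmail, or mu4e actually implement. If a part's non-ASCII bytes happen to form valid UTF-8, it treats the declared charset as a mislabel and decodes as UTF-8; otherwise it trusts the declaration.

```python
def decode_with_fallback(payload: bytes, declared: str) -> str:
    """Hypothetical 'second-guessing' decoder for the body of a text part."""
    if not payload.isascii():
        try:
            # Non-ASCII UTF-8 is highly structured, so genuine windows-1252
            # French text rarely happens to be valid UTF-8; a clean decode
            # suggests the declared charset is a mislabel.
            return payload.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not UTF-8: fall through and trust the declared charset
    try:
        return payload.decode(declared)
    except LookupError:
        # Unknown charset label: use windows-1252 as a lenient default.
        return payload.decode("cp1252", errors="replace")
    except UnicodeDecodeError:
        # Bytes invalid in the declared charset: keep going with U+FFFD.
        return payload.decode(declared, errors="replace")
```

No heuristic of this kind can repair a file that already contains ef bf bd: those bytes are valid UTF-8 for U+FFFD, so this sketch would still show "�".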

djcb, Nov 08 '21

@joukeHijlkema Are you sure the message is correctly encoded with windows-1252 (it is a single-byte encoding, unlike the message you attached)?
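To make the single-byte versus multi-byte distinction concrete, here is a small Python sketch comparing the two encodings for a few French characters. The euro sign is included because windows-1252 maps it to 0x80, a position where ISO-8859-1 has no printable character.

```python
from __future__ import annotations

# Common French accented letters, plus the euro sign.
SAMPLE = "éèàçôû€"


def byte_forms(ch: str) -> tuple[bytes, bytes]:
    """Return the windows-1252 and UTF-8 encodings of one character."""
    return ch.encode("cp1252"), ch.encode("utf-8")


if __name__ == "__main__":
    for ch in SAMPLE:
        single, multi = byte_forms(ch)
        # One byte per character in windows-1252, two or three in UTF-8.
        print(ch, single.hex(" "), multi.hex(" "), sep="  ")
```

So a genuine windows-1252 body would contain lone bytes such as e9 for "é", while UTF-8 text shows pairs such as c3 a9, and a replacement character shows up as the triple ef bf bd.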

Chris00, Nov 09 '21

I'm not sure of anything, but I'm fairly sure the encoding declared in the header is not the one actually used in the text. The problem is that I have no control over the software the sender uses...


joukeHijlkema, Nov 09 '21

Yeah, as much as we try, there are still some holdouts that do not use mu4e yet.

But, the message you attached contains literal "�" and those are rendered "correctly". However, that looks very different from the screenshot; are you sure you attached a raw file, as present in your Maildir?

djcb, Nov 09 '21

I'm not sure how to modify the file and save it while keeping exactly the same encoding. I might have done something wrong.


joukeHijlkema, Nov 09 '21


Open the message and run M-x mu4e-copy-message-path. Then open that file with C-x C-f C-y RET, edit it as needed, save it with C-x C-w ~/my-message.txt RET, and attach it.
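Visiting a file in an editor decodes it into text, and saving it re-encodes it, which can in principle change bytes. As an alternative, the sketch below copies the file byte-for-byte (like a plain `cp`) and reports a few clues. The function name and file names are hypothetical; `src` is assumed to be the Maildir path that mu4e-copy-message-path puts on the kill ring.

```python
import shutil
import tempfile
from pathlib import Path

# U+FFFD REPLACEMENT CHARACTER as it appears in a UTF-8 file.
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")  # b"\xef\xbf\xbd"


def copy_raw_message(src: str, dst: str) -> dict:
    """Copy a message file byte-for-byte and report some encoding clues."""
    shutil.copyfile(src, dst)  # copies bytes; no decoding or re-encoding
    original = Path(src).read_bytes()
    copied = Path(dst).read_bytes()
    return {
        "identical": copied == original,
        # Literal EF BF BD sequences mean the damage predates this copy.
        "replacement_chars": copied.count(REPLACEMENT_UTF8),
        "non_ascii_bytes": sum(1 for b in copied if b > 0x7F),
    }


if __name__ == "__main__":
    # Demo on a throwaway file rather than a real Maildir message.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "msg")
        src.write_bytes(b"Bonjour G\xef\xbf\xbdrard\n")
        print(copy_raw_message(str(src), str(Path(tmp, "copy.txt"))))
```

If the copy reports replacement characters, the original letters were already lost in the Maildir file itself, before mu4e displayed anything.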

djcb, Nov 09 '21

There you go. It still has the funny "�". I've attached a screenshot from Emacs taken just before I saved the message path. Hope this helps.

Jouke


Return-Path: @.>
Received: from deliver ([unix socket]) by cottos (Cyrus v2.4.6) with LMTPA; Wed, 22 Sep 2021 13:12:56 +0200
X-Sieve: CMU Sieve 2.4
Received: from onera.onera.fr (amon.onera [1.1.1.5]) by cottos.onecert.fr (Postfix) with ESMTP id 49B9124428E for @.>; Wed, 22 Sep 2021 13:12:56 +0200 (CEST)
Received: from toto.onera.net (toto.onera.net [1.1.1.1]) by toto.onera.fr with ESMTP id 18MBCu6F004932; Wed, 22 Sep 2021 13:12:56 +0200
Received: from onera.onera.fr (toto.onera [125.1.1.55]) by toto.onera.net (Postfix) with ESMTPS id 2BA60749AE6; Wed, 22 Sep 2021 13:12:56 +0200 (CEST)
Received: from nereetoto.onecert.fr (toto.onecert.fr [134.212.194.12]) by toto.onera.fr with ESMTP id 18MBCtQJ004930; Wed, 22 Sep 2021 13:12:56 +0200
Received: from [10.25.1.156] (vpn-toto.onera.net [10.25.1.156]) by toto.onecert.fr (8.14.3/8.14.3/ONERA-SRI) with ESMTP id 18MBCtt7016895; Wed, 22 Sep 2021 13:12:55 +0200
Subject: =?UTF-8?Q?Re=3a_Fwd=3a_PERSEUS=3a_s=c3=a9minaire_de_rentr=c3=a9e_le?= =?UTF-8?Q?23_Septembre=c3=a0_partir_de_9h00_en_webex?=
To: toto toto DTP/ES @.>
Cc: toto toto @.>, toto toto @.>, toto toto @.>
References: @.> @.>
From: =?UTF-8?B?SsOpcsO0bWUgQU5USE9JTkU=?= @.>
Organization: ONERA
Message-ID: @.>
Date: Wed, 22 Sep 2021 13:12:55 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0
MIME-Version: 1.0
In-Reply-To: @.***>
Content-Type: multipart/alternative; boundary="------------237AFE8D6B473E3EE1279763"
Content-Language: fr

This is a multi-part message in MIME format.
--------------237AFE8D6B473E3EE1279763
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit

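Bonjour G�rard,

Jouke est en cong�. Je ne sais pas si Yves Mauriot pourra se joindre � la r�union. Dans le cas contraire, merci de nous faire un retour quand tu pourras.

Bien cordialement,
J�r�me

[English translation of the body: "Hello Gérard, Jouke is on leave. I don't know whether Yves Mauriot will be able to join the meeting. If not, please get back to us when you can. Best regards, Jérôme"]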
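The header fields make a useful contrast. According to its User-Agent, this message comes from Thunderbird 78, and it encodes its Subject as UTF-8 RFC 2047 encoded-words while its body declares windows-1252. Here is a minimal sketch using Python's standard `email.header` module to decode the Subject quoted above. It only shows that the header decodes cleanly as UTF-8 and says nothing about where the body was damaged.

```python
from email.header import decode_header, make_header

# Subject header copied from the message above (two encoded-words).
SUBJECT = (
    "=?UTF-8?Q?Re=3a_Fwd=3a_PERSEUS=3a_s=c3=a9minaire_de_rentr=c3=a9e_le?= "
    "=?UTF-8?Q?23_Septembre=c3=a0_partir_de_9h00_en_webex?="
)


def decoded_subject(raw: str) -> str:
    """Decode RFC 2047 encoded-words into a plain Python string."""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header reassembles them and str() yields Unicode text.
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    print(decoded_subject(SUBJECT))
```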

joukeHijlkema, Nov 09 '21

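Below is how message.txt displays for me. The � characters appear when the message is interpreted as UTF-8. I don't understand what the actual encoding of the file is. [screenshot: https://user-images.githubusercontent.com/1255665/140959316-008d8317-d3a5-4dac-9c31-b184a5b5c324.png]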
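To pin down what a file actually contains, a small diagnostic can help. This sketch uses a hypothetical helper built only on the Python standard library. For each text part of a raw message, it lists the declared charset, whether the bytes are valid UTF-8, and how many literal EF BF BD sequences they contain. If message.txt matches djcb's description above, one would expect windows-1252 to be declared, the bytes to be valid UTF-8, and several replacement sequences to be present.

```python
from __future__ import annotations

import sys
from email import message_from_bytes, policy

# UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
REPLACEMENT_UTF8 = b"\xef\xbf\xbd"


def describe_text_parts(raw: bytes) -> list[dict]:
    """For each text/* part, compare the declared charset with the bytes."""
    msg = message_from_bytes(raw, policy=policy.default)
    report = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text parts
        # Undo any transfer encoding (base64, quoted-printable) to get bytes.
        body = part.get_payload(decode=True) or b""
        try:
            body.decode("utf-8")
            valid_utf8 = True
        except UnicodeDecodeError:
            valid_utf8 = False
        report.append({
            "type": part.get_content_type(),
            "declared": part.get_content_charset() or "us-ascii",
            "valid_utf8": valid_utf8,
            "replacement_chars": body.count(REPLACEMENT_UTF8),
        })
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python describe_parts.py message.txt
    with open(sys.argv[1], "rb") as f:
        for entry in describe_text_parts(f.read()):
            print(entry)
```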

Chris00, Nov 09 '21

I agree that the error is in the encoding of the message and not in mu. However, our webmail interprets this correctly.


joukeHijlkema, Nov 09 '21

@joukeHijlkema What software do you use for webmail? How do you get the messages from your mail server (isync, offlineimap)?

Chris00, Nov 09 '21

I've no idea what is running behind our webmail. It's on a company server. I use offlineimap to get the mail.


joukeHijlkema, Nov 10 '21

@joukeHijlkema If (say, using the web interface) you can send to yourself emails with such a problem, would you mind sending me one (address on the profile page)?

Chris00, Nov 10 '21

I'm not sure I understand what you mean. The problematic emails come from other people, and I'm sure the problem lies with them. I could forward a problematic email to you.


joukeHijlkema, Nov 11 '21

I'm trying to understand where in your infrastructure the problem arises. If emails you send to yourself through your webmail arrive correctly encoded, how is it that emails from other people in your company turn out badly encoded? Is the webmail choosing a bad encoding? If so, which one? There are so many unknowns here that it is difficult to pinpoint what is going on. Forwarding a problematic email may not give many clues, because it has already been processed by all of your company's systems.

Chris00, Nov 23 '21

I'm afraid we can't do much about this in mu/mu4e. I still don't fully understand the problem, but I guess we've spent enough time on it. If this is still happening, please attach a problematic email (as a file; copy-pasting the text into GitHub is not very useful), thanks!

djcb, Aug 04 '23