Add test case for encoding problems

Open teto opened this issue 6 years ago • 8 comments

Some of my Japanese mails are not displayed correctly https://github.com/pazz/alot/issues/1314

For instance, the text from tests/static/mail/japanese.eml appears on my terminal as:

MA-EYES?$B$4MxMQ<T3F0L?(B

BIRD-BO?$B$N2OOB$G$9!#$*Hh$lMM$G$9!#?(B

My terminal is perfectly capable of displaying kanji, and alot even shows the subject correctly; only the body is messed up.
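
The garbled lines above look like raw ISO-2022-JP with the escape byte (0x1b) rendered as "?". The following is a minimal sketch under that assumption; it only uses Python's standard iso2022_jp codec, and the variable names are illustrative, not taken from alot.

```python
# Assumption: each "?" in the terminal output stands for the ESC byte (0x1b)
# that starts the ISO-2022-JP escape sequences "ESC $ B" and "ESC ( B".
shown = "MA-EYES?$B$4MxMQ<T3F0L?(B"
raw = shown.replace("?", "\x1b").encode("ascii")

# ISO-2022-JP is a 7-bit encoding, so every byte of the raw text is ASCII.
assert all(b < 0x80 for b in raw)

# Decode with the declared charset instead of displaying the bytes verbatim.
decoded = raw.decode("iso2022_jp")
print(decoded)
```

If the assumption holds, this suggests the body bytes themselves are fine and are simply shown without being decoded.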

It would be great if someone could pick this up. When I ran the tests, a different test failed (using Python 3.7):

======================================================================
FAIL: test_env_set (tests.helper_test.TestCallCmdAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/teto/alot/tests/utilities.py", line 188, in _actual
    return loop.run_until_complete(coro(*args, **kwargs))
  File "/nix/store/ydk0mfpvn9smcmn72wc9i20slv1d2b79-python3-3.7.2/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/home/teto/alot/tests/helper_test.py", line 424, in test_env_set
    self.assertEqual(ret[0], 'bar')
AssertionError: '' != 'bar'
+ bar

teto avatar Jan 16 '19 03:01 teto

Are you sure the test file you add is in valid email format according to the RFC? file tests/static/mail/japanese.eml --mime tells me it is message/rfc822; charset=utf-8 but the header claims it is encoded in 7bit and the charset is ISO-2022-JP. That sounds wrong to me.
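
A rough way to check such a mismatch with Python's email module is sketched below. The message is a hypothetical in-memory stand-in for the test file (not its actual content), and describe() is an illustrative helper, not part of alot.

```python
import email

# Hypothetical stand-in: headers claim 7bit ISO-2022-JP, body is UTF-8.
RAW = (
    b"Content-Type: text/plain; charset=ISO-2022-JP\n"
    b"Content-Transfer-Encoding: 7bit\n"
    b"\n"
    + "ご利用者各位\n".encode("utf-8")
)


def describe(raw: bytes) -> dict:
    """Report what the headers declare and whether the bytes are 7-bit clean."""
    msg = email.message_from_bytes(raw)
    return {
        "charset": msg.get_content_charset(),      # lower-cased by the library
        "cte": msg["Content-Transfer-Encoding"],
        "is_7bit": all(b < 0x80 for b in raw),     # True for real ISO-2022-JP
    }


info = describe(RAW)
print(info)
```

A file that declares 7bit but contains bytes above 0x7f, as in this stand-in, contradicts its own headers.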

lucc avatar Jan 21 '19 10:01 lucc

I looked at the original mail files in my maildir directory and they are indeed message/rfc822; charset=us-ascii. I've tried to save the test file back to us-ascii in vim via :set fileencoding=us-ascii, or with iconv: iconv -f US-ASCII -t UTF-8

teto avatar Jan 22 '19 11:01 teto

Here is another case where it doesn't work, with a different kind of output: bad_mail.txt, along with the edited screenshot 2019-02-06-145519_636x757_scrot.

Do note that some Japanese messages open up correctly.

teto avatar Feb 06 '19 06:02 teto

Regarding the mimetype of japanese.eml, it might be because I saved the mail with a custom hook:

import tempfile

async def save_mail(ui):
    # inspired by https://github.com/pazz/alot/issues/1310
    # get the selected message; this is an alot.db.Message
    msg = ui.current_buffer.get_selected_message()
    eml = msg.get_email()  # this is an email.Message
    # write str(eml) to a temporary file that is kept after closing
    with tempfile.NamedTemporaryFile(mode='w+', prefix="alot-", delete=False) as out:
        out.write(str(eml))

    ui.notify("saved to %s" % out.name, priority='normal', timeout=15)
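
To rule out the hook as the source of a re-encoding, one option is to write the message as bytes rather than as text, since a file opened in text mode is written in the locale's encoding. This is only a sketch using the standard library; save_raw() is a hypothetical helper, and whether it changes anything for this particular mail is untested.

```python
import email
import os
import tempfile


def save_raw(msg, directory=None):
    """Write msg.as_bytes() to a new file (binary mode) and return its path."""
    with tempfile.NamedTemporaryFile(
        mode="wb", prefix="alot-", suffix=".eml", dir=directory, delete=False
    ) as out:
        out.write(msg.as_bytes())
    return out.name


# Usage with a small in-memory message (illustrative only).
msg = email.message_from_bytes(b"Subject: test\n\nhello\n")
path = save_raw(msg)
with open(path, "rb") as f:
    saved = f.read()
os.remove(path)  # clean up the example file
```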

Neomutt displays the mail correctly but I much prefer alot. Really looking forward to a solution for this. I would like to add hooks to translate the mails etc.

teto avatar Feb 13 '19 11:02 teto

I am sorry for my laziness regarding the whole encoding and display topic. I hope to have more time available for this starting next week.

lucc avatar Feb 14 '19 06:02 lucc

Sorry if that sounded entitled. Take your time; it's open source and we should enjoy it, no problem :) I am available on IRC should you have any questions.

teto avatar Feb 14 '19 07:02 teto

how am I supposed to learn Japanese if I can't read my mails :p

teto avatar Jun 10 '19 12:06 teto

I have updated the test according to what our other tests do and indeed it fails:

======================================================================
FAIL: test_simple_japanese_file (tests.db.test_utils.TestExtractBody)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pazz/projects/alot/tests/db/test_utils.py", line 722, in test_simple_japanese_file
    self.assertEqual(actual, expected)
AssertionError: 'MA-EYES������������������\n\nBIRD-BO���������������������������������������\n' != '\n            MA-EYESご利用者各位\n\n            BIRD-BOの河和です。お疲れ様です。\n        '
- MA-EYES������������������
  
- BIRD-BO���������������������������������������
+             MA-EYESご利用者各位
+ 
+             BIRD-BOの河和です。お疲れ様です。
+       

To me it looks like this file is UTF-8 encoded, but the header explicitly announces the content as ISO-2022-JP, so it will not be interpreted as expected. This is just what I would expect, as the mail simply is malformed. We can discuss whether alot should be lenient in these cases, but I'd rather stay in line with what Python's email module does.
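
The effect can be reproduced with the standard library alone. The sketch below uses a hypothetical in-memory message (not the actual test file) whose header declares ISO-2022-JP while the body bytes are UTF-8; it is not alot's own decoding code.

```python
import email

# Hypothetical malformed message: declared ISO-2022-JP, actual bytes UTF-8.
raw = (
    b"Content-Type: text/plain; charset=ISO-2022-JP\n"
    b"Content-Transfer-Encoding: 7bit\n"
    b"\n"
    + "MA-EYESご利用者各位\n".encode("utf-8")
)

msg = email.message_from_bytes(raw)
body = msg.get_payload(decode=True)    # body as bytes, transfer encoding undone
declared = msg.get_content_charset()   # charset taken from the header

# Following the header: the UTF-8 bytes are invalid in ISO-2022-JP and
# become U+FFFD replacement characters.
as_declared = body.decode(declared, errors="replace")

# Ignoring the header: the same bytes decode cleanly as UTF-8.
as_utf8 = body.decode("utf-8")

print(repr(as_declared))
print(repr(as_utf8))
```

Being lenient would mean guessing a charset other than the declared one, which is the trade-off mentioned above.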

pazz avatar Nov 23 '19 09:11 pazz