alot
Add test case for encoding problems
Some of my Japanese mails are not displayed correctly https://github.com/pazz/alot/issues/1314
For instance, the text from tests/static/mail/japanese.eml appears on my terminal as:
MA-EYES?$B$4MxMQ<T3F0L?(B
BIRD-BO?$B$N2OOB$G$9!#$*Hh$lMM$G$9!#?(B
My terminal is perfectly capable of displaying kanji, and alot even shows the subject correctly; only the body is garbled.
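The garbled lines above look like raw ISO-2022-JP in which the escape byte was rendered as "?". That reading is an assumption, not something confirmed in this thread. A minimal sketch that restores the escape byte and decodes the two lines with Python's built-in codec:

```python
# Assumption: each "?" directly before "$B" or "(B" stands for the ESC byte
# (0x1b) that opens the ISO-2022-JP escape sequences.
ESC = b"\x1b"

def restore_and_decode(shown: bytes) -> str:
    """Put the ESC bytes back and decode the result as ISO-2022-JP."""
    restored = shown.replace(b"?$B", ESC + b"$B").replace(b"?(B", ESC + b"(B")
    return restored.decode("iso-2022-jp")

line1 = restore_and_decode(b"MA-EYES?$B$4MxMQ<T3F0L?(B")
line2 = restore_and_decode(b"BIRD-BO?$B$N2OOB$G$9!#$*Hh$lMM$G$9!#?(B")
print(line1)
print(line2)
```

If the assumption holds, the body bytes of the original mail are fine and were simply shown without being decoded.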
It would be great if someone could pick this up. When I ran the tests, a different test failed (using Python 3.7):
======================================================================
FAIL: test_env_set (tests.helper_test.TestCallCmdAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/teto/alot/tests/utilities.py", line 188, in _actual
    return loop.run_until_complete(coro(*args, **kwargs))
  File "/nix/store/ydk0mfpvn9smcmn72wc9i20slv1d2b79-python3-3.7.2/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "/home/teto/alot/tests/helper_test.py", line 424, in test_env_set
    self.assertEqual(ret[0], 'bar')
AssertionError: '' != 'bar'
+ bar
Are you sure the test file you added is in valid email format according to the RFC? Running file tests/static/mail/japanese.eml --mime tells me it is message/rfc822; charset=utf-8, but the header claims it is encoded in 7bit and that the charset is ISO-2022-JP. That sounds wrong to me.
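One way to check such a mismatch from Python is to compare what the headers declare with what the body bytes contain. The sketch below uses only the standard email module and assumes a single-part text message. The helper name describe_text_part and the sample message are made up for illustration; the sample is not the real japanese.eml.

```python
import email

def describe_text_part(raw_bytes):
    """Report what a single-part text message declares versus what its body holds."""
    msg = email.message_from_bytes(raw_bytes)
    body = msg.get_payload(decode=True)    # body bytes after undoing the transfer encoding
    declared = msg.get_content_charset()   # lower-cased charset parameter, or None
    cte = str(msg.get("Content-Transfer-Encoding", "7bit")).lower()
    try:
        body.decode(declared or "ascii")
        decodes = True
    except (UnicodeDecodeError, LookupError):
        decodes = False
    return {
        "declared_charset": declared,
        "transfer_encoding": cte,
        "has_8bit_bytes": any(b > 0x7F for b in body),
        "decodes_with_declared_charset": decodes,
    }

# Made-up sample: headers as described above, body stored as UTF-8.
sample = (
    b"Content-Type: text/plain; charset=ISO-2022-JP\n"
    b"Content-Transfer-Encoding: 7bit\n"
    b"\n"
    + "MA-EYESご利用者各位\n".encode("utf-8")
)
report = describe_text_part(sample)
print(report)
```

A body that contains 8-bit bytes contradicts a 7bit transfer encoding, and a body that does not decode with the declared charset contradicts the Content-Type header.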
I looked at the original mail files in my maildir directory and they are indeed message/rfc822; charset=us-ascii. I've tried to save the test back to us-ascii in vim via :set fileencoding=us-ascii or with iconv:
    iconv -f US-ASCII -t UTF-8
Here is another case where it doesn't work, with a different kind of output:
bad_mail.txt
with the edited screenshot.
Do note that some Japanese messages open up correctly.
Regarding the mimetype of japanese.eml, it might be because I saved the mail with a custom hook:
import tempfile

async def save_mail(ui):
    # inspired by https://github.com/pazz/alot/issues/1310
    # get the selected message; this is an alot.db.Message
    msg = ui.current_buffer.get_selected_message()
    eml = msg.get_email()  # this is an email.Message
    # open a temporary file in text mode and write str(eml) to it
    with tempfile.NamedTemporaryFile(mode='w+', prefix="alot-", delete=False) as out:
        out.write(str(eml))
    # the file is closed (and flushed) here; out.name is still available
    ui.notify("saved to %s" % out.name, priority='normal', timeout=15)
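Whether str(eml) in text mode is what changed the file's encoding is not established in this thread. A variant that sidesteps the question writes the message as bytes, so no text-layer encoding is applied on the way to disk. This is only a sketch using the standard library: save_message_bytes is a hypothetical helper, and the sample message is made up so the snippet runs without alot.

```python
import email
import tempfile

def save_message_bytes(eml, prefix="alot-"):
    """Write an email.message.Message to a temp file as raw bytes; return the path."""
    with tempfile.NamedTemporaryFile(mode="wb", prefix=prefix, delete=False) as out:
        out.write(eml.as_bytes())
        return out.name

# Made-up sample: a well-formed 7bit ISO-2022-JP message.
raw = (
    b"Content-Type: text/plain; charset=ISO-2022-JP\n"
    b"Content-Transfer-Encoding: 7bit\n"
    b"\n"
    + "MA-EYESご利用者各位\n".encode("iso-2022-jp")
)
path = save_message_bytes(email.message_from_bytes(raw))
with open(path, "rb") as fh:
    saved = fh.read()
print(path)
```

In the hook above, the equivalent change would be to open the temporary file with mode='wb' and write eml.as_bytes() instead of str(eml).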
Neomutt displays the mail correctly but I much prefer alot. Really looking forward to a solution for this. I would like to add hooks to translate the mails etc.
I am sorry for my laziness regarding the whole encoding and display topic. I hope to have more time available for this starting next week.
Sorry if that sounded entitled. Take your time; it's open source and we should enjoy it, no problem :) I am available on IRC should you have any questions.
How am I supposed to learn Japanese if I can't read my mails :p
I have updated the test according to what our other tests do and indeed it fails:
======================================================================
FAIL: test_simple_japanese_file (tests.db.test_utils.TestExtractBody)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/pazz/projects/alot/tests/db/test_utils.py", line 722, in test_simple_japanese_file
    self.assertEqual(actual, expected)
AssertionError: 'MA-EYES������������������\n\nBIRD-BO���������������������������������������\n' != '\n MA-EYESご利用者各位\n\n BIRD-BOの河和です。お疲れ様です。\n '
- MA-EYES������������������
- BIRD-BO���������������������������������������
+ MA-EYESご利用者各位
+
+ BIRD-BOの河和です。お疲れ様です。
+
To me it looks like this file is UTF-8 encoded, but the header explicitly announces the content as ISO-2022-JP, so it will not be interpreted as expected.
This all looks just as I would expect, as the mail simply is malformed.
We can discuss whether alot should be lenient in these cases, but I'd rather stay in line with what the Python email module does.
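For the sake of that discussion, here is a minimal sketch of what "lenient" could mean: try the declared charset first, then fall back to UTF-8. The function decode_leniently is hypothetical; it is neither alot's behaviour nor that of the email module.

```python
def decode_leniently(body: bytes, declared, fallbacks=("utf-8",)):
    """Return (text, charset_used); charset_used is None if replacement was needed."""
    for charset in (declared, *fallbacks):
        if not charset:
            continue
        try:
            return body.decode(charset), charset
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: keep what is readable and mark the rest with U+FFFD.
    return body.decode("utf-8", errors="replace"), None

# A body stored as UTF-8 but announced as ISO-2022-JP, as in the failing test.
malformed = "MA-EYESご利用者各位".encode("utf-8")
strict_view = malformed.decode("iso-2022-jp", errors="replace")
lenient_view, used = decode_leniently(malformed, "iso-2022-jp")
print(strict_view)
print(lenient_view, used)

# A correctly encoded body still goes through the declared charset.
wellformed = "MA-EYESご利用者各位".encode("iso-2022-jp")
ok_view, ok_used = decode_leniently(wellformed, "iso-2022-jp")
```

The declared charset has to be tried first: ISO-2022-JP bytes are all 7-bit, so a correctly encoded body would also pass a UTF-8 decode and come out as the escape-sequence gibberish shown at the top of this issue.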