Garbled text when decoding a mail subject
https://github.com/mikel/mail/blob/8fbb17d4d5364c77cc870769d451bc2739b3a8ce/lib/mail/encodings.rb#L123-L138
Here is the raw Subject header:
Subject: =?iso-2022-jp?Q?=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24?=
=?iso-2022-jp?Q?BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3B?=
=?iso-2022-jp?Q?q=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24?=
=?iso-2022-jp?Q?NFs=3C=213=2BH=2F6HL3=1B=28B=5D?=
When I use mail.subject, it returns:
"案件情報[-01 �BBg<j;q;:1?MQ2q<R - �q;:1?MQ$K$+$+$kDWH�NFs<!3+H/6HL3]"
The subject is split across multiple lines in the source, and each encoded-word seems to be decoded separately by Encodings.value_decode. But when I manually merge them into one line like this:
Subject: =?iso-2022-jp?Q?=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3Bq=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24NFs=3C=213=2BH=2F6HL3=1B=28B=5D?=
it returns the correct result:
"案件情報[-01 大手資産運用会社 - 資産運用にかかるDWHの二次開発業務]"
It seems that in some cases you need to concatenate those lines first and then decode the result. Or did I just receive a malformed email?
https://github.com/mikel/mail/blob/8fbb17d4d5364c77cc870769d451bc2739b3a8ce/lib/mail/version_specific/ruby_1_9.rb#L124-L135
I think the force_encoding call shouldn't happen here; it should be performed after the whole string has been concatenated.
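To make the difference concrete, here is a minimal sketch in plain Ruby that does not use the mail gem at all. The helpers `q_decode` and `to_utf8` are hypothetical names made up for this illustration, and the replacement options passed to `encode` are my own choice, so the garbled output may not match mail's byte for byte. It transcodes the four Q-encoded payloads from the header above once word by word and once after joining the raw bytes:

```ruby
# Q-encoded payloads of the four encoded-words from the Subject header above.
CHUNKS = [
  '=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24',
  'BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3B',
  'q=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24',
  'NFs=3C=213=2BH=2F6HL3=1B=28B=5D'
].freeze

# Hypothetical helper: undo RFC 2047 "Q" encoding and return the raw bytes.
# "_" stands for a space; "=XX" is a hex-encoded byte (handled by unpack 'M').
def q_decode(text)
  text.tr('_', ' ').unpack1('M')
end

# Hypothetical helper: interpret raw bytes as ISO-2022-JP and transcode to
# UTF-8, replacing anything the converter cannot interpret.
def to_utf8(bytes)
  bytes.dup.force_encoding('ISO-2022-JP')
       .encode('UTF-8', invalid: :replace, undef: :replace)
end

# Strategy 1: transcode each encoded-word on its own, then join the strings.
# Escape sequences cut in half at word boundaries are lost.
per_word = CHUNKS.map { |chunk| to_utf8(q_decode(chunk)) }.join

# Strategy 2: join the raw bytes first, then transcode once.
joined = to_utf8(CHUNKS.map { |chunk| q_decode(chunk) }.join)

puts per_word
puts joined
```

The first chunk ends with `=1B=24`, which is only the first two bytes of the three-byte escape sequence `ESC $ B`; the `B` arrives at the start of the next encoded-word. That is why the second strategy should print the correct subject while the first cannot.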
The source code of mail is a little too complicated for me to modify, so I implemented RFC 2047 myself to solve this issue in my project. I borrowed a lot of tests from mail and made my gem pass them all.
RFC2047 implementation is here: https://github.com/tonytonyjan/rfc_2047
Usage
$ gem install new_rfc_2047
$ ruby -rrfc_2047 -e 'puts Rfc2047.decode "Subject: =?iso-2022-jp?Q?=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24?=\n =?iso-2022-jp?Q?BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3B?=\n =?iso-2022-jp?Q?q=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24?=\n =?iso-2022-jp?Q?NFs=3C=213=2BH=2F6HL3=1B=28B=5D?="'
Subject: 案件情報[-01 大手資産運用会社 - 資産運用にかかるDWHの二次開発業務]
Below is how I integrate it with my project:
inbound_mail.is_a?(Mail::Message) # => true
Rfc2047.decode_field inbound_mail['subject'].value
Hope it helps anyone who encounters the same issue.
> it seems you need to concat those lines first and then decoding it in some case, or just I got a wrong formatted email?
I don't think we should concat them first. From RFC 2047, section 5:
The 'encoded-text' in an 'encoded-word' must be self-contained
Take 我愛妳 for example:
This is valid:
Text | UTF-8 | RFC 2047 |
---|---|---|
我 | \xE6\x88\x91 | =?utf-8?B?5oiR?= |
愛 | \xE6\x84\x9B | =?utf-8?B?5oSb?= |
妳 | \xE5\xA6\xB3 | =?utf-8?B?5aaz?= |
This is also valid:
Text | UTF-8 | RFC 2047 |
---|---|---|
我愛 | \xE6\x88\x91\xE6\x84\x9B | =?utf-8?B?5oiR5oSb?= |
妳 | \xE5\xA6\xB3 | =?utf-8?B?5aaz?= |
This is NOT valid:
Text | UTF-8 | RFC 2047 |
---|---|---|
我\xE6 | \xE6\x88\x91\xE6 | =?utf-8?B?5oiR5g==?= |
\x84\x9B妳 | \x84\x9B\xE5\xA6\xB3 | =?utf-8?B?hJvlprM=?= |
I think the mail gem handles this correctly according to the spec, and this is an upstream issue with mail servers in Japan or Taiwan.