text garbled for mail subject decoding

Open discdisk opened this issue 4 years ago • 3 comments

https://github.com/mikel/mail/blob/8fbb17d4d5364c77cc870769d451bc2739b3a8ce/lib/mail/encodings.rb#L123-L138

Here is the original subject data:

Subject: =?iso-2022-jp?Q?=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24?=
 =?iso-2022-jp?Q?BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3B?=
 =?iso-2022-jp?Q?q=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24?=
 =?iso-2022-jp?Q?NFs=3C=213=2BH=2F6HL3=1B=28B=5D?=

When I call mail.subject, it returns "案件情報[-01 �BBg<j;q;:1?MQ2q<R - �q;:1?MQ$K$+$+$kDWH�NFs<!3+H/6HL3]"

The subject header is split across multiple lines in the source, and each line seems to be decoded separately by Encodings.value_decode.

But when I manually merge the lines into one like this:

Subject: =?iso-2022-jp?Q?=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3Bq=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24NFs=3C=213=2BH=2F6HL3=1B=28B=5D?=

it returns the correct result: "案件情報[-01 大手資産運用会社 - 資産運用にかかるDWHの二次開発業務]"

It seems the lines need to be concatenated first and then decoded, at least in some cases. Or did I just receive a badly formatted email?

discdisk, Jun 17 '20 04:06

https://github.com/mikel/mail/blob/8fbb17d4d5364c77cc870769d451bc2739b3a8ce/lib/mail/version_specific/ruby_1_9.rb#L124-L135

I think force_encoding shouldn't be applied here; it should be applied after the whole string has been concatenated.
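The difference between the two strategies can be reproduced with Ruby core methods alone. This is a minimal sketch, not the mail gem's actual code path: `q_decode` and `to_utf8` are hypothetical helpers, and it assumes every encoded-word in the header uses the same charset and the Q encoding.

```ruby
# Hypothetical helper: raw bytes of one Q-encoded payload.
# "_" stands for a space and "=XX" for the byte with hex value XX.
def q_decode(text)
  text.b.tr('_', ' ').gsub(/=(\h\h)/) { [$1].pack('H2') }
end

# Hypothetical helper: label the bytes with their charset and transcode
# to UTF-8, replacing anything that is invalid in that charset.
def to_utf8(bytes, charset)
  bytes.dup.force_encoding(charset).encode('UTF-8', invalid: :replace, undef: :replace)
end

# Matches one Q-encoded word and captures [charset, payload].
WORD = /=\?([^?]+)\?Q\?([^?]*)\?=/i

subject = [
  '=?iso-2022-jp?Q?=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24?=',
  '=?iso-2022-jp?Q?BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3B?=',
  '=?iso-2022-jp?Q?q=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24?=',
  '=?iso-2022-jp?Q?NFs=3C=213=2BH=2F6HL3=1B=28B=5D?='
].join("\r\n ")

words = subject.scan(WORD)

# Strategy 1: transcode each encoded-word on its own. Escape sequences and
# two-byte characters that straddle a word boundary are cut in half.
per_word = words.map { |charset, payload| to_utf8(q_decode(payload), charset) }.join

# Strategy 2: concatenate the raw bytes first, then transcode once.
merged = to_utf8(words.map { |_, payload| q_decode(payload) }.join, words.first.first)

puts per_word
puts merged
```

Strategy 2 is deliberately lenient: it only makes sense when adjacent encoded-words share a charset, and it accepts input that a strict RFC 2047 decoder is not required to handle.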

discdisk, Jun 18 '20 03:06

The mail source code is a little too complicated for me to modify, so I implemented RFC 2047 myself to solve this issue in my project. I borrowed lots of tests from mail and made my gem pass them all.

RFC2047 implementation is here: https://github.com/tonytonyjan/rfc_2047

Usage

$ gem install rfc_2047
$ ruby -rrfc_2047 -e 'puts Rfc2047.decode "Subject: =?iso-2022-jp?Q?=1B=24B0F7o=3EpJs=1B=28B=5B=2D01_=1B=24?=\n =?iso-2022-jp?Q?BBg=3Cj=3Bq=3B=3A1=3FMQ2q=3CR=1B=28B_=2D_=1B=24B=3B?=\n =?iso-2022-jp?Q?q=3B=3A1=3FMQ=24K=24=2B=24=2B=24k=1B=28BDWH=1B=24B=24?=\n =?iso-2022-jp?Q?NFs=3C=213=2BH=2F6HL3=1B=28B=5D?="'
Subject: 案件情報[-01 大手資産運用会社 - 資産運用にかかるDWHの二次開発業務]

Below is how I integrate it with my project:

inbound_mail.is_a?(Mail::Message) # => true
Rfc2047.decode inbound_mail['subject'].value

Hope it helps anyone who encounters the same issue.

tonytonyjan, Oct 18 '20 16:10

> It seems the lines need to be concatenated first and then decoded, at least in some cases. Or did I just receive a badly formatted email?

I don't think we should concatenate them first. From RFC 2047, section 5:

The 'encoded-text' in an 'encoded-word' must be self-contained

Take 我愛妳 for example:

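This is valid:

| Text | UTF-8 | RFC 2047 |
| --- | --- | --- |
| 我 | \xE6\x88\x91 | =?utf-8?B?5oiR?= |
| 愛 | \xE6\x84\x9B | =?utf-8?B?5oSb?= |
| 妳 | \xE5\xA6\xB3 | =?utf-8?B?5aaz?= |

This is also valid:

| Text | UTF-8 | RFC 2047 |
| --- | --- | --- |
| 我愛 | \xE6\x88\x91\xE6\x84\x9B | =?utf-8?B?5oiR5oSb?= |
| 妳 | \xE5\xA6\xB3 | =?utf-8?B?5aaz?= |

This is NOT valid, because the three bytes of 愛 are split across two encoded-words:

| Text | UTF-8 | RFC 2047 |
| --- | --- | --- |
| 我\xE6 | \xE6\x88\x91\xE6 | =?utf-8?B?5oiR5g==?= |
| \x84\x9B妳 | \x84\x9B\xE5\xA6\xB3 | =?utf-8?B?hJvlprM=?= |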
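The same check can be sketched in Ruby. This is only an illustration of the "self-contained" rule for UTF-8 B-encoded words; `encoded_word` and `self_contained?` are hypothetical helper names, not part of the mail gem or rfc_2047.

```ruby
# Hypothetical helper: wrap raw bytes in a UTF-8 "B" (Base64) encoded-word.
def encoded_word(bytes)
  "=?utf-8?B?#{[bytes].pack('m0')}?="
end

# Hypothetical helper: true when the payload decodes to valid UTF-8 by itself,
# without borrowing bytes from a neighbouring encoded-word.
def self_contained?(word)
  payload = word[/\A=\?utf-8\?B\?(.*)\?=\z/i, 1]
  payload.unpack1('m0').force_encoding('UTF-8').valid_encoding?
end

text = '我愛妳'

# Split on character boundaries: every word stands alone.
per_char = text.each_char.map { |char| encoded_word(char) }

# Split after the 4th byte, in the middle of 愛: neither word stands alone.
bytes = text.b
broken = [bytes[0, 4], bytes[4..-1]].map { |chunk| encoded_word(chunk) }

p per_char
p per_char.all? { |word| self_contained?(word) }
p broken
p broken.none? { |word| self_contained?(word) }
```

An encoder that splits long headers should therefore cut between characters, never between the bytes of one character.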

I think the mail gem handles this correctly according to the spec, and that this is an upstream issue that belongs to the mail servers in Japan or Taiwan that produce such headers.

tonytonyjan, Jul 11 '21 06:07