simple-java-mail

Long non-ASCII attachment names cannot be decoded by Outlook

Open alexegorenkov opened this issue 3 years ago • 2 comments

Email attachments with long names that contain Unicode characters seem to be incorrectly encoded.

We are using simple-java-mail version 5.5.1. I have not tried to reproduce this on the latest version.

I've included a reproduction below.

Potentially a duplicate of https://github.com/bbottema/simple-java-mail/issues/293, which is missing a reproduction.

The input "재빠 게으갈색 여 우가 통나 REPORT (위로뛰어 올랐).pdf" is encoded in such a way that Outlook neither recognizes the attachment as a PDF nor shows a filename a user can read. An example of the output is below.

A simple reproduction looks like this:

// Any small payload will do here.
byte[] bytes = new byte[] {1, 2, 3};
// buildEmail() returns an Email, which exposes getAttachments().
Email simpleJavaMail = EmailBuilder.startingBlank()
        .withAttachment(
                "재빠 게으갈색 여 우가 통나 REPORT (위로뛰어 올랐).pdf",
                bytes,
                "application/pdf")
        .buildEmail();

The encoded name comes back in quoted-printable form (a series of RFC 2047 "Q" encoded-words), but Outlook does not decode it the way we would expect.

simpleJavaMail.getAttachments().get(0).getName()

will produce

=?UTF-8?Q?=EC=9E=AC=EB=B9=A0_=EA=B2=8C?= =?UTF-8?Q?=EC=9C=BC=EA=B0=88=EC=83=89_=EC=97=AC?= =?UTF-8?Q?_=EC=9A=B0=EA=B0=80_=ED=86=B5=EB=82=98_RE?= =?UTF-8?Q?PORT_(=EC=9C=84=EB=A1=9C=EB=9B=B0?= =?UTF-8?Q?=EC=96=B4_=EC=98=AC=EB=9E=90).pdf?=

The attachment in Outlook will have the same indecipherable filename.
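To check whether the value above is at least well-formed, here is a minimal sketch of an RFC 2047 encoded-word decoder using only the JDK. The class name `EncodedWordDecoder` is made up for this illustration and is not part of simple-java-mail. Decoding the output from the issue gives back the original filename, which suggests the encoding itself is valid and that the problem is how Outlook treats encoded-words in this position.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;

/** Sketch of an RFC 2047 encoded-word decoder; malformed input is not handled. */
final class EncodedWordDecoder {

    /** Decodes whitespace-separated tokens; tokens that are not encoded-words are kept verbatim. */
    static String decode(String value) {
        StringBuilder out = new StringBuilder();
        boolean previousEncoded = false;
        for (String token : value.trim().split("\\s+")) {
            // An encoded-word splits into: "=", charset, encoding, text, "="
            String[] parts = token.split("\\?", -1);
            boolean encoded = parts.length == 5 && parts[0].equals("=") && parts[4].equals("=");
            // Whitespace between two adjacent encoded-words is dropped; otherwise it is kept.
            if (out.length() > 0 && !(previousEncoded && encoded)) {
                out.append(' ');
            }
            if (encoded) {
                byte[] bytes = parts[2].equalsIgnoreCase("B")
                        ? Base64.getDecoder().decode(parts[3])
                        : decodeQ(parts[3]);
                out.append(new String(bytes, Charset.forName(parts[1])));
            } else {
                out.append(token);
            }
            previousEncoded = encoded;
        }
        return out.toString();
    }

    /** Q encoding: "_" is a space, "=XX" is a hex byte, anything else is a literal ASCII byte. */
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // The value reported in the issue, split across lines for readability.
        String encoded = "=?UTF-8?Q?=EC=9E=AC=EB=B9=A0_=EA=B2=8C?= "
                + "=?UTF-8?Q?=EC=9C=BC=EA=B0=88=EC=83=89_=EC=97=AC?= "
                + "=?UTF-8?Q?_=EC=9A=B0=EA=B0=80_=ED=86=B5=EB=82=98_RE?= "
                + "=?UTF-8?Q?PORT_(=EC=9C=84=EB=A1=9C=EB=9B=B0?= "
                + "=?UTF-8?Q?=EC=96=B4_=EC=98=AC=EB=9E=90).pdf?=";
        System.out.println(decode(encoded));
    }
}
```

The same method also handles the base64 ("B") form, so it can be used to compare the short and long variants mentioned in this issue.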

I've confirmed that there is a proper encoding that works with modern email clients such as Outlook, via the following steps.

  1. Create the file with touch "재빠 게으갈색 여 우가 통나 REPORT (위로뛰어 올랐).pdf" (macOS might reject the name unless touch is used).
  2. Send the file as an attachment in an e-mail through Outlook or a similar client.
  3. View the MIME source related to the e-mail; in Outlook this can be seen through "View Source".

I've explored the issue a little:

  • Changing the length of the name changes the encoding from base64 (=?UTF-8?B?) to quoted-printable (=?UTF-8?Q?).
    • "재빠 게으갈색 여 우가.pdf" generates =?UTF-8?B?7J6s67mgIOqyjOycvOqwiOyDiSDsl6wg7Jqw6rCALnBkZg==?=
  • Very long names without Unicode (Hangul/Korean) characters are fine.
  • A single Unicode character in a long name is enough to change the encoding format.
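These observations fit a rule based on the share of non-ASCII characters rather than on length alone. The Javadoc of JavaMail's `MimeUtility.encodeText` states that with no explicit encoding, "Q" is used when most characters are ASCII and "B" otherwise. Whether simple-java-mail 5.5.1 goes through that method for attachment names is an assumption here, not something confirmed in this thread. The sketch below mirrors only the documented rule. The class `EncodingChoice` is hypothetical and the real implementation may differ in detail. By a manual count, the long name from the report has 19 ASCII and 17 non-ASCII characters, which would select "Q", while the short name has 7 and 9, which would select "B".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper mirroring the documented "mostly ASCII -> Q, otherwise B" rule. */
final class EncodingChoice {

    /** Returns "B", "Q", or null when the text is printable ASCII only and needs no encoding. */
    static String choose(String text) {
        int ascii = 0;
        int nonAscii = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                ascii++;
            } else {
                nonAscii++;
            }
        }
        if (nonAscii == 0) {
            return null;
        }
        return ascii > nonAscii ? "Q" : "B";
    }

    public static void main(String[] args) {
        // Short name from the issue, recovered from its base64 payload to keep this source ASCII-only.
        String shortName = new String(
                Base64.getDecoder().decode("7J6s67mgIOqyjOycvOqwiOyDiSDsl6wg7Jqw6rCALnBkZg=="),
                StandardCharsets.UTF_8);
        String longAscii = "a very long attachment name without any special characters.pdf";
        // U+C7AC is the first Hangul syllable of the example name.
        String oneHangul = "\uC7AC " + longAscii;

        System.out.println(choose(shortName)); // B
        System.out.println(choose(longAscii)); // null
        System.out.println(choose(oneHangul)); // Q
    }
}
```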

alexegorenkov avatar Oct 07 '21 23:10 alexegorenkov

What happens if you use @dschrul-cf's suggested workaround?

You can use this to fix it: .withAttachment(MimeUtility.encodeText(filename, "UTF-8", null), new FileDataSource(pdf))
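One caveat about the quoted workaround: per the `MimeUtility.encodeText` Javadoc, a null third argument lets the library pick between "B" and "Q", so a mostly-ASCII name may again come out in "Q" form. Passing "B" explicitly would force base64. To show what that produces without depending on the mail library, here is a JDK-only sketch. The class `Base64EncodedWords` is hypothetical, and whether Outlook handles this output any better is not verified in this thread. It splits on character boundaries so that each encoded-word stays within the 75-character limit from RFC 2047.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical helper that always emits RFC 2047 "B" encoded-words in UTF-8. */
final class Base64EncodedWords {

    // 75 chars per word minus "=?UTF-8?B?" (10) and "?=" (2) leaves 63,
    // i.e. at most 60 base64 chars, which carry 45 raw bytes.
    private static final int MAX_BYTES_PER_WORD = 45;

    static String encode(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder chunk = new StringBuilder();
        int chunkBytes = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            int size = ch.getBytes(StandardCharsets.UTF_8).length;
            // Start a new encoded-word rather than splitting a character's bytes.
            if (chunkBytes + size > MAX_BYTES_PER_WORD) {
                words.add(toWord(chunk.toString()));
                chunk.setLength(0);
                chunkBytes = 0;
            }
            chunk.append(ch);
            chunkBytes += size;
            i += Character.charCount(codePoint);
        }
        if (chunk.length() > 0) {
            words.add(toWord(chunk.toString()));
        }
        return String.join(" ", words);
    }

    private static String toWord(String chunk) {
        String payload = Base64.getEncoder()
                .encodeToString(chunk.getBytes(StandardCharsets.UTF_8));
        return "=?UTF-8?B?" + payload + "?=";
    }

    public static void main(String[] args) {
        // Default input: the first two syllables of the example name, written as escapes.
        String name = args.length > 0 ? args[0] : "\uC7AC\uBE60.pdf";
        System.out.println(encode(name));
    }
}
```

For the short name from the report, this yields the same `=?UTF-8?B?...?=` value quoted earlier in the issue.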

bbottema avatar Dec 25 '21 12:12 bbottema

Any update here?

bbottema avatar Jul 29 '22 17:07 bbottema

@alexegorenkov?

bbottema avatar Feb 20 '23 16:02 bbottema

Closed as inactionable

bbottema avatar Jun 23 '23 12:06 bbottema