simple-java-mail
Long non-ascii attachment names cannot be decoded by outlook
Email attachments with long names that contain unicode characters seem to be incorrectly encoded.
We are using simple-java-mail version 5.5.1. I have not tried to reproduce on latest.
I've included a reproduction below.
Potentially a duplicate of https://github.com/bbottema/simple-java-mail/issues/293, which is missing a reproduction.
The input "재빠 게으갈색 여 우가 통나 REPORT (위로뛰어 올랐).pdf"
is encoded in such a way that Outlook does not recognize the attachment as a PDF, and the filename is unreadable to the user. An example of the output is shown below.
A simple reproduction looks like this:
byte[] bytes = ByteBuffers.toBytes(ByteBuffer.wrap(new byte[] {1, 2, 3}));
// buildEmail() returns an Email, so the variable is typed accordingly
Email simpleJavaMail = EmailBuilder.startingBlank()
    .withAttachment(
        "재빠 게으갈색 여 우가 통나 REPORT (위로뛰어 올랐).pdf",
        bytes,
        MimeType.of("application/pdf").get())
    .buildEmail();
The encoded name comes back in quoted printable form, but Outlook does not decode this the way we would expect.
simpleJavaMail.getAttachments().get(0).getName()
will produce
=?UTF-8?Q?=EC=9E=AC=EB=B9=A0_=EA=B2=8C?= =?UTF-8?Q?=EC=9C=BC=EA=B0=88=EC=83=89_=EC=97=AC?= =?UTF-8?Q?_=EC=9A=B0=EA=B0=80_=ED=86=B5=EB=82=98_RE?= =?UTF-8?Q?PORT_(=EC=9C=84=EB=A1=9C=EB=9B=B0?= =?UTF-8?Q?=EC=96=B4_=EC=98=AC=EB=9E=90).pdf?=
The attachment in Outlook will have the same indecipherable filename.
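To check whether the string above is at least well-formed, here is a minimal, stdlib-only sketch of an RFC 2047 encoded-word decoder. The class and method names are hypothetical and not part of simple-java-mail or JavaMail. It decodes each encoded-word separately, which assumes no multi-byte character is split across two words (this holds for the output above).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the library's own decoder.
final class EncodedWordSketch {
    // Matches =?charset?B-or-Q?encoded-text?=
    private static final Pattern WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    // The encoded attachment name reported in this issue.
    static final String REPORTED =
            "=?UTF-8?Q?=EC=9E=AC=EB=B9=A0_=EA=B2=8C?= "
            + "=?UTF-8?Q?=EC=9C=BC=EA=B0=88=EC=83=89_=EC=97=AC?= "
            + "=?UTF-8?Q?_=EC=9A=B0=EA=B0=80_=ED=86=B5=EB=82=98_RE?= "
            + "=?UTF-8?Q?PORT_(=EC=9C=84=EB=A1=9C=EB=9B=B0?= "
            + "=?UTF-8?Q?=EC=96=B4_=EC=98=AC=EB=9E=90).pdf?=";

    static String decode(String value) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(value);
        int last = 0;
        boolean previousWasEncoded = false;
        while (m.find()) {
            String gap = value.substring(last, m.start());
            // Whitespace between two adjacent encoded-words is dropped.
            if (!(previousWasEncoded && gap.trim().isEmpty())) {
                out.append(gap);
            }
            Charset charset = Charset.forName(m.group(1));
            byte[] raw = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            out.append(new String(raw, charset));
            last = m.end();
            previousWasEncoded = true;
        }
        return out.append(value.substring(last)).toString();
    }

    // "Q" encoding: '_' means space, "=XX" is a hex-encoded byte.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode(args.length > 0 ? args[0] : REPORTED));
    }
}
```

Running this on the reported string gives back the original filename, which suggests the encoded-words themselves are valid and that the problem lies in how the client treats them in an attachment name.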
I've confirmed that there is a proper encoding that works with modern email clients such as Outlook, via the following steps.
- Create the file
touch "재빠 게으갈색 여 우가 통나 REPORT (위로뛰어 올랐).pdf"
(Mac might reject it without using touch).
- Send the file as an attachment in an e-mail through Outlook or similar.
- View the MIME source of the e-mail; in Outlook this can be seen through "View Source".
I've explored the issue a little:
- Changing the length of the name changes the encoding from base64 (=?UTF-8?B?) to quoted-printable (=?UTF-8?Q?)
- "재빠 게으갈색 여 우가.pdf" generates
=?UTF-8?B?7J6s67mgIOqyjOycvOqwiOyDiSDsl6wg7Jqw6rCALnBkZg==?=
- "재빠 게으갈색 여 우가.pdf" generates
- Very long examples without Unicode/Hangul/Korean are fine.
- A single unicode character in a long name is enough to change the encoding format.
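A possible explanation for the B/Q switch, offered as an assumption and not confirmed anywhere in this thread: if the name is encoded via JavaMail's MimeUtility.encodeText with no explicit encoding, my understanding is that it picks "Q" when ASCII characters outnumber non-ASCII ones and "B" otherwise. The sketch below is a hypothetical re-implementation of that heuristic (the names are made up), useful only for checking it against the examples above.

```java
// Hypothetical re-implementation of the assumed B-vs-Q heuristic;
// this is NOT the library's code, only a model of the observed behaviour.
final class EncodingChoiceSketch {

    /** Returns "none" for pure ASCII, otherwise "Q" or "B". */
    static String pickEncoding(String name) {
        int ascii = 0;
        int nonAscii = 0;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            // Treat printable ASCII plus CR, LF and TAB as "ASCII".
            boolean isAscii = c < 0x7F
                    && (c >= 0x20 || c == '\r' || c == '\n' || c == '\t');
            if (isAscii) {
                ascii++;
            } else {
                nonAscii++;
            }
        }
        if (nonAscii == 0) {
            return "none"; // nothing to encode
        }
        // Mostly-ASCII text stays readable in Q; otherwise B is more compact.
        return ascii > nonAscii ? "Q" : "B";
    }

    public static void main(String[] args) {
        // Pass the attachment names to test as command-line arguments.
        for (String name : args) {
            System.out.println(name + " -> " + pickEncoding(name));
        }
    }
}
```

Under this model the short name (7 ASCII characters, 9 Hangul) maps to "B" and the long name (19 ASCII characters, 17 Hangul) maps to "Q", matching the outputs above. That would mean the switch depends on the ratio of ASCII to non-ASCII characters and not on length as such.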
What happens if you use @dschrul-cf's suggested workaround?
You can use this to fix it:
.withAttachment(MimeUtility.encodeText(filename, "UTF-8", null), new FileDataSource(pdf))
Any update here?
@alexegorenkov?
Closed as inactionable