aem-testing-clients

Multipart uploads do not properly handle UTF-8 encoded names

Open badvision opened this issue 3 years ago • 3 comments

https://github.com/adobe/aem-testing-clients/blob/889e1c6fc66ee146c5fa1b526065e36082fc480a/src/main/java/com/adobe/cq/testing/client/CQAssetsClient.java#L394

When building the multipart request, the charset of the filename part is not set to UTF-8; it is ISO-8859-1 instead. As a result, filename characters that take 3 bytes in UTF-8 (such as Korean and Chinese glyphs) are squashed to ? characters.
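The squashing described above can be reproduced with the JDK alone. The following is a minimal sketch, not the code from `CQAssetsClient`: the class name `FilenameEncodingDemo`, the helper `roundTrip`, and the sample filename are all hypothetical, chosen only to show what happens when a name with 3-byte UTF-8 characters is written out as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use aem-testing-clients at all.
class FilenameEncodingDemo {

    // Encode the name with the given charset and decode it again, which is
    // roughly what a receiver sees for a part written in that charset.
    static String roundTrip(String name, Charset charset) {
        byte[] wire = name.getBytes(charset); // unmappable chars become '?'
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        // Two Korean syllables (U+D55C, U+AE00) followed by an ASCII suffix.
        String name = "\uD55C\uAE00.png";

        // ISO-8859-1 cannot represent the Korean characters.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1));

        // UTF-8 preserves them.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8));
    }
}
```

`String.getBytes(Charset)` silently substitutes the charset's replacement byte for anything it cannot map, which is why no exception surfaces and the filename simply arrives with `?` in place of each glyph.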

badvision, Sep 28 '22 16:09

Strings in Java are UTF-8; are you sure that this squashing is not done much earlier? For example, if you are on a Windows platform and provide the filename via the command line.

joerghoh, Sep 28 '22 16:09

Yes, I'm sure. It's happening in a CentOS Jenkins environment and on my Mac.

badvision, Sep 28 '22 16:09

Specifically, the content type of the filename part in the multipart body indicates that it is text with ISO-8859-1 encoding, even though the request itself specifies UTF-8. Most tests don't pick up on this encoding snafu because 7-bit ASCII is encoded identically in both charsets. Only multi-byte characters in the filename reveal the issue.
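To illustrate why ASCII-only test filenames hide the problem, here is a small sketch using only the JDK. The class `FilenameCharsetCheck` and both helper methods are hypothetical names introduced for this example; they are not part of aem-testing-clients.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class FilenameCharsetCheck {

    // True when the name produces identical bytes in ISO-8859-1 and UTF-8,
    // so a mislabelled charset on the part would go unnoticed.
    static boolean sameBytesInBothCharsets(String name) {
        return Arrays.equals(
                name.getBytes(StandardCharsets.ISO_8859_1),
                name.getBytes(StandardCharsets.UTF_8));
    }

    // True when every character of the name is representable in ISO-8859-1.
    static boolean survivesLatin1(String name) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String ascii = "report.pdf";
        String korean = "\uD55C\uAE00.png"; // U+D55C U+AE00 plus ".png"

        System.out.println(sameBytesInBothCharsets(ascii));  // true
        System.out.println(sameBytesInBothCharsets(korean)); // false
        System.out.println(survivesLatin1(korean));          // false
    }
}
```

A test suite that wants to catch this class of bug would need at least one upload whose filename fails `survivesLatin1`, since names made of ASCII characters alone pass regardless of which charset the part declares.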

badvision, Sep 28 '22 17:09