
[GR-60338] [Native Image] `sun.jnu.encoding` defaults to legacy code page on Windows

Open fmeum opened this issue 1 year ago • 3 comments

Describe the Issue

The Charset used for file system operations (e.g. encoding and decoding paths) is determined by the sun.jnu.encoding system property, which the native image inherits from the build-time environment. On Windows, this property defaults to the current ANSI code page (as returned by GetACP()), which is usually a legacy encoding such as Cp1252 that cannot represent most Unicode characters. The only workaround is to switch the system code page to UTF-8, which Windows still labels a beta feature and which can break other applications.


GraalVM Version

Oracle GraalVM 23.0.1+11.1

Operating System and Version

Windows 10 x86

Build Command

Any image build

Expected Behavior

System.getProperty("sun.jnu.encoding") returns UTF-8 by default, or a build-time option exists to select this encoding.

Actual Behavior

System.getProperty("sun.jnu.encoding") returns Cp1252, and operations on file system paths containing non-ASCII Unicode characters fail with InvalidPathException.

Steps to Reproduce

  1. Print System.getProperty("sun.jnu.encoding") from any image build on Windows with default settings.
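A minimal reproducer along these lines may help; it is a sketch, not part of the original report, and the class name `JnuEncodingCheck` and the sample file name are hypothetical. It prints `sun.jnu.encoding`, uses `CharsetEncoder.canEncode` to check whether that charset can represent a non-ASCII file name, and then calls `Path.of` on the name. Per the report above, the `Path.of` call is expected to fail with `InvalidPathException` in an image built with default Windows settings, while with a UTF-8 path encoding it should succeed.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

public class JnuEncodingCheck {

    // Returns true if every character of `name` can be represented in the given charset.
    static boolean canEncode(String charsetName, String name) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(name);
    }

    public static void main(String[] args) {
        String jnu = System.getProperty("sun.jnu.encoding");
        System.out.println("sun.jnu.encoding = " + jnu);

        // Hypothetical sample name "Grüße-日本語.txt": the German letters fit in Cp1252,
        // the Japanese characters do not.
        String name = args.length > 0 ? args[0] : "Gr\u00fc\u00dfe-\u65e5\u672c\u8a9e.txt";

        if (jnu != null && Charset.isSupported(jnu)) {
            System.out.println("Encodable in " + jnu + ": " + canEncode(jnu, name));
        }

        try {
            Path path = Path.of(name);
            System.out.println("Path.of succeeded: " + path);
        } catch (InvalidPathException e) {
            // Failure mode reported when paths are encoded with a legacy code page
            System.out.println("InvalidPathException: " + e.getMessage());
        }
    }
}
```

Compile it with `javac`, build an image with `native-image JnuEncodingCheck`, and run the resulting executable on Windows with default settings to compare against the behavior described above.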

Additional Context

Individual applications can opt into a UTF-8 code page: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#set-a-process-code-page-to-utf-8

It may be sufficient to add such a manifest to the native-image.exe binary.

Build Log Output and Error Messages

No response

fmeum • Dec 05 '24 09:12

Hi @fmeum, thank you for reaching out to us! We'll take a look into this and I'll make sure to keep you updated.

selhagani • Dec 05 '24 15:12

As a temporary workaround, you can set -J-Dsun.jnu.encoding=Cp1252 during the native image build.

antonwiens • May 22 '25 14:05

We instead build on a Windows machine that uses the UTF-8 locale by default (https://github.com/bazelbuild/bazel/blob/3e1922ce656252edde79f39a53e926da4222cefe/src/upload_all_java_tools.sh#L51) and patch a Windows app manifest into the resulting native image to force the use of UTF-8 (https://github.com/bazelbuild/bazel/blob/3e1922ce656252edde79f39a53e926da4222cefe/src/java_tools/buildjar/java/com/google/devtools/build/java/turbine/BUILD#L132).

It would still be very convenient if Graal could do this automatically though.

fmeum • May 22 '25 15:05

We have to respect whatever the host system is using. I think it would be quite confusing if you set some locale up in your system and GraalVM then used something else.

wirthi • Aug 18 '25 11:08