[GR-60338] [Native Image] `sun.jnu.encoding` defaults to legacy code page on Windows
Describe the Issue
The `Charset` used for filesystem operations (e.g. to encode/decode paths) is determined by the `sun.jnu.encoding` system property, which is inherited from the build-time environment. On Windows, this defaults to the current code page (as returned by `GetACP()`), which is usually a legacy encoding such as Cp1252 that doesn't support most Unicode characters. The only way to work around this is to force the system code page to UTF-8, which is still considered a beta feature and can break other apps.
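To illustrate why a legacy code page is a problem for paths, here is a minimal sketch that checks which file names can be represented in windows-1252 (the JDK's canonical name for Cp1252) versus UTF-8. The class name `CodePageCoverage` and the sample file names are made up for illustration. It only exercises the JDK's `CharsetEncoder.canEncode` and does not touch the filesystem or native-image itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of GraalVM or the JDK.
class CodePageCoverage {
    // Returns true if every character of `text` has a mapping in `charset`.
    static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String[] samples = {
            "report.txt",               // plain ASCII
            "caf\u00e9.txt",            // e with acute accent, present in Cp1252
            "\u65e5\u672c\u8a9e.txt"    // Japanese characters, absent from Cp1252
        };
        for (String name : samples) {
            System.out.println(name
                + " windows-1252=" + canEncode(cp1252, name)
                + " UTF-8=" + canEncode(StandardCharsets.UTF_8, name));
        }
    }
}
```

Western European names survive in Cp1252, but anything outside that repertoire has no byte representation at all, which matches the class of paths this issue is about.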
Using the latest version of GraalVM can resolve many issues.
- [X] I tried with the latest version of GraalVM.
GraalVM Version
Oracle GraalVM 23.0.1+11.1
Operating System and Version
Windows 10 x86
Build Command
Any image build
Expected Behavior
System.getProperty("sun.jnu.encoding") returns UTF-8 by default or there is a build-time option to choose this encoding.
Actual Behavior
`System.getProperty("sun.jnu.encoding")` returns Cp1252, and operating on filesystem paths that contain characters outside that code page results in `InvalidPathException`s.
Steps to Reproduce
- Print `System.getProperty("sun.jnu.encoding")` from any image build on Windows with default settings.
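A minimal reproducer along these lines could look like the sketch below. The class name `JnuEncodingCheck` and the sample file name are assumptions for illustration. The expectation that `Paths.get` throws `InvalidPathException` in the affected image comes from the behavior described in this report, not from a verified run.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical reproducer; build it with native-image on Windows using default settings.
class JnuEncodingCheck {
    // The encoding the runtime uses for filesystem paths.
    static String jnuEncoding() {
        return System.getProperty("sun.jnu.encoding");
    }

    // Returns false if the runtime rejects the name when constructing a Path.
    static boolean canBuildPath(String name) {
        try {
            Paths.get(name);
            return true;
        } catch (InvalidPathException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("sun.jnu.encoding = " + jnuEncoding());
        // Japanese characters that Cp1252 cannot represent.
        String name = "\u65e5\u672c\u8a9e.txt";
        System.out.println("path usable = " + canBuildPath(name));
    }
}
```

Comparing the output of the same class on a regular JVM and in the native image should show whether the two disagree on the property value.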
Additional Context
Individual applications can opt into a UTF-8 code page: https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#set-a-process-code-page-to-utf-8
It may be sufficient to add such a manifest to the native-image.exe binary.
Build Log Output and Error Messages
No response
Hi @fmeum, Thank you for reaching out to us! We'll take a look into this and I'll make sure to keep you updated.
As a temporary workaround, you can set `-J-Dsun.jnu.encoding=UTF-8` during the native image build.
We instead build on a Windows machine that uses the UTF-8 locale by default (https://github.com/bazelbuild/bazel/blob/3e1922ce656252edde79f39a53e926da4222cefe/src/upload_all_java_tools.sh#L51) and patch a Windows app manifest into the resulting native image to force the use of UTF-8 (https://github.com/bazelbuild/bazel/blob/3e1922ce656252edde79f39a53e926da4222cefe/src/java_tools/buildjar/java/com/google/devtools/build/java/turbine/BUILD#L132).
It would still be very convenient if Graal could do this automatically though.
We have to respect whatever the host system is using. I think it would be quite confusing if you set some locale up in your system and GraalVM then used something else.