
Accented characters issue when using java.nio.file.Path

Open Manokha opened this issue 1 year ago • 3 comments

Describe the issue: Path.of(...) throws an InvalidPathException when the file name contains accented characters.

Steps to reproduce the issue

  1. Using this test sample (GraalVMAccentedTest.java):
import java.nio.file.Path;

public final class GraalVMAccentedTest {
    public static void main(String[] args) {
        var file = Path.of("àéïôù.txt").toFile();
        System.out.println("Could instantiate file " + file.getName());
    }
}
  2. Using eclipse-temurin java 17 (locally):
javac GraalVMAccentedTest.java
java GraalVMAccentedTest
# Outputs: "Could instantiate file àéïôù.txt"
  3. Run a GraalVM image:
docker run --name graalvm-accented-test -it "ghcr.io/graalvm/graalvm-ce:ol9-java17" /bin/bash
  4. Copy the file there (from another terminal):
docker cp GraalVMAccentedTest.java graalvm-accented-test:/app/GraalVMAccentedTest.java
  5. Compile and run (from the 3rd step terminal):
# Fails without -encoding utf8
javac -encoding utf8 GraalVMAccentedTest.java
java GraalVMAccentedTest
  6. Outputs:
Exception in thread "main" java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: ?????.txt
	at java.base/sun.nio.fs.UnixPath.encode(UnixPath.java:121)
	at java.base/sun.nio.fs.UnixPath.<init>(UnixPath.java:68)
	at java.base/sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:279)
	at java.base/java.nio.file.Path.of(Path.java:147)
	at GraalVMAccentedTest.main(GraalVMAccentedTest.java:5)
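
To narrow down a failure like the one above, it can help to print what the JVM believes the platform encoding is. The following is a minimal diagnostic sketch, not part of the original report. The class name EncodingDiagnostics and the helper canEncode are hypothetical. It assumes that JDK 17 on Linux derives file.encoding and sun.jnu.encoding from the process locale and uses the latter when encoding path names, which would match the UnixPath.encode frame in the stack trace.

```java
import java.nio.charset.Charset;

public final class EncodingDiagnostics {

    /** Returns true if every character of name can be encoded in charset. */
    public static boolean canEncode(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Same file name as in the report, written with Unicode escapes so that
        // the result does not depend on the encoding javac uses for this source.
        String name = "\u00e0\u00e9\u00ef\u00f4\u00f9.txt";

        // Locale-related environment variables (may be null if unset).
        System.out.println("LANG             = " + System.getenv("LANG"));
        System.out.println("LC_ALL           = " + System.getenv("LC_ALL"));

        // Encodings the JVM picked up at startup.
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());

        // Assumption: sun.jnu.encoding is the charset used for path names.
        String jnuName = System.getProperty("sun.jnu.encoding");
        if (jnuName != null && Charset.isSupported(jnuName)) {
            Charset jnu = Charset.forName(jnuName);
            System.out.println("name encodable in " + jnu + "? " + canEncode(name, jnu));
        }
    }
}
```

If the last line reports false (as it would for an ASCII-only charset), the exception is a consequence of the locale the container starts with rather than of the Java code itself.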

Describe GraalVM and your environment:

More details: I tried different base images (ol7, ol8). It turns out ol7 works, and it does not even need "-encoding utf-8" when compiling:

  • ghcr.io/graalvm/graalvm-ce:ol9-java17 and container-registry.oracle.com/graalvm/jdk:17-ol9 are failing
  • ghcr.io/graalvm/graalvm-ce:ol8-java17 and container-registry.oracle.com/graalvm/jdk:17-ol8 are failing
  • ghcr.io/graalvm/graalvm-ce:ol7-java17 and container-registry.oracle.com/graalvm/jdk:17-ol7 are OK!
docker run --name graalvm-accented-test -it "container-registry.oracle.com/graalvm/jdk:17-ol7" /bin/bash
# ...
bash-4.2# javac GraalVMAccentedTest.java 
bash-4.2# java GraalVMAccentedTest
Could instantiate file àéïôù.txt

Manokha, Apr 17 '24 12:04

Thanks for reporting this. I will try to reproduce it on my side.

fernando-valdez, Apr 18 '24 02:04

@fernando-valdez Thanks :) I just updated the description (More details section) after trying different images; it turns out it works with ol7-based images.

Manokha, Apr 18 '24 07:04

@fernando-valdez Speaking with CentOS/RHEL in mind: regardless of what you specify as locales for the native image, the host system actually has to have langpacks installed. I hit this issue back in the day and wrote this doc:

    /**
     * This test in Native won't work on a barebone system,
     * just with C.UTF-8 default fallback locale.
     *
     * For example, this package satisfies the dependency on a RHEL 9 type of OS:
     * glibc-all-langpacks
     *
     */

So I'd suggest checking glibc langpacks on those ol base images....

Karm, Jun 25 '24 14:06

Thanks @Karm I can confirm that running:

microdnf --nobest install glibc-all-langpacks

fixes it. "-encoding utf-8" is no longer needed for compilation, and there is no exception at run time.
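
One way to confirm the fix inside a container, or to fail fast when the langpacks are missing, is to probe Path.of at startup. This is only a sketch: the class name PathProbe and the method probe are hypothetical, and the hint printed on failure reflects the diagnosis from this thread rather than anything the JDK reports.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.Optional;

public final class PathProbe {

    /**
     * Tries to build a Path from name. Returns an empty Optional on success,
     * or the reason reported by InvalidPathException on failure.
     */
    public static Optional<String> probe(String name) {
        try {
            Path.of(name);
            return Optional.empty();
        } catch (InvalidPathException e) {
            return Optional.of(e.getReason());
        }
    }

    public static void main(String[] args) {
        // Same file name as in the report, written with Unicode escapes so the
        // check does not depend on the encoding javac uses for this source.
        String name = "\u00e0\u00e9\u00ef\u00f4\u00f9.txt";

        probe(name).ifPresentOrElse(
            reason -> {
                System.err.println("Cannot build path: " + reason);
                System.err.println("Hint: check the container locale and glibc langpacks.");
                System.exit(1);
            },
            () -> System.out.println("Path encoding OK"));
    }
}
```

Because the file name is written with escapes, this compiles without "-encoding utf8" even on an image where the original sample does not, so a failure here points at the runtime locale only.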

Manokha, Jul 02 '24 15:07

Thanks @Manokha for confirming. And thanks @Karm for your recommendation!

fernando-valdez, Jul 02 '24 15:07