
Add support for non-ASCII characters in workspace paths on Unix

Open · fmeum opened this issue 4 months ago · 1 comment

  • Convert server args to the internal string representation. The arguments for requests to the server were already converted to Bazel's internal string representation, which resulted in a mismatch between --client_cwd and --workspace_directory if the workspace path contains non-ASCII characters.
  • Read the downloader config using Bazel's filesystem implementation.
  • Make MacOSXFsEventsDiffAwareness UTF-8 aware. It previously used the GetStringUTF JNI method, which, despite its name, doesn't return the UTF-8 representation of a string but a modified form of CESU-8.
  • Correctly reencode path strings for LocalDiffAwareness.
  • Correctly reencode the value of user.dir.
  • Correctly turn ExecRequest fields into strings for ProcessBuilder for bazel --batch run. Also ensure that the bytes fields are populated with UTF-8 on Windows, where the native client always treats them as UTF-8 instead of raw bytes (it defaults to Cp1252 in CI). This makes it possible to reenable the test_consistent_command_line_encoding test, fixing #1775. Also add a TODO describing the planned follow-up PR that will enable full UTF-8 support for bazel run arguments.
  • Finally get rid of the Latin-1 locale hack in the client (that is, replace it with forcing a UTF-8 locale if available). It doesn't work on macOS and Windows anyway and is unnecessary on Linux, since the Unix filesystem implementation supports arbitrary byte sequences for paths by going through native methods. This is required to prevent a very obscure crash: Caffeine caches trigger the JVM's Logger discovery, which in turn runs the static initializer of FilePermission, which in turn attempts to get a java.nio.file.Path for the value of the user.dir system property (i.e., the current working directory). But if the workspace path contains non-Latin-1 characters while the locale (and thus sun.jnu.encoding) is forced to Latin-1, this throws an exception due to unmappable characters in the path.
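The encoding difference mentioned in the MacOSXFsEventsDiffAwareness item can be observed from plain Java without any JNI code. The sketch below is illustrative only and is not taken from this change: it assumes that `DataOutputStream.writeUTF`, which is documented to emit Java's modified UTF-8, is a reasonable stand-in for what the JNI string functions produce. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo class; not part of Bazel. */
public final class ModifiedUtf8Demo {
  /** Returns the standard UTF-8 encoding of s. */
  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /**
   * Returns the modified UTF-8 encoding of s as written by
   * DataOutputStream.writeUTF, without its 2-byte length prefix.
   */
  static byte[] modifiedUtf8(String s) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DataOutputStream data = new DataOutputStream(out)) {
      data.writeUTF(s);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    byte[] withLength = out.toByteArray();
    return Arrays.copyOfRange(withLength, 2, withLength.length);
  }

  public static void main(String[] args) {
    String bmp = "\u00FC"; // U+00FC, inside the Basic Multilingual Plane
    String supplementary = "\uD83D\uDE00"; // U+1F600, outside the BMP

    // BMP characters other than NUL encode identically in both schemes.
    System.out.println(Arrays.equals(utf8(bmp), modifiedUtf8(bmp)));

    // Supplementary characters differ: one 4-byte sequence in UTF-8,
    // two 3-byte surrogate encodings in modified UTF-8.
    System.out.println(utf8(supplementary).length);
    System.out.println(modifiedUtf8(supplementary).length);
  }
}
```

A path containing only characters from the Basic Multilingual Plane would round-trip unchanged under either encoding, which may be why this kind of mismatch tends to surface only with characters such as emoji.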

Along the way, optimized functions converting between Java strings and Bazel's internal string representation are added to replace ad-hoc conversion logic.
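As a rough illustration of what such a conversion involves, here is a minimal sketch. It assumes that the internal representation stores the UTF-8 bytes of a path one byte per char in a Latin-1 string; that assumption, along with the class and method names, is hypothetical and not taken from this change. It is the naive version, not the optimized functions the change adds.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; the real helpers in Bazel are named and implemented differently. */
public final class InternalStringSketch {
  /**
   * Converts a regular Java string to the assumed internal form:
   * its UTF-8 bytes, each stored as one Latin-1 char.
   */
  static String toInternal(String unicode) {
    byte[] utf8 = unicode.getBytes(StandardCharsets.UTF_8);
    return new String(utf8, StandardCharsets.ISO_8859_1);
  }

  /** Converts the assumed internal form back to a regular Java string. */
  static String fromInternal(String internal) {
    byte[] raw = internal.getBytes(StandardCharsets.ISO_8859_1);
    return new String(raw, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String path = "/home/user/\u65E5\u672C/workspace"; // contains two CJK characters

    String internal = toInternal(path);

    // The two forms differ for non-ASCII input, so comparing a converted
    // string with an unconverted one reports a mismatch.
    System.out.println(internal.equals(path));

    // Converting back recovers the original string.
    System.out.println(fromInternal(internal).equals(path));
  }
}
```

Under this assumption, ASCII-only paths are identical in both forms, so a missed conversion would go unnoticed until a path contains non-ASCII characters, consistent with the --client_cwd and --workspace_directory mismatch described above.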

Since this change is already large enough (as required for a passing end-to-end test), changes to {Windows,JavaIo}FileSystem are left to a follow-up PR.

Fixes #1775
Work towards #23859

fmeum · Oct 16 '24 17:10