bazel
Add support for non-ASCII characters in workspace paths on Unix
- Convert server args to the internal string representation. The arguments for requests to the server were already converted to Bazel's internal string representation, but the server's own arguments were not, which resulted in a mismatch between `--client_cwd` and `--workspace_directory` if the workspace path contains non-ASCII characters.
- Read the downloader config using Bazel's filesystem implementation.
- Make `MacOSXFsEventsDiffAwareness` UTF-8 aware. It previously used the `GetStringUTF` JNI method, which, despite its name, doesn't return the UTF-8 representation of a string, but modified CESU-8.
- Correctly reencode path strings for `LocalDiffAwareness`.
- Correctly reencode the value of `user.dir`.
- Correctly turn `ExecRequest` fields into strings for `ProcessBuilder` for `bazel --batch run`. Also ensure that the `bytes` fields are populated with UTF-8 on Windows, where the native client always treats them as UTF-8 instead of raw bytes (it defaults to `Cp1252` in CI). This makes it possible to reenable the `test_consistent_command_line_encoding` test, fixing #1775. Also add a TODO describing the follow-up work needed for full UTF-8 support for `bazel run` arguments.
- Finally get rid of the Latin-1 locale hack in the client (that is, replace it with forcing a UTF-8 locale if available). It doesn't work on macOS and Windows, and it is unnecessary on Linux, since the Unix filesystem implementation supports arbitrary byte sequences for paths by going through native methods. This is required to prevent a very obscure crash: Caffeine caches trigger the JVM's `Logger` discovery, which in turn runs the static initializer of `FilePermission`, which in turn attempts to get a `java.nio.file.Path` for the value of the `user.dir` system property (i.e., the current working directory). But if the workspace path contains non-Latin-1 characters while the locale (and thus `sun.jnu.encoding`) is forced to Latin-1, this throws an exception due to unmappable characters in the path.
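The `GetStringUTF` pitfall mentioned in the list can be reproduced without JNI. The sketch below is not code from this change: it uses `DataOutputStream.writeUTF`, which is documented to write the JVM's modified UTF-8, as a stand-in for the JNI call, and the class and method names are made up for illustration. It shows that the two encodings agree for characters in the Basic Multilingual Plane but differ for supplementary characters, which modified UTF-8 encodes as two three-byte surrogates.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Bazel.
public class ModifiedUtf8Demo {
  /** Returns the standard UTF-8 encoding of {@code s}. */
  public static byte[] standardUtf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /**
   * Returns the modified UTF-8 encoding of {@code s} as produced by
   * DataOutputStream.writeUTF, with its 2-byte length prefix removed.
   * Assumption: this serves as a stand-in for the JNI string functions.
   */
  public static byte[] modifiedUtf8(String s) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeUTF(s);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    byte[] withLength = buffer.toByteArray();
    return Arrays.copyOfRange(withLength, 2, withLength.length);
  }

  public static void main(String[] args) {
    String bmp = "\u00E9"; // U+00E9, inside the Basic Multilingual Plane
    String supplementary = "\uD83D\uDE00"; // U+1F600, a surrogate pair in Java

    // Same bytes in both encodings for BMP characters.
    System.out.println(Arrays.equals(standardUtf8(bmp), modifiedUtf8(bmp)));

    // 4 bytes in standard UTF-8, 6 bytes in modified UTF-8.
    System.out.println(standardUtf8(supplementary).length);
    System.out.println(modifiedUtf8(supplementary).length);
  }
}
```

A path containing such a character would therefore arrive on the native side as bytes that do not match the UTF-8 name on disk, which is consistent with the motivation given for that list item.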
Along the way, optimized functions converting between Java strings and Bazel's internal string representation are added to replace ad-hoc conversion logic.
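As a rough illustration of what such conversion functions do, here is a minimal sketch. It assumes that the internal representation stores the UTF-8 bytes of a string one byte per Java `char` (that is, the bytes decoded as Latin-1); the description above does not spell this out, and the class and method names are hypothetical rather than the ones added by this change. The real functions are described as optimized, whereas this version simply goes through intermediate byte arrays.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and the assumed representation are illustrative only.
public class InternalStringSketch {
  /**
   * Converts a regular Java string to the assumed internal form:
   * its UTF-8 bytes, each stored in one char (Latin-1 decoding).
   */
  public static String unicodeToInternal(String s) {
    return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
  }

  /**
   * Converts the assumed internal form back to a regular Java string.
   * The input must only contain chars up to 0xFF; anything else cannot be
   * encoded as Latin-1 and would be replaced with '?'.
   */
  public static String internalToUnicode(String s) {
    return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String path = "/tmp/\u00E4"; // "/tmp/" followed by U+00E4
    String internal = unicodeToInternal(path);

    // U+00E4 is two bytes in UTF-8, so the internal form is one char longer.
    System.out.println(path.length() + " -> " + internal.length());

    // The round trip restores the original string.
    System.out.println(internalToUnicode(internal).equals(path));
  }
}
```

Under this assumption ASCII-only paths are identical in both forms, which is why mixing them up only shows up once a workspace path contains non-ASCII characters.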
Since this change is already large enough (as required for a passing end-to-end test), changes to `{Windows,JavaIo}FileSystem` are left to a follow-up PR.
Fixes #1775
Work towards #23859