regression in 20250526.2 for unpacking non-ascii file names

Open hanwen-flow opened this issue 6 months ago • 15 comments

Description of the bug:

admin@ip-172-31-7-56:~/vc/ef2$ rm -rf ~/.cache/bazel*
admin@ip-172-31-7-56:~/vc/ef2$ bazelisk build -k $REDACTED/...
2025/06/11 09:49:36 Downloading https://releases.bazel.build/9.0.0/rolling/9.0.0-pre.20250526.2/bazel-9.0.0-pre.20250526.2-linux-x86_64...
2025/06/11 09:49:36 Skipping basic authentication for releases.bazel.build because no credentials found in /home/admin/.netrc
Downloading: 59 MB out of 59 MB (100%) 
Extracting Bazel installation...
Starting local Bazel server (9.0.0-pre.20250526.2) and connecting to it...
INFO: Repository aspect_rules_js+ instantiated at:
  <builtin>: in <toplevel>
Repository rule http_archive defined at:
  /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/bazel_tools/tools/build_defs/repo/http.bzl:394:31: in <toplevel>
ERROR: /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/bazel_tools/tools/build_defs/repo/http.bzl:139:45: An error occurred during the fetch of repository 'aspect_rules_js+':
   Traceback (most recent call last):
	File "/home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/bazel_tools/tools/build_defs/repo/http.bzl", line 139, column 45, in _http_archive_impl
		download_info = ctx.download_and_extract(
Error in download_and_extract: java.io.IOException: Error extracting /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/aspect_rules_js+/temp11524386281555661874/rules_js-v2.3.7.tar.gz to /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/aspect_rules_js+/temp11524386281555661874: [unix_jni.cc:281] /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/aspect_rules_js+/js/private/test/image/non_ascii/empty empty.?? (No such file or directory)
WARNING: errors encountered while analyzing target '//infra/bento_runner/insert:insert_lib', it will not be built.
no such package '@@aspect_rules_js+//npm': java.io.IOException: Error extracting /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/aspect_rules_js+/temp11524386281555661874/rules_js-v2.3.7.tar.gz to /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/aspect_rules_js+/temp11524386281555661874: [unix_jni.cc:281] /home/admin/.cache/bazel/_bazel_admin/fd974fe89a796f9c35a0944ed09934ee/external/aspect_rules_js+/js/private/test/image/non_ascii/empty empty.?? (No such file or directory)
WARNING: errors encountered while analyzing target 'REDACTED'
admin@ip-172-31-7-56:~/vc/ef2$ cat .bazelversion 
9.0.0-pre.20250526.2
admin@ip-172-31-7-56:~/vc/ef2$ vi .bazelversion 
admin@ip-172-31-7-56:~/vc/ef2$ bazelisk build -k REDACTED/...
2025/06/11 09:52:33 Downloading https://releases.bazel.build/9.0.0/rolling/9.0.0-pre.20250516.2/bazel-9.0.0-pre.20250516.2-linux-x86_64...
2025/06/11 09:52:33 Skipping basic authentication for releases.bazel.build because no credentials found in /home/admin/.netrc
Downloading: 59 MB out of 59 MB (100%) 
Extracting Bazel installation...
Starting local Bazel server (9.0.0-pre.20250516.2) and connecting to it...
INFO: Analyzed 9 targets (369 packages loaded, 13407 targets and 37 aspects configured).
INFO: Found 9 targets...
INFO: Elapsed time: 219.457s, Critical Path: 70.04s
INFO: 989 processes: 263 internal, 726 linux-sandbox.
INFO: Build completed successfully, 989 total actions
admin@ip-172-31-7-56:~/vc/ef2$  

Which category does this issue belong to?

regression

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No full repro yet; the last release appears to work normally on dev workstations. The above is from an AWS VM running Debian 12, which I occasionally use.

Which operating system are you running Bazel on?

Linux

If this is a regression, please try to identify the Bazel commit where the bug was introduced with bazelisk --bisect.

I'm sorry; I don't have time for this right now.

Any other information, logs, or outputs that you want to share?

Maybe some kind of encoding issue determined by my locale settings?

$ set|grep -v CRED|grep '^[A-Z]'
BASH=/bin/bash
BASHOPTS=checkwinsize:cmdhist:complete_fullquote:expand_aliases:extglob:extquote:force_fignore:globasciiranges:globskipdots:histappend:interactive_comments:login_shell:patsub_replacement:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=([0]="0")
BASH_ARGV=()
BASH_CMDS=()
BASH_COMPLETION_VERSINFO=([0]="2" [1]="11")
BASH_LINENO=()
BASH_LOADABLES_PATH=/usr/local/lib/bash:/usr/lib/bash:/opt/local/lib/bash:/usr/pkg/lib/bash:/opt/pkg/lib/bash:.
BASH_REMATCH=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="5" [1]="2" [2]="15" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION='5.2.15(1)-release'
COLUMNS=119
COMP_WORDBREAKS=$' \t\n"\'><=;|&(:'
DIRSTACK=()
EUID=1000
GROUPS=()
HISTCONTROL=ignoreboth
HISTFILE=/home/admin/.bash_history
HISTFILESIZE=2000
HISTSIZE=1000
HOME=/home/admin
HOSTNAME=ip-172-31-7-56
HOSTTYPE=x86_64
IFS=$' \t\n'
LANG=C.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_MONETARY=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_NUMERIC=de_DE.UTF-8
LC_PAPER=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LINES=36
LOGNAME=admin
LS_COLORS='rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.avif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:*~=00;90:*#=00;90:*.bak=00;90:*.old=00;90:*.orig=00;90:*.part=00;90:*.rej=00;90:*.swp=00;90:*.tmp=00;90:*.dpkg-dist=00;90:*.dpkg-old=00;90:*.ucf-dist=00;90:*.ucf-new=00;90:*.ucf-old=00;90:*.rpmnew=00;90:*.rpmorig=00;90:*.rpmsave=00;90:'
MACHTYPE=x86_64-pc-linux-gnu
MAILCHECK=60
MOTD_SHOWN=pam
OLDPWD=/home/admin/vc
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
PIPESTATUS=([0]="0" [1]="0")
PPID=300743
PS1='\[\e]0;\u@\h: \w\a\]${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
PS2='> '
PS4='+ '
PWD=/home/admin/vc/ef2
SHELL=/bin/bash
SHELLOPTS=braceexpand:emacs:hashall:histexpand:history:interactive-comments:monitor
SHLVL=1
SSH_CLIENT='62.216.209.72 12456 22'
SSH_CONNECTION='62.216.209.72 12456 172.31.7.56 22'
SSH_TTY=/dev/pts/0
TERM=xterm-256color
UID=1000
USER=admin
XDG_RUNTIME_DIR=/run/user/1000
XDG_SESSION_CLASS=user
XDG_SESSION_ID=19
XDG_SESSION_TYPE=tty

hanwen-flow avatar Jun 11 '25 09:06 hanwen-flow

I saw a similar failure with the go toolchain, which has a similarly oddly named file.

hanwen-flow avatar Jun 11 '25 10:06 hanwen-flow

What are the exact bytes in this filename? In particular, is it valid UTF-8? Which locale is the Bazel server running under?

(I suspect this might be https://github.com/bazelbuild/bazel/commit/9d83d95336721d2dd27a10dba03ed564155eb9d1 but I'd like to understand why this happens before rolling back.)

tjgq avatar Jun 11 '25 10:06 tjgq

@tjgq https://github.com/aspect-build/rules_js/tree/main/js/private/test/image/non_ascii

If this is valid UTF-8 we should be able to get this to work without a rollback.

fmeum avatar Jun 11 '25 10:06 fmeum

The Go SDK one was /home/admin/.cache/bazel/_bazel_admin/7779369af6aa65e0252b23aaeea35f7a/external/rules_go++go_sdk+go_sdk/test/fixedbugs/issue27836.dir/�foo.go, which appears on my console as:

$ ls -l ./test/fixedbugs/issue27836.dir
total 8
-rw-rw-r-- 1 hanwen hanwen 352 Sep 24  2024 Þfoo.go
-rw-rw-r-- 1 hanwen hanwen 363 Sep 24  2024 Þmain.go

$ tar tvfz ~/Downloads/go1.24.4.linux-amd64.tar.gz | grep 27836.*foo  | od -xc
0000000    722d    2d77    2d72    722d    2d2d    3020    302f    2020
          -   r   w   -   r   -   -   r   -   -       0   /   0        
0000020    2020    2020    2020    2020    2020    3320    3235    3220
                                                      3   5   2       2
0000040    3230    2d35    3530    322d    2039    3132    333a    2037
          0   2   5   -   0   5   -   2   9       2   1   :   3   7    
0000060    6f67    742f    7365    2f74    6966    6578    6264    6775
          g   o   /   t   e   s   t   /   f   i   x   e   d   b   u   g
0000100    2f73    7369    7573    3265    3837    3633    642e    7269
          s   /   i   s   s   u   e   2   7   8   3   6   .   d   i   r
0000120    c32f    669e    6f6f    672e    0a6f
          / 303 236   f   o   o   .   g   o  \n
0000132
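
The question of whether the name is valid UTF-8 can be checked directly from the dump. The sketch below is an illustration only, not something posted in the thread: it copies the two non-ASCII bytes from the `od -c` row (octal 303 236) and decodes them strictly. Note that the `od -x` rows show 16-bit words in little-endian order, so the word `c32f` is the byte `2f` ('/') followed by `c3`.

```python
# Bytes of the final path component, copied from the `od -c` dump above:
# octal 303 236, followed by the ASCII text "foo.go".
name_bytes = bytes([0o303, 0o236]) + b"foo.go"

# Strict decoding raises UnicodeDecodeError on malformed input, so reaching
# the next line means the name is well-formed UTF-8.
name = name_bytes.decode("utf-8")

print(name_bytes.hex(" "))       # the raw bytes, in file order
print(name)                      # the decoded file name
print(f"U+{ord(name[0]):04X}")   # code point of the first character
```

The decoded first character is U+00DE (Þ), which matches the `ls -l` listing above.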

hanwen-flow avatar Jun 11 '25 10:06 hanwen-flow

My dev workstation has LANG=en_US.UTF8; I tried setting that on the AWS VM too, but it didn't make a difference.

> Which locale is the Bazel server running under?

If it is not $LANG, how do I determine this?

(note: I edited the top comment to provide more info)

hanwen-flow avatar Jun 11 '25 10:06 hanwen-flow

Could you share the output of bazel info character-encoding and locale -a?

fmeum avatar Jun 11 '25 10:06 fmeum

admin@ip-172-31-7-56:~/vc/engflow$ locale -a
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
C
C.utf8
POSIX
admin@ip-172-31-7-56:~/vc/engflow$ bazel info character-encoding
file.encoding = ISO-8859-1, defaultCharset = ISO-8859-1, sun.jnu.encoding = ANSI_X3.4-1968

hanwen-flow avatar Jun 11 '25 10:06 hanwen-flow

> sun.jnu.encoding = ANSI_X3.4-1968

This is the problem. Bazel tries to set this to en_US.ISO-8859-1 if available, but otherwise shouldn't touch it. Since your default locale is determined by C.UTF-8, that should really be what Bazel ends up using instead.

But locale prints errors since you request de_DE.UTF-8 for some aspects of locales without that locale being installed. Perhaps that messes up locale detection within Bazel?

Could you also post the output of locale charmap?
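
To illustrate why that value is a problem: ANSI_X3.4-1968 is the formal name of plain ASCII, which has no representation for a character such as Þ. The following is a minimal sketch using Python's codec registry as a stand-in. It is not the code path Bazel or the JVM uses, and the `encodable` helper is hypothetical.

```python
import codecs

# Value of sun.jnu.encoding reported by `bazel info character-encoding` above.
CHARSET = "ANSI_X3.4-1968"

# Python maps this name to its built-in ASCII codec.
codec_name = codecs.lookup(CHARSET).name


def encodable(filename: str, charset: str) -> bool:
    """Return True if every character of filename exists in charset."""
    try:
        filename.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(codec_name)
print(encodable("\u00defoo.go", CHARSET))   # the Go SDK file name from above
print(encodable("\u00defoo.go", "utf-8"))
```

A file name that cannot be encoded in the platform charset has no faithful byte representation to hand to the file system, which is consistent with the `?` placeholders in the failing paths.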

fmeum avatar Jun 11 '25 11:06 fmeum

https://github.com/bazelbuild/bazel/pull/26261 reproduces the error.

fmeum avatar Jun 11 '25 11:06 fmeum

$ locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968

hanwen-flow avatar Jun 11 '25 11:06 hanwen-flow

> $ locale charmap
> locale: Cannot set LC_CTYPE to default locale: No such file or directory
> locale: Cannot set LC_MESSAGES to default locale: No such file or directory
> locale: Cannot set LC_ALL to default locale: No such file or directory
> ANSI_X3.4-1968

While Bazel could do a better job at forcing a UTF-8 locale, this does look like a setup error. This does print UTF-8 on default Debian, Ubuntu and macOS installations. If it doesn't, many programs (not just Bazel) will run into encoding issues.

fmeum avatar Jun 11 '25 11:06 fmeum

> While Bazel could do a better job at forcing a UTF-8 locale, this does look like a setup error. This does print UTF-8 on default Debian, Ubuntu and macOS installations. If it doesn't, many programs (not just Bazel) will run into encoding issues.

It would be helpful if bazel printed this in an error message.

IIRC this is running the image for our remote execution setup, and it's probably not designed for logging in and doing user-things.

(I have no idea how to fix this; I can find out, but if you have a quick hint, that would be cool).

Thanks for the rapid feedback!

hanwen-flow avatar Jun 11 '25 11:06 hanwen-flow

admin@ip-172-31 sudo dpkg-reconfigure locales
$ export LC_ALL=en_US.UTF-8
admin@ip-172-31-7-56:~/vc/engflow$ bazel shutdown
admin@ip-172-31-7-56:~/vc/engflow$ bazel info character-encoding
Starting local Bazel server (9.0.0-pre.20250526.2) and connecting to it...
file.encoding = ISO-8859-1, defaultCharset = ISO-8859-1, sun.jnu.encoding = UTF-8

hanwen-flow avatar Jun 11 '25 12:06 hanwen-flow

@tjgq What do you think of showing a warning when the configured locale is ASCII only?

@hanwen-flow That looks good and should avoid these errors. I would recommend unsetting all locale variables except for an explicit LC_CTYPE=C.UTF-8. That should give you the most portable environment.
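
One way to apply that recommendation from a wrapper script is sketched below. This is an assumption-laden illustration, not an official recipe: the helper name `portable_locale_env` is hypothetical, and it treats "all locale variables" as LANG plus every LC_* variable.

```python
from typing import Dict, Mapping


def portable_locale_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Copy env without LANG and LC_* and pin LC_CTYPE to C.UTF-8."""
    cleaned = {
        key: value
        for key, value in env.items()
        if key != "LANG" and not key.startswith("LC_")
    }
    cleaned["LC_CTYPE"] = "C.UTF-8"
    return cleaned


# A cut-down version of the environment posted at the top of the issue.
sample = {
    "LANG": "C.UTF-8",
    "LC_TIME": "de_DE.UTF-8",
    "LC_NUMERIC": "de_DE.UTF-8",
    "HOME": "/home/admin",
}
print(portable_locale_env(sample))

# Possible usage (not executed here):
#   import os, subprocess
#   subprocess.run(["bazel", "info", "character-encoding"],
#                  env=portable_locale_env(os.environ))
```

As in the transcript above, a running Bazel server keeps the environment it was started with, so `bazel shutdown` is needed before a changed locale takes effect.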

fmeum avatar Jun 11 '25 12:06 fmeum

> @tjgq What do you think of showing a warning when the configured locale is ASCII only?

Sounds good - would you like to send a ~~CL~~ PR?
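
For readers wondering what such a check might look like, here is a rough sketch. It is not the change that landed in Bazel: the function name `ascii_locale_warning` and the message wording are made up for illustration, and the charmap string is assumed to come from something like `locale charmap`.

```python
import codecs
from typing import Optional


def ascii_locale_warning(charmap: str) -> Optional[str]:
    """Return a warning string if charmap names an ASCII-only charset."""
    try:
        codec = codecs.lookup(charmap).name
    except LookupError:
        # Unknown charmap name: this sketch stays silent rather than guess.
        return None
    if codec == "ascii":
        return (
            f"WARNING: locale charmap {charmap!r} is ASCII-only; "
            "non-ASCII file names may fail to extract. "
            "Consider a UTF-8 locale such as C.UTF-8."
        )
    return None


print(ascii_locale_warning("ANSI_X3.4-1968"))
print(ascii_locale_warning("UTF-8"))
```

The lookup step matters because ASCII goes by several names (ANSI_X3.4-1968, US-ASCII, and others), so comparing against a single literal string would miss some setups.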

tjgq avatar Jun 11 '25 12:06 tjgq

A fix for this issue has been included in Bazel 8.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=8.4.0rc1. Thanks!

iancha1992 avatar Aug 21 '25 19:08 iancha1992