x/build/cmd/coordinator: write_go_bootstrap_tar failing with 404 on some arm builders

Open millerresearch opened this issue 1 year ago • 7 comments

Go version

gotip

Output of go env in your module/workspace:

n/a

What did you do?

Observed https://farmer.golang.org

What did you see happen?

[plan9-arm](https://github.com/golang/go/wiki/DashboardBuilders) rev [10ed134a](https://go-review.googlesource.com/#/q/10ed134afe1319403a9a6a8b6bb798f29e5a4d5e); [running](https://farmer.golang.org/temporarylogs?name=plan9-arm&rev=10ed134afe1319403a9a6a8b6bb798f29e5a4d5e&st=0xc016f27c00); http://pi4g reverse peer pi4g/88.97.27.83:60662 for host type host-plan9-arm-0intro, 12h30m43s ago
...
  2024-08-22T21:03:27Z finish_get_source after 0s; go@10ed134afe1319403a9a6a8b6bb798f29e5a4d5e
  2024-08-22T21:03:27Z write_go_src_tar 
  2024-08-22T21:03:27Z finish_write_go_bootstrap_tar after 676.1ms; err=500 Internal Server Error; body: writetgz: fetching provided URL "https://go.dev/dl/go1.22.6.plan9-arm-7.tar.gz": 404 Not Found

 +45017.1s (now)
[plan9-arm](https://github.com/golang/go/wiki/DashboardBuilders) rev [d2879efd](https://go-review.googlesource.com/#/q/d2879efd0227df32d6aeee1be58c325b477f22d4); [running](https://farmer.golang.org/temporarylogs?name=plan9-arm&rev=d2879efd0227df32d6aeee1be58c325b477f22d4&st=0xc0396f9500); http://pi4n reverse peer pi4n/88.97.27.83:51435 for host type host-plan9-arm-0intro, 12h29m58s ago
...
  2024-08-22T21:04:15Z finish_get_source after 0s; go@d2879efd0227df32d6aeee1be58c325b477f22d4
  2024-08-22T21:04:15Z write_go_src_tar 
  2024-08-22T21:04:15Z finish_write_go_bootstrap_tar after 615.5ms; err=500 Internal Server Error; body: writetgz: fetching provided URL "https://go.dev/dl/go1.22.6.plan9-arm-7.tar.gz": 404 Not Found

 +44968.7s (now)
[linux-mips-rtrk](https://github.com/golang/go/wiki/DashboardBuilders) rev [ea08952a](https://go-review.googlesource.com/#/q/ea08952aa2db17ce4c14d9f9cb0fab03380073a0); [running](https://farmer.golang.org/temporarylogs?name=linux-mips-rtrk&rev=ea08952aa2db17ce4c14d9f9cb0fab03380073a0&st=0xc02cc1d6c0); http://host-linux-mips64-rtrk reverse peer host-linux-mips64-rtrk/82.117.214.122:43586 for host type host-linux-mips64-rtrk, 8h39m58s ago
...
  2024-08-23T09:33:38Z run_test:go_test:cmd/link/internal/benchmark host-linux-mips64-rtrk
  2024-08-23T09:33:42Z finish_run_test:go_test:cmd/link/internal/benchmark after 3.53s; host-linux-mips64-rtrk
  2024-08-23T09:33:42Z run_test:go_test:cmd/link/internal/ld host-linux-mips64-rtrk
   +2.0s (now)
[linux-mipsle-rtrk](https://github.com/golang/go/wiki/DashboardBuilders) rev [ea08952a](https://go-review.googlesource.com/#/q/ea08952aa2db17ce4c14d9f9cb0fab03380073a0) (sub-repo net rev [4542a426](https://go-review.googlesource.com/#/q/4542a42604cd159f1adb93c58368079ae37b3bf6)); [running](https://farmer.golang.org/temporarylogs?name=linux-mipsle-rtrk&rev=ea08952aa2db17ce4c14d9f9cb0fab03380073a0&st=0xc040599180&subName=net&subRev=4542a42604cd159f1adb93c58368079ae37b3bf6); http://host-linux-mips64le-rtrk reverse peer host-linux-mips64le-rtrk/82.117.214.122:40052 for host type host-linux-mips64le-rtrk, 8h39m53s ago
...
  2024-08-23T09:28:44Z listing_subrepo_modules net
  2024-08-23T09:28:45Z finish_listing_subrepo_modules after 384.7ms; net
  2024-08-23T09:28:45Z running_subrepo_tests net
 +299.4s (now)
[netbsd-arm-bsiegert](https://github.com/golang/go/wiki/DashboardBuilders) rev [b2f3a427](https://go-review.googlesource.com/#/q/b2f3a427dd554874eab570d03297468d22f903b6); [running](https://farmer.golang.org/temporarylogs?name=netbsd-arm-bsiegert&rev=b2f3a427dd554874eab570d03297468d22f903b6&st=0xc042caefc0); http://ebi.bentsukun.ch reverse peer ebi.bentsukun.ch/81.221.220.50:54867 for host type host-netbsd-arm-bsiegert, 8m58s ago
...
  2024-08-23T09:32:50Z finish_get_source after 0s; go@b2f3a427dd554874eab570d03297468d22f903b6
  2024-08-23T09:32:50Z write_go_src_tar 
  2024-08-23T09:32:50Z finish_write_go_bootstrap_tar after 627.7ms; err=500 Internal Server Error; body: writetgz: fetching provided URL "https://go.dev/dl/go1.22.6.netbsd-arm-7.tar.gz": 404 Not Found

  +53.8s (now)
[openbsd-arm-jsing](https://github.com/golang/go/wiki/DashboardBuilders) rev [b2f3a427](https://go-review.googlesource.com/#/q/b2f3a427dd554874eab570d03297468d22f903b6) (sub-repo net rev [4542a426](https://go-review.googlesource.com/#/q/4542a42604cd159f1adb93c58368079ae37b3bf6)); [running](https://farmer.golang.org/temporarylogs?name=openbsd-arm-jsing&rev=b2f3a427dd554874eab570d03297468d22f903b6&st=0xc056ccb500&subName=net&subRev=4542a42604cd159f1adb93c58368079ae37b3bf6); http://gobuilder-arm.sing.id.au reverse peer gobuilder-arm.sing.id.au/206.83.113.114:32633 for host type host-openbsd-arm-joelsing, 8m13s ago
...
  2024-08-23T09:32:55Z finish_get_source after 0s; go@b2f3a427dd554874eab570d03297468d22f903b6
  2024-08-23T09:32:55Z write_go_src_tar 
  2024-08-23T09:32:56Z finish_write_go_bootstrap_tar after 1.24s; err=500 Internal Server Error; body: writetgz: fetching provided URL "https://go.dev/dl/go1.22.6.openbsd-arm-7.tar.gz": 404 Not Found

  +48.1s (now)

What did you expect to see?

It appears the build script is trying to fetch bootstrap archives of the form go1.22.6.GOOS-arm-7.tar.gz when only go1.22.6.GOOS-arm.tar.gz exists (i.e. without the -7).

millerresearch avatar Aug 23 '24 12:08 millerresearch

cc @golang/release @dmitshur

cherrymui avatar Aug 23 '24 18:08 cherrymui

Change https://go.dev/cl/608076 mentions this issue: dashboard: leave out -7 suffix from go.dev/dl/ bootstrap URLs

gopherbot avatar Aug 23 '24 19:08 gopherbot

Thanks for reporting. This is a mistake in CL 520901, which didn't take into account the "-5" or "-7" suffixes in host config's HostArch. For GOOS != linux, the GOARCH = arm go.dev/dl/ archives are built with the cross-compilation default of GOARM=7, which is what the builders mentioned above are looking to download. Sent CL 608076.
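
As a rough illustration of the kind of fix described above (a minimal sketch, not the actual change in CL 608076), the URL can be built from the host arch with any trailing "-5" or "-7" removed. The function name `bootstrapURL` and the assumption that the host arch string looks like "plan9-arm-7" are hypothetical, inferred from the failing URLs in the logs.

```go
package main

import (
	"fmt"
	"strings"
)

// bootstrapURL is a hypothetical helper, not the coordinator's real code.
// hostArch is assumed to have the form "GOOS-GOARCH" with an optional
// GOARM suffix, e.g. "plan9-arm-7" or "linux-amd64".
func bootstrapURL(goVersion, hostArch string) string {
	// go.dev/dl/ archive names carry only GOOS-GOARCH, so drop a trailing
	// "-5" or "-7" before building the file name.
	for _, suffix := range []string{"-5", "-7"} {
		hostArch = strings.TrimSuffix(hostArch, suffix)
	}
	return "https://go.dev/dl/" + goVersion + "." + hostArch + ".tar.gz"
}

func main() {
	fmt.Println(bootstrapURL("go1.22.6", "plan9-arm-7"))
	fmt.Println(bootstrapURL("go1.22.6", "linux-amd64"))
}
```

With this shape, the plan9-arm, netbsd-arm and openbsd-arm builders above would all request the go1.22.6.GOOS-arm.tar.gz form that exists.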

Please note that the legacy build infrastructure isn't intended to be fully supported beyond the May 17, 2024 date (golang-dev thread), so we can only fix minor issues similar to this in order to help finish ongoing builder migrations to LUCI and give them more time. Thanks for your work on migrating the remaining builders to LUCI.

dmitshur avatar Aug 23 '24 19:08 dmitshur

> Please note that the legacy build infrastructure isn't intended to be fully supported beyond the May 17, 2024 date (golang-dev thread), so we can only fix minor issues similar to this in order to help finish ongoing builder migrations to LUCI and give them more time. Thanks for your work on migrating the remaining builders to LUCI.

The plan9-arm LUCI builder is pretty stable now, with no repeatable failures and no more intermittent flakes than the legacy version. Do I need to do something formal to switch off the legacy plan9-arm builder or just stop running it?

millerresearch avatar Aug 24 '24 09:08 millerresearch

Looks like the fix wasn't sufficient. All plan9-arm builds are now failing like this:

Build log:
plan9-arm at 1fd8557249a9e8c04fbe7490483443ccc35dea50

:: Running /boot/workdir/go/src/make.rc with args ["/boot/workdir/go/src/make.rc" "-force"] and env ["home=/usr/glenda" "path=/boot/workdir/go1.4/go/bin\x00.\x00/bin" "type=host-plan9-arm-0intro" "GOARM=7" "GO_BUILD_KEY_DELETE_AFTER_READ=false" "GOTOOLCHAIN=local" "status=" "GO_TEST_TIMEOUT_SCALE=3" "fs=aoe" "GOCACHE=/boot/cache" "GOROOT_BOOTSTRAP=/boot/workdir/go1.4" "sysname=pi4n" "workdir=/boot/workdir" "objtype=arm" "*=aoe" "WORKDIR=/boot/workdir" "GO_BUILDER_NAME=plan9-arm" "GO_TEST_TIMEOUT_SCALE=3" "GOBIN=" "GOROOT_BOOTSTRAP="] in dir /boot/workdir/go/src

Building Go cmd/dist using . (go1.20 plan9/arm)
Building Go toolchain1 using /go1.4.
go tool dist: FAILED: /go1.4/bin/go install -tags=math_big_pure_go compiler_bootstrap purego bootstrap/cmd/...: fork/exec /go1.4/bin/go: '/go1.4' file does not exist


Error: build failed: make script failed: exit status: 'make.rc 199: dist 697: 2'

I don't know why it's trying to load the bootstrap from /go1.4 instead of /boot/workdir/go1.4

millerresearch avatar Aug 24 '24 10:08 millerresearch

> The plan9-arm LUCI builder is pretty stable now, with no repeatable failures and no more intermittent flakes than the legacy version. Do I need to do something formal to switch off the legacy plan9-arm builder or just stop running it?

Indeed, that is great! Both the plan9/arm and plan9/386 LUCI builders look good enough to have their known issue removed and to be considered added. I sent CL 608155 to do that, and CL 607656 to mark them as migrated. (CC @0intro.)

When the coordinator is redeployed with the latter CL (next week), it'll stop sending work to the legacy plan9/arm builder. But given the equivalent LUCI builder is already providing good signal, I think it's fine for you to stop running it anytime. Thanks very much.


> I don't know why it's trying to load the bootstrap from /go1.4 instead of /boot/workdir/go1.4

CL 606835 works around the fact that go.dev/dl/ tarballs, which is where the coordinator gets its go1.22.6 bootstrap toolchain, have a top-level "go" directory: it adds that directory's bin directory to $PATH (or the equivalent path on Plan 9) and clears GOROOT_BOOTSTRAP. Both are there in the log, "path=/boot/workdir/go1.4/go/bin\x00.\x00/bin" and "GOROOT_BOOTSTRAP=", but this seems not to work on Plan 9 as it did for other OSes. Maybe there's something different about that logic in make.rc vs make.bash. It seems to be finding some go1.20 bootstrap, but not printing its path, and then falls back to a non-existing $home/go1.4 instead.

From my side, it might be easiest to adjust the plan9-arm buildlet to set $GOROOT_BOOTSTRAP to $WORKDIR/go1.4/go. From your side, you could try to place a go1.22.6 plan9/arm bootstrap in /go1.4 or something along those lines. Given the builder is about to go away as mentioned above, I don't think it's worth doing either. However, if there is a problem in the logic of finding a bootstrap in make.rc that you can spot, a fix to make it behave like the .bash version would be useful.

dmitshur avatar Aug 24 '24 18:08 dmitshur

> I sent CL 608155 to [remove known issue], and CL 607656 to mark them as migrated [which stops coordinator from sending them work].

I haven't heard from you on those CLs, so I'll put them on hold for now. Whenever you're ready to take those next steps, please let me know and I'll rebase & submit them.

dmitshur avatar Sep 12 '24 20:09 dmitshur

Closing this again since the original problem with write_go_bootstrap_tar failing with 404 is resolved and the only currently connected legacy plan9 builder (plan9-arm) is working okay with a go1.22.6 bootstrap, so I don't think there's more left to do here.

dmitshur avatar Sep 12 '24 20:09 dmitshur

> I don't know why it's trying to load the bootstrap from /go1.4 instead of /boot/workdir/go1.4

Now I know why. The make.rc script is trying to check whether the environment variable GOROOT_BOOTSTRAP is undefined, using this predicate:

    if(! ~ $#GOROOT_BOOTSTRAP 1){

In the Plan 9 rc shell, the expression $#VAR means the number of elements in the value of $VAR (which may be a list). In the build command sent to the builder, the GOROOT_BOOTSTRAP variable was not left undefined, but set to the empty string. So in this case, there's one element (the empty string), so the predicate is false.

A better predicate, which will be true if the variable is undefined, an empty list, or an empty string, would be:

    if(~ $"GOROOT_BOOTSTRAP ''){
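
To make the difference between the two predicates concrete, here is a small model in Go (a sketch for illustration only, not a translation of make.rc). It assumes an rc variable can be represented as a list of strings, with a nil slice standing in for an undefined variable; the names `rcVar`, `oldPredicate` and `newPredicate` are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// rcVar models an rc variable as a list of strings. A nil slice stands
// for an undefined variable, which rc treats like an empty list.
type rcVar []string

// oldPredicate models `! ~ $#VAR 1`: $#VAR is the element count, so this
// is true unless the list has exactly one element.
func oldPredicate(v rcVar) bool { return len(v) != 1 }

// newPredicate models `~ $"VAR ''`: $"VAR joins the elements with spaces
// into one string, which is then compared with the empty string.
func newPredicate(v rcVar) bool { return strings.Join(v, " ") == "" }

func main() {
	cases := []struct {
		name string
		v    rcVar
	}{
		{"undefined", nil},
		{"empty list", rcVar{}},
		{"empty string", rcVar{""}},
		{"real path", rcVar{"/boot/workdir/go1.4"}},
	}
	for _, c := range cases {
		fmt.Printf("%-12s old=%-5v new=%v\n", c.name, oldPredicate(c.v), newPredicate(c.v))
	}
}
```

The "empty string" row is the case the builder hit: the old predicate reports the variable as set, while the new one treats it the same as undefined.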

millerresearch avatar Nov 14 '24 16:11 millerresearch

Change https://go.dev/cl/627944 mentions this issue: make.rc: correct test for undefined GOROOT_BOOTSTRAP

gopherbot avatar Nov 14 '24 16:11 gopherbot