syscall: multi-arch build via qemu fails to exec go binary

Open bjohnso5 opened this issue 1 year ago • 13 comments

Go version

go version 1.23.0 linux/arm64

Output of go env in your module/workspace:

I'm unable to provide the output of `go env` as it fails with the same telemetry fork/exec error.

What did you do?

Our automated image build process fails to perform any step that invokes the go binary with the following error:

can't start telemetry child process: fork/exec /usr/local/go/bin/go: invalid argument

The Dockerfile is here, and is being built via a script that invokes docker buildx with multiple platforms, like:

docker buildx build --platform=linux/amd64,linux/arm64 --file 1.23/Dockerfile

It seems that there is something inherent in the qemu arm64 environment that renders go unable to fork itself to complete the telemetry setup. I'm fairly confident it's something specific to the 1.23 release as 1.22.6 builds successfully using the same setup today.

What did you see happen?

Failures to invoke any go command

What did you expect to see?

A successful install and configuration of go 1.23.0 in a multi-arch docker build.

bjohnso5 avatar Aug 20 '24 20:08 bjohnso5

I got the same error when building an arm64 image on an amd64 host machine with buildx:

docker buildx create --use --name=baker --driver docker-container  --platform=linux/amd64 --platform=linux/arm64 
docker buildx build --builder baker --platform=linux/amd64 --platform=linux/arm64  -t {tag} --push .

Then I tried running the image manually with docker run -it --rm --platform linux/arm64 {tag}

After running the extraction command, I got the same error: can't start telemetry child process: fork/exec /usr/local/go/bin/go: invalid argument. When I then ran chmod a+x ${GOROOT}/bin/*, go worked, even though no permissions actually changed. However, adding that command to the Dockerfile did not resolve the error.

Dockerfile example:

FROM almalinux:9.4-20240530

ENV GOROOT=/usr/local/go \
    GOLANG_VERSION=1.23.0 \
    GOPATH=/go

ENV PATH=$GOPATH/bin:$PATH:$GOROOT/bin

RUN set -eox pipefail \
    && dnf install -y curl \
    && mkdir -p "${GOROOT}" "$GOPATH/src" "$GOPATH/bin" && chmod -R 1777 "$GOPATH" \
    && curl -sSL "https://go.dev/dl/go${GOLANG_VERSION}.linux-$(cat < /etc/arch).tar.gz" | tar -zxvC ${GOROOT} --strip-components=1 \
#    && chmod a+x ${GOROOT}/bin/* \
    && go version

WORKDIR $GOPATH

fearfate avatar Aug 21 '24 03:08 fearfate

With the circleci dockerfile, I get a segfault in gcc cc1 rather than something in go directly:

 > [linux/arm64 5/5] RUN	go install "golang.org/x/vuln/cmd/govulncheck@v1.1.3" && go clean -cache -modcache && rm -rf "/home/circleci/go/pkg":
0.120 + go install golang.org/x/vuln/cmd/govulncheck@v1.1.3
0.528 go: downloading golang.org/x/vuln v1.1.3                                                                              
1.116 go: downloading golang.org/x/telemetry v0.0.0-20240522233618-39ace7a40ae7                                             
1.120 go: downloading golang.org/x/mod v0.19.0                                                                              
1.120 go: downloading golang.org/x/tools v0.23.0                                                                            
1.171 go: downloading golang.org/x/sync v0.7.0                                                                              
50.74 # net                                                                                                                 
50.74 gcc: internal compiler error: Segmentation fault signal terminated program cc1                                        
50.74 Please submit a full bug report,
50.74 with preprocessed source if appropriate.
50.74 See <file:///usr/share/doc/gcc-11/README.Bugs> for instructions.
------
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load

 2 warnings found (use docker --debug to expand):
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 27)
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 28)
Dockerfile:48
--------------------
  46 |     USER circleci
  47 |     
  48 | >>> RUN	go install "golang.org/x/vuln/cmd/govulncheck@v${GOVULNCHECK_VERSION}" && go clean -cache -modcache && rm -rf "${GOPATH}/pkg"
  49 |     
--------------------
ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/bash -exo pipefail -c go install \"golang.org/x/vuln/cmd/govulncheck@v${GOVULNCHECK_VERSION}\" && go clean -cache -modcache && rm -rf \"${GOPATH}/pkg\"" did not complete successfully: exit code: 1

seankhliao avatar Aug 21 '24 04:08 seankhliao

Might this have something to do with apparmor or other host "controls"? I was able to run the multiarch build on an Ubuntu 22.04 host using Docker 24.0.7 (from Ubuntu's packages) without errors, and inside the resulting arm64 container was able to run without errors:

  • go version
  • go telemetry on
  • go install golang.org/x/telemetry/cmd/gotelemetry@latest
  • gotelemetry on
  • gotelemetry upload (didn't have anything to upload, unsurprisingly)

mgabeler-lee-6rs avatar Aug 21 '24 13:08 mgabeler-lee-6rs

Could you run the failing command under strace -F so we can see exactly which system call is failing?

prattmic avatar Aug 21 '24 13:08 prattmic

cc @golang/telemetry

prattmic avatar Aug 21 '24 13:08 prattmic

CC @matloob

Independent of the root cause, a failure to start the telemetry child process shouldn't prevent the go command from being used.
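As a minimal sketch of that principle (not the actual x/telemetry change), an auxiliary child process can be started on a best-effort basis so that a start failure is reported but never aborts the parent. The function name startBestEffort and the helper path below are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// startBestEffort tries to launch an auxiliary child process and reports
// whether it started. A failure is printed as a warning and swallowed so
// the caller can carry on with its real work.
func startBestEffort(path string, args ...string) bool {
	cmd := exec.Command(path, args...)
	if err := cmd.Start(); err != nil {
		fmt.Fprintf(os.Stderr, "warning: could not start helper %q: %v\n", path, err)
		return false
	}
	// Reap the child in the background so it does not linger as a zombie.
	go func() { _ = cmd.Wait() }()
	return true
}

func main() {
	// Hypothetical helper path, chosen so that the start fails.
	started := startBestEffort("/nonexistent/telemetry-helper")
	fmt.Println("helper started:", started)
	fmt.Println("main work continues regardless")
}
```

The design choice is simply to treat the child as optional: the error is surfaced for diagnosis, but the return value is advisory rather than fatal.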

findleyr avatar Aug 21 '24 14:08 findleyr

> Could you run the failing command under strace -F so we can see exactly which system call is failing?

It appears the ptrace function(s) aren't implemented in the emulation environment: [screenshot]

bjohnso5 avatar Aug 21 '24 14:08 bjohnso5

Not sure if this is helpful, but I'm attaching two strace -f output files from the linux/arm64 golang:1.23.0 and golang:1.22.6 official images running go env. Note that these are in the successful case, but I'm hoping it might help with comparison if required.

go1.22.6_go_env_strace.txt go1.23_go_env_strace.txt

bjohnso5 avatar Aug 21 '24 15:08 bjohnso5

Moved to the Go1.24 milestone since this needs to be fixed on the main branch first (for Go 1.24) before being considered for backporting. Please use the usual process (https://go.dev/wiki/MinorReleases) to create a separate backport tracking issue in the Go1.23.1 milestone.

@findleyr It's important that issues in the minor milestones are the backport kind with a CherryPickCandidate label, otherwise we might miss them in our release meeting review. Thanks.

dmitshur avatar Aug 21 '24 16:08 dmitshur

Thanks again @dmitshur.

@gopherbot please backport this issue to 1.23: it is a regression that breaks the go command in certain environments.

findleyr avatar Aug 21 '24 17:08 findleyr

Backport issue(s) opened: #68995 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

gopherbot avatar Aug 21 '24 17:08 gopherbot

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

gabyhelp avatar Aug 21 '24 17:08 gabyhelp

Change https://go.dev/cl/607595 mentions this issue: telemetry: do not crash parent if child could not be started

gopherbot avatar Aug 21 '24 19:08 gopherbot

Change https://go.dev/cl/609195 mentions this issue: gopls: update x/telemetry to pick up recent bug fixes

gopherbot avatar Aug 28 '24 17:08 gopherbot

Change https://go.dev/cl/609196 mentions this issue: [gopls-release-branch.0.16] gopls: update x/telemetry to pick up recent bug fixes

gopherbot avatar Aug 28 '24 17:08 gopherbot

Change https://go.dev/cl/609256 mentions this issue: cmd: vendor golang.org/x/telemetry@e553cd4b

gopherbot avatar Aug 28 '24 18:08 gopherbot

Change https://go.dev/cl/609136 mentions this issue: [internal-branch.go1.23-vendor] telemetry: do not crash parent if child could not be started

gopherbot avatar Aug 28 '24 18:08 gopherbot

Change https://go.dev/cl/609237 mentions this issue: cmd: vendor golang.org/x/telemetry@a797f33

gopherbot avatar Aug 28 '24 20:08 gopherbot

> Might this have something to do with apparmor or other host "controls"?

I can reproduce with both apparmor and seccomp explicitly disabled (this is on Debian Stable with Debian's qemu-user-static package installed):

$ docker run --rm --pull=always --platform linux/arm64/v8 --security-opt seccomp=unconfined --security-opt apparmor=unconfined golang:1.23 go version
1.23: Pulling from library/golang
Digest: sha256:613a108a4a4b1dfb6923305db791a19d088f77632317cfc3446825c54fb862cd
Status: Image is up to date for golang:1.23
WARNING: image with reference golang was found but does not match the specified platform: wanted linux/arm64/v8, actual: linux/amd64
can't start telemetry child process: fork/exec /usr/local/go/bin/go: invalid argument

> Could you run the failing command under strace -F so we can see exactly which system call is failing?

> It appears the ptrace function(s) aren't implemented in the emulation environment:

I've attached a full log with QEMU_STRACE=1 set (which is apparently the way to strace these QEMU calls correctly):

tianon avatar Aug 28 '24 21:08 tianon

For comparison, here's the same log but on the (working) 1.22 release: go-version-strace.log

tianon avatar Aug 28 '24 21:08 tianon

Thanks for the logs. The offending call is:

1 pidfd_open(1,0) = 9
1 pidfd_send_signal(9,0,NULL,0) = 0
1 clone(CLONE_VM|CLONE_VFORK|0x1011,child_stack=0x0000000000000000,parent_tidptr=0x00000040002872b8,tls=0x0000000000000000,child_tidptr=0x0000000000000000) = -1 errno=22 (Invalid argument)

Flag 0x1000 is CLONE_PIDFD. I'd assume that is the flag QEMU is complaining about.

What version of QEMU are you using? CLONE_PIDFD support appears to be added in https://github.com/qemu/qemu/commit/895ce8bb534e66ca418dea62ae67a92dccafb2e1 (QEMU 8.0).

We test that syscalls pidfd_open and pidfd_send_signal work before attempting to use CLONE_PIDFD. In Linux, support of those guarantees support for CLONE_PIDFD, but perhaps not in QEMU?
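For illustration only, a probe of that shape might look like the sketch below. It is not the Go runtime's actual check. It assumes Linux and the syscall numbers 434 (pidfd_open) and 424 (pidfd_send_signal), which hold on amd64 and arm64 among others; the function name pidfdSyscallsWork is hypothetical. The point raised in this comment is that a positive result here says nothing about whether clone accepts CLONE_PIDFD under an emulator.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Syscall numbers as used on amd64 and arm64 Linux (assumed here; a few
// architectures use different numbers).
const (
	sysPidfdSendSignal = 424
	sysPidfdOpen       = 434
)

// pidfdSyscallsWork reports whether pidfd_open and pidfd_send_signal both
// succeed for the current process. Signal 0 only checks that the target
// exists; it delivers nothing.
func pidfdSyscallsWork() bool {
	fd, _, errno := syscall.Syscall(sysPidfdOpen, uintptr(os.Getpid()), 0, 0)
	if errno != 0 {
		return false
	}
	defer syscall.Close(int(fd))

	_, _, errno = syscall.Syscall6(sysPidfdSendSignal, fd, 0, 0, 0, 0, 0)
	return errno == 0
}

func main() {
	fmt.Println("pidfd_open and pidfd_send_signal usable:", pidfdSyscallsWork())
	// Note: even when this prints true, clone with CLONE_PIDFD may still be
	// rejected by an emulation layer that implements only part of the API.
}
```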

prattmic avatar Aug 28 '24 21:08 prattmic

It looks like pidfd_open, etc were added in https://github.com/qemu/qemu/commit/cc054c6f139cf54ce8fbefd6fd536f50b4cba694 (QEMU 7.2), prior to CLONE_PIDFD...

prattmic avatar Aug 28 '24 21:08 prattmic

@tianon Could you try cherry-picking https://go.dev/cl/592077 and https://go.dev/cl/592078 to see if they fix the issue?

You can get a cherry pick command from Gerrit from the upper right "..." menu -> "Download patch".

prattmic avatar Aug 28 '24 21:08 prattmic

Change https://go.dev/cl/609355 mentions this issue: [release-branch.go1.23] cmd: vendor golang.org/x/[email protected]

gopherbot avatar Aug 28 '24 21:08 gopherbot

> What version of QEMU are you using?

For my working report, I've got Ubuntu 22.04's build 1:6.2+dfsg-2ubuntu6.22 (i.e. 6.x), so before any of the pidfd support from the sounds of it.

The CircleCI folks can confirm (I'm just an interested user/customer of theirs), but it looks like their build process is using multiarch/qemu-user-static:latest, which hasn't been updated in 2 years (!) and reports running version 7.2.0 (Debian 1:7.2+dfsg-1~bpo11+2). That version falls in the "inverted support ordering" window.

mgabeler-lee-6rs avatar Aug 28 '24 23:08 mgabeler-lee-6rs

Change https://go.dev/cl/609596 mentions this issue: [gopls-release-branch.0.16] update telemetry to match Go 1.23.1

gopherbot avatar Aug 29 '24 15:08 gopherbot

https://github.com/golang/go/commit/4f852b9734249c063928b34a02dd689e03a8ab2c definitely doesn't fix this issue.

tianon avatar Aug 29 '24 15:08 tianon

@tianon can you say more? That change should have avoided failing the Go binary when the telemetry child process fails to start. I agree it doesn't fix the underlying issue. Is that what you meant?

findleyr avatar Aug 29 '24 15:08 findleyr

Change https://go.dev/cl/609635 mentions this issue: gopls: update x/telemetry dependency

gopherbot avatar Aug 29 '24 15:08 gopherbot

Yes, I mean this issue shouldn't be closed by that. This issue affects more than just telemetry.

tianon avatar Aug 29 '24 15:08 tianon