
Running QEMU nested under Rosetta strips argv[0] too much

Open david-l-riley opened this issue 9 months ago • 1 comment

Description

While running the Xilinx PetaLinux build tools (which are currently amd64-only) under Docker, petalinux-build failed near the end of the target image build, when its built-in rules use qemu-aarch64 to run programs inside the newly built root filesystem. The errors were unusual and indicated bad arguments. I eventually determined that qemu-aarch64 was stripping out argv[0], which causes obvious problems, especially for multi-call binaries like busybox (in this case, udevadm reported its own name as the sub-command name).
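To illustrate why losing argv[0] breaks multi-call binaries, here is a simplified, hypothetical Go sketch of busybox-style dispatch. The `applets` table and `dispatch` function are illustrative only, not busybox's actual code. The applet is chosen from the base name of argv[0], so when an emulator drops argv[0], the first real argument is mistaken for the command name.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// applets is a hypothetical table of commands provided by one multi-call
// binary, in the style of busybox.
var applets = map[string]bool{"ls": true, "cat": true, "echo": true}

// dispatch selects the applet the way a busybox-style binary does: the name
// is the base name of argv[0], and if argv[0] names the multi-call binary
// itself, argv[1] is used as the applet name instead.
func dispatch(argv []string) (applet string, args []string, err error) {
	if len(argv) == 0 {
		return "", nil, fmt.Errorf("empty argv")
	}
	name := filepath.Base(argv[0])
	args = argv[1:]
	if name == "busybox" {
		if len(args) == 0 {
			return "", nil, fmt.Errorf("no applet given")
		}
		name, args = args[0], args[1:]
	}
	if !applets[name] {
		return "", nil, fmt.Errorf("%s: applet not found", name)
	}
	return name, args, nil
}

func main() {
	// The first argv is an invocation through a symlink /bin/ls -> busybox;
	// the second is the same invocation after argv[0] has been dropped.
	for _, argv := range [][]string{{"/bin/ls", "-l"}, {"-l"}} {
		applet, args, err := dispatch(argv)
		if err != nil {
			fmt.Printf("%q -> error: %v\n", argv, err)
			continue
		}
		fmt.Printf("%q -> applet %q, args %q\n", argv, applet, args)
	}
}
```

With argv intact, `/bin/ls -l` resolves to the `ls` applet; with argv[0] dropped, `-l` is looked up as an applet name and the call fails.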

Reproduce

  1. Create a test program cli.go like the following (in Go because it is compact, has built-in cross-compilation and produces static binaries):
package main

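import (
	"fmt"
	"os"
)

// Print each command-line argument on its own line, starting with
// os.Args[0], the name the program was invoked as.
func main() {
	for _, arg := range os.Args {
		fmt.Println(arg)
	}
}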
  2. Build the test program for 32-bit ARM (without module support, if you don't want to fiddle with go.mod): GO111MODULE=off GOOS=linux GOARCH=arm go build cli.go
  3. On an Apple Silicon Mac with Rosetta emulation enabled, run a container for linux/amd64 with the cli binary directory bind-mounted: docker run --rm -it -v .:/test --platform linux/amd64 alpine
  4. Verify that the built-in binfmt_misc ARM emulation works correctly:
/ # test/cli foo bar
test/cli
foo
bar
  5. Install qemu-arm and run the binary under it, observing that argv[0] is consumed:
/ # apk add qemu-arm
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/community/x86_64/APKINDEX.tar.gz
(1/1) Installing qemu-arm (9.1.2-r1)
Executing busybox-1.37.0-r12.trigger
OK: 11 MiB in 16 packages
/ # qemu-arm test/cli foo bar
foo
bar
  6. Repeat the procedure with a non-Rosetta emulation platform (e.g. linux/ppc64le) to confirm that nested emulation works correctly under QEMU:
$ docker run --rm -it -v .:/test --platform linux/ppc64le alpine
/ # test/cli foo bar
test/cli
foo
bar
/ # apk add qemu-arm
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/main/ppc64le/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/community/ppc64le/APKINDEX.tar.gz
(1/1) Installing qemu-arm (9.1.2-r1)
Executing busybox-1.37.0-r12.trigger
OK: 13 MiB in 16 packages
/ # qemu-arm test/cli foo bar
test/cli
foo
bar

Expected behavior

The cli program should print all of its arguments, including its own path from argv[0], under both the first and the second emulation layer, as in Step 6 above:

/ # test/cli foo bar
test/cli
foo
bar
/ # qemu-arm test/cli foo bar
test/cli
foo
bar

Under Rosetta, however, the first emulation layer works correctly, but the second layer drops the initial argv[0]:

/ # test/cli foo bar
test/cli
foo
bar
/ # qemu-arm test/cli foo bar
foo
bar
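The comparison between expected and observed output can be turned into a pass/fail check. This is a minimal Go sketch, assuming it is built for the container's platform (e.g. GOARCH=amd64), run inside the container with qemu-arm on PATH, and that the cli binary built earlier is at test/cli; `argv0Preserved` and `firstLine` are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// firstLine returns the first line of s, or "" if s is empty.
func firstLine(s string) string {
	sc := bufio.NewScanner(strings.NewReader(s))
	if sc.Scan() {
		return sc.Text()
	}
	return ""
}

// argv0Preserved reports whether the cli program's output starts with the
// path it was launched as, i.e. whether argv[0] survived emulation.
func argv0Preserved(output, program string) bool {
	return firstLine(output) == program
}

func main() {
	// Assumption: the cli test binary lives at test/cli relative to the
	// working directory, as in the reproduction steps.
	const program = "test/cli"
	out, err := exec.Command("qemu-arm", program, "foo", "bar").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "running qemu-arm:", err)
		os.Exit(2)
	}
	if argv0Preserved(string(out), program) {
		fmt.Println("OK: argv[0] preserved")
		return
	}
	fmt.Printf("FAIL: argv[0] missing, first line is %q\n", firstLine(string(out)))
	os.Exit(1)
}
```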

docker version

Client:
 Version:           28.0.1
 API version:       1.48
 Go version:        go1.23.6
 Git commit:        068a01e
 Built:             Wed Feb 26 10:38:16 2025
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.39.0 (184744)
 Engine:
  Version:          28.0.1
  API version:      1.48 (minimum version 1.24)
  Go version:       go1.23.6
  Git commit:       bbd0a17
  Built:            Wed Feb 26 10:40:57 2025
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.25
  GitCommit:        bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
 runc:
  Version:          1.2.4
  GitCommit:        v1.2.4-0-g6c52b3f
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    28.0.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Docker AI Agent - Ask Gordon (Docker Inc.)
    Version:  v0.9.4
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.21.1-desktop.2
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.33.1-desktop.1
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.38
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Beta) (Docker Inc.)
    Version:  v0.1.5
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.27
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.4.0
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.16.3
    Path:     /Users/[redacted]/.docker/cli-plugins/docker-scout

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 12
 Server Version: 28.0.1
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
 runc version: v1.2.4-0-g6c52b3f
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.10.14-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 31.29GiB
 Name: docker-desktop
 ID: 51921912-82b5-425c-bc67-17ac52944e1d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/[redacted]/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false

Diagnostics ID

E770F12A-7617-4284-96B7-4FB2A0A69D8D/20250313115712

Additional Info

As far as I can tell, Rosetta isn't clearing the PRESERVE_ARGV0 flag (AT_FLAGS_PRESERVE_ARGV0 in the AT_FLAGS entry of /proc/self/auxv) when it runs qemu-aarch64. QEMU therefore believes binfmt_misc launched it with argv[0] preserved and strips what it takes to be the extra interpreter argument here, so the program path disappears and the first real argument becomes argv[0] (the v7.1.0 tag is the QEMU version the image uses, but the code is the same in the most recent releases).
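The flag check described above can be modeled in a few lines. This is a rough Go model, not QEMU's implementation: it assumes QEMU was given no options of its own, so the guest program path is its first argument. `guestArgv` and the example paths are hypothetical; the bit value matches AT_FLAGS_PRESERVE_ARGV0 from the Linux UAPI header <linux/binfmts.h>.

```go
package main

import "fmt"

// atFlagsPreserveArgv0 is bit 0 of the AT_FLAGS auxv entry, matching
// AT_FLAGS_PRESERVE_ARGV0 in <linux/binfmts.h>.
const atFlagsPreserveArgv0 = 1 << 0

// guestArgv returns the program QEMU loads and the argv the guest sees.
// Simplified model: no QEMU options, so qemuArgv[1] is the program path.
// With the preserve-argv0 bit set, QEMU expects binfmt_misc to have passed
// "<path> <original argv[0]> <args...>", so it skips the path and uses the
// next argument as the guest's argv[0].
func guestArgv(qemuArgv []string, atFlags uint64) (execPath string, argv []string) {
	if len(qemuArgv) < 2 {
		return "", nil
	}
	optind := 1
	execPath = qemuArgv[optind]
	if atFlags&atFlagsPreserveArgv0 != 0 && optind+1 < len(qemuArgv) {
		optind++
	}
	return execPath, qemuArgv[optind:]
}

func main() {
	cases := []struct {
		desc    string
		argv    []string
		atFlags uint64
	}{
		{"direct run, flag clear", []string{"qemu-arm", "test/cli", "foo", "bar"}, 0},
		{"binfmt_misc launch with P flag", []string{"/usr/bin/qemu-arm", "/test/cli", "test/cli", "foo", "bar"}, atFlagsPreserveArgv0},
		{"direct run, flag inherited", []string{"qemu-arm", "test/cli", "foo", "bar"}, atFlagsPreserveArgv0},
	}
	for _, c := range cases {
		path, argv := guestArgv(c.argv, c.atFlags)
		fmt.Printf("%s: loads %s, guest argv %q\n", c.desc, path, argv)
	}
}
```

In the third case the binary still loads, but the guest sees only `foo bar`, which matches the output observed under Rosetta.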

This issue is closely related to #7058, though it is not really the same issue.

This is probably more of a Rosetta-for-Linux issue than anything else, but it directly affects Docker in the (admittedly rare) case where nested user-mode emulation occurs. The problem does not occur in containers running under QEMU emulation (e.g. linux/ppc64le), which indicates that QEMU handles the auxiliary vector correctly. I assume a Docker bug report is the best way to bring this to the attention of the Apple developers who can fix it.

david-l-riley · Mar 13 '25 12:03

FWIW: QEMU does zero out the AT_FLAGS entry in the guest's auxiliary vector here.

david-l-riley · Mar 13 '25 14:03