
High CPU usage when backgrounding CLI with -i

Open JennToo opened this issue 8 months ago • 3 comments

Description

When running a docker container with the -i option and backgrounding the process (e.g. using & at the end of the bash command), CPU usage for the docker CLI process jumps to 100%.

Reproduce

  1. Open bash, or any other shell that supports & backgrounding
  2. docker run -i --rm alpine:latest sleep 300000 &

Expected behavior

The process should continue to run in the background, but just not accept input unless foregrounded.

docker version

Client: Docker Engine - Community
 Version:           28.0.4
 API version:       1.48
 Go version:        go1.23.7
 Git commit:        b8034c0
 Built:             Tue Mar 25 15:07:16 2025
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          28.0.4
  API version:      1.48 (minimum version 1.24)
  Go version:       go1.23.7
  Git commit:       6430e49
  Built:            Tue Mar 25 15:07:16 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.27
  GitCommit:        05044ec0a9a75232cad458027ca83437aae3f4da
 runc:
  Version:          1.2.5
  GitCommit:        v1.2.5-0-g59923ef
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    28.0.4
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.22.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.34.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 3
  Running: 3
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 28.0.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.5-0-g59923ef
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.15.167.4-microsoft-standard-WSL2
 Operating System: Ubuntu 24.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 15.47GiB
 Name: jwilco7670l
 ID: 74516f92-1428-475d-b1e6-b8d039ba0a8b
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  docker.adtran.com:5000
  cn-docker.adtran.com:5000
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 10.120.128.0/17, Size: 24

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

Additional Info

I made some light attempts at diagnosing what is going on.

With an strace capture, I see tons of these events:

read(0, 0xc00027c000, 32768)            = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
read(0, 0xc00027c000, 32768)            = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
read(0, 0xc00027c000, 32768)            = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
read(0, 0xc00027c000, 32768)            = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
read(0, 0xc00027c000, 32768)            = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

If I attach to the process with GDB, it seems to live in this call stack pretty consistently:

(gdb) thread apply all bt

Thread 1 (Thread 0x7f899e7fc6c0 (LWP 27495) "docker"):
#0  runtime.futex () at /usr/local/go/src/runtime/sys_linux_amd64.s:558
#1  0x0000558e960fc470 in runtime.futexwakeup (addr=0x1, cnt=1) at /usr/local/go/src/runtime/os_linux.go:82
#2  0x0000558e960d5625 in runtime.notewakeup (n=0x1) at /usr/local/go/src/runtime/lock_futex.go:155
#3  0x0000558e9611963c in runtime.sigsend (s=<optimized out>, ~r0=<optimized out>) at /usr/local/go/src/runtime/sigqueue.go:116
#4  0x0000558e96118048 in runtime.sighandler (sig=21, info=<optimized out>, ctxt=<optimized out>, gp=0xc0004c6380) at /usr/local/go/src/runtime/signal_unix.go:722
#5  0x0000558e9611796a in runtime.sigtrampgo (sig=21, info=0xc00050dbf0, ctx=0xc00050dac0) at /usr/local/go/src/runtime/signal_unix.go:490
#6  0x0000558e96141a29 in runtime.sigtramp () at /usr/local/go/src/runtime/sys_linux_amd64.s:352
#7  <signal handler called>
#8  internal/runtime/syscall.Syscall6 () at /usr/local/go/src/internal/runtime/syscall/asm_linux_amd64.s:35
#9  0x0000558e9619a50d in syscall.RawSyscall6 (trap=0, a1=0, a2=94070891631086, a3=0, a4=824636325888, a5=0, a6=0, r1=<optimized out>, r2=<optimized out>, err=<optimized out>) at /usr/local/go/src/syscall/syscall_linux.go:64
#10 0x0000558e9619a566 in syscall.Syscall (trap=0, a1=0, a2=824636325888, a3=32768, r1=<optimized out>, r2=<optimized out>, err=<optimized out>) at /usr/local/go/src/syscall/syscall_linux.go:86
#11 0x0000558e96198558 in syscall.read (fd=<optimized out>, p=..., n=<optimized out>, err=...) at /usr/local/go/src/syscall/zsyscall_linux_amd64.go:736
#12 0x0000558e961baf6e in syscall.Read (fd=0, p=..., n=<optimized out>, err=...) at /usr/local/go/src/syscall/syscall_unix.go:183
#13 internal/poll.ignoringEINTRIO (fd=0, p=..., fn=<optimized out>) at /usr/local/go/src/internal/poll/fd_unix.go:745
#14 internal/poll.(*FD).Read (fd=0xc00014a000, p=..., ~r0=<optimized out>, ~r0=<optimized out>, ~r1=..., ~r1=...) at /usr/local/go/src/internal/poll/fd_unix.go:161
#15 0x0000558e961c67b2 in os.(*File).read (f=0xc000134098, b=..., n=<optimized out>, err=...) at /usr/local/go/src/os/file_posix.go:29
#16 os.(*File).Read (f=0xc000134098, b=..., n=<optimized out>, err=...) at /usr/local/go/src/os/file.go:124
#17 0x0000558e965d3a22 in github.com/docker/cli/cli/streams.(*In).Read (i=<optimized out>, p=..., ~r0=<optimized out>, ~r1=...) at /go/src/github.com/docker/cli/cli/streams/in.go:21
#18 0x0000558e96189531 in io.copyBuffer (dst=..., src=..., buf=..., written=<optimized out>, err=...) at /usr/local/go/src/io/io.go:429
#19 0x0000558e969b7572 in io.Copy (dst=..., src=..., err=..., written=<optimized out>) at /usr/local/go/src/io/io.go:388
#20 github.com/docker/cli/cli/command/container.(*hijackedIOStreamer).beginInputStream.func1 () at /go/src/github.com/docker/cli/cli/command/container/hijack.go:150
#21 0x0000558e9613ff01 in runtime.goexit () at /usr/local/go/src/runtime/asm_amd64.s:1700
#22 0x0000000000000000 in ?? ()

Additionally, when I resume the process in GDB, I can see it receiving a lot of these signals:

(gdb) c
Continuing.

Program received signal SIGTTIN, Stopped (tty input).
internal/runtime/syscall.Syscall6 () at /usr/local/go/src/internal/runtime/syscall/asm_linux_amd64.s:36
warning: 36     /usr/local/go/src/internal/runtime/syscall/asm_linux_amd64.s: No such file or directory
(gdb)
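
My rough read, purely a guess from the traces above (I haven't dug through the CLI source): the CLI installs a handler for SIGTTIN rather than letting the default "stop the job" behavior kick in, so the background read on stdin keeps getting restarted and spins. Below is a minimal standalone Go sketch, not the docker code, that should show the same behavior when run with & from an interactive shell:

// sigttin_spin.go - hypothetical reproducer, not the docker CLI code.
// Run it backgrounded from an interactive shell, e.g.:
//   go build -o sigttin_spin . && ./sigttin_spin &
package main

import (
    "os"
    "os/signal"
    "syscall"
)

func main() {
    // Catching SIGTTIN is what prevents the default behavior of
    // stopping the process on a background read from the tty.
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGTTIN)
    go func() {
        for range sigs {
            // Just drain the signals.
        }
    }()

    // Every read from the controlling terminal while in a background
    // process group triggers another SIGTTIN; the interrupted read is
    // restarted immediately, pegging a CPU core.
    buf := make([]byte, 32768)
    for {
        if _, err := os.Stdin.Read(buf); err != nil {
            return
        }
    }
}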

JennToo avatar Apr 11 '25 14:04 JennToo

Was this something that you used successfully on older versions of the CLI (i.e., is this a regression)? Also slightly curious about your use-case 😅 (e.g., whether the --detach option would be an option)

thaJeztah avatar Apr 11 '25 15:04 thaJeztah

It's such an odd corner case that I don't know if we've ever even tried it on previous versions.

And yeah, -d (or really just dropping -i) could definitely be an option in some cases.

With the specific way we're using it, we have a CLI wrapper around docker to run some developer tools, where sometimes we do need -i and sometimes we don't. We don't have a great way to detect whether our user will need the -i or not, so we just add it unconditionally. I might see if there's some way we can detect that we're backgrounded from within the wrapper before we invoke docker; then we could silently drop the -i.
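
Something like this rough sketch might work for the detection (Go just for illustration; our wrapper isn't necessarily Go, and the names here are made up). It compares the controlling terminal's foreground process group with our own:

// foreground_check.go - hypothetical sketch, not our actual wrapper code.
package main

import (
    "fmt"
    "os"

    "golang.org/x/sys/unix"
)

// inForeground reports whether this process is the foreground job on its
// controlling terminal. If not (or there is no controlling terminal), the
// wrapper could silently drop -i before invoking docker.
func inForeground() bool {
    tty, err := os.Open("/dev/tty")
    if err != nil {
        // No controlling terminal at all; -i would not be useful anyway.
        return false
    }
    defer tty.Close()

    // TIOCGPGRP returns the terminal's current foreground process group.
    fgPgrp, err := unix.IoctlGetInt(int(tty.Fd()), unix.TIOCGPGRP)
    if err != nil {
        return false
    }
    return fgPgrp == unix.Getpgrp()
}

func main() {
    fmt.Println("foreground:", inForeground())
}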

There are definitely workarounds for it; it's certainly not something I'd consider a showstopper.

JennToo avatar Apr 11 '25 15:04 JennToo

Thanks for the extra context!

Yes, with your reproduction steps it's probably not too complicated to figure out whether it was a regression or just something that was always there. I was curious if you already knew it was a regression, as, even if it's a somewhat obscure use, with <insert large number> users it's always possible someone else is doing something similar (and may not yet have upgraded).

thaJeztah avatar Apr 11 '25 16:04 thaJeztah