cli icon indicating copy to clipboard operation
cli copied to clipboard

investigate terminal resize issues, and (if possible) set size on creation

Open thaJeztah opened this issue 2 years ago • 12 comments

relates to:

  • https://github.com/docker/cli/issues/3231
  • https://github.com/docker/cli/issues/3441
  • https://github.com/moby/moby/issues/42707

The TestInitTtySizeErrors test is failing frequently (see https://github.com/docker/cli/issues/3441), and it looks like it's not just a flaky test, but a bug / regression in the CLI:

--- FAIL: TestInitTtySizeErrors (0.16s)
    tty_test.go:29: assertion failed: 
        --- expectedError
        +++ →
        @@ -1,2 +1 @@
        -failed to resize tty, using default size

We had issues in this area before;

  • https://github.com/moby/moby/issues/35407
  • https://github.com/docker/cli/issues/1492
  • https://github.com/docker/cli/pull/1529
  • https://github.com/moby/moby/issues/33794
  • various other issues

Previous "fixes" implemented a retry loop to try resizing the terminal if it didn't work the first time (there's some race between the container being created and it being receptacle for resizing), so the existing code continues to be a "best effort"; https://github.com/docker/cli/blob/b68db383d389ae7c0302aaa3dc33f3155b6fe9a3/cli/command/container/run.go#L189-L193

Reproduction steps

The issue can be reproduced with this script (which just starts a lot of containers in a loop); I haven't counted the success/failure ratio, but it's failing very frequently:

#!/bin/bash

A=0
while true; do
    for a in $(seq 1 15); do
        com.docker.cli run --rm -t alpine echo $A/$a &
    done

    wait
    date
    A=$((A+1))
    echo $A
done

Coincidentally, when using a Windows machine, the CLI sets the initial size when creating the container.

https://github.com/docker/cli/blob/b68db383d389ae7c0302aaa3dc33f3155b6fe9a3/cli/command/container/run.go#L122-L127

The code above looks wrong though;

  • it checks for the operating system on the CLI side, but does not take into account that the daemon could be either a Linux or a Windows daemon
  • which could be either an "omission" (initial size supported on both Linux and Windows daemons)
  • or (if Linux daemons do not support this) result in Linux container shells never getting the right size when started from a Windows CLI.
  • or "redundant" (initial size already connect, but after this, doing a resize to the same size again)

What's needed?

  • Investigate what's missing (if anything) to set the correct terminal size when running (docker run), attaching (docker attach) and exec-ing (docker exec) a container
  • If setting the initial size is not supported on a Linux daemon, why not?
  • ISTR containerd did not support setting the initial size in all of these cases (is this just something that's not "punched through" in the containerd API, or is more needed?)

If it's possible to set the initial size, we should prefer that over "resizing", although resizing would still be needed if the terminal from which the container was started is resized. Of course, we also need to take into account whether the daemon (and containerd) supports setting the initial size (which may depend on the API version).

thaJeztah avatar Apr 12 '22 13:04 thaJeztah

Just a quick comment, if I change the amount of sleep between two resizes to 200ms then the error goes away completely for me, this is obviously not ideal and makes me think that this is more an error on the daemon side, I'll continue looking.

rumpl avatar Apr 27 '22 14:04 rumpl

200ms seems quite long to wait between tries; so seems to indicate something has become "slower" 🤔 was the 200ms just a "let's try something bigger", or did you try incrementing in smaller steps? (perhaps we can do a quick try to see where the "it mostly works" threshold is).

For a short term "quick fix", we could increase the timeout (although I hope if doesn't have to be 200ms!!!); of course the ideal solution is to (as described above) look if we can set the correct size when creating (so that we don't have to do the resize loop); of course if we do, that likely would have to be done only on "newer" APIs only (would have to check); and of course if dockerd and containerd actually support it (and if not "why not?")

thaJeztah avatar Apr 28 '22 08:04 thaJeztah

We could either do 200ms, or 100ms but retry 10 times, it looks like 1 second is the lowest we should go, see this comment by @cpuguy83

I did a quick check and it seems that you can set the TTY when you create a container, here is the test:

containerd code to set TTY size on creation
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"syscall"
	"time"

	"github.com/containerd/console"
	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	if err := redisExample(); err != nil {
		log.Fatal(err)
	}
}

func redisExample() error {
	// create a new client connected to the default socket path for containerd
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		return err
	}
	defer client.Close()

	// create a new context with an "example" namespace
	ctx := namespaces.WithNamespace(context.Background(), "example")

	// pull the redis image from DockerHub
	image, err := client.Pull(ctx, "docker.io/library/busybox:latest", containerd.WithPullUnpack)
	if err != nil {
		return err
	}

	// create a container
	container, err := client.NewContainer(
		ctx,
		"redis-server",
		containerd.WithImage(image),
		containerd.WithNewSnapshot("redis-server-snapshot", image),
		// Setting a small size terminal to make sure this is taken into account, to test run "ls", it should wrap
		// its output to fit into a small window
		containerd.WithNewSpec(oci.Compose(oci.WithImageConfig(image), oci.WithTTY, oci.WithTTYSize(30, 30))),
	)

	if err != nil {
		return err
	}
	defer container.Delete(ctx, containerd.WithSnapshotCleanup)

	var con console.Console
	con = console.Current()
	defer con.Reset()
	if err := con.SetRaw(); err != nil {
		return fmt.Errorf("failed to SetRaw,err is: %w", err)
	}
	if con == nil {
		return errors.New("got nil con with flagT=true")
	}
	// create a task from the container
	task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStreams(con, con, nil), cio.WithTerminal))
	if err != nil {
		return err
	}
	defer task.Delete(ctx)

	// make sure we wait before calling start
	exitStatusC, err := task.Wait(ctx)
	if err != nil {
		fmt.Println(err)
	}

	// call start on the task to execute the redis server
	if err := task.Start(ctx); err != nil {
		return err
	}

	// sleep for a lil bit to see the logs
	time.Sleep(30 * time.Second)

	// kill the process and get the exit status
	if err := task.Kill(ctx, syscall.SIGTERM); err != nil {
		return err
	}

	// wait for the process to fully exit and print out the exit status

	status := <-exitStatusC
	code, _, err := status.Result()
	if err != nil {
		return err
	}
	fmt.Printf("redis-server exited with status: %d\n", code)

	return nil
}

Running this code doesn't fail and the terminal size is right:

➜  containerdtest sudo ./containerdtest
/ # ls
bin   home  run   usr
dev   proc  sys   var
etc   root  tmp

rumpl avatar Apr 28 '22 12:04 rumpl

hm.... rendering bug on GitHub?? The > triangle widget no longer renders on collapsed things (😱); see

Screenshot 2022-04-28 at 14 12 26

thaJeztah avatar Apr 28 '22 12:04 thaJeztah

That's odd, works on my machine :D

rumpl avatar Apr 28 '22 12:04 rumpl

you on Safari, or on Chrome? (looks indeed to still work on Chrome 🤔)

thaJeztah avatar Apr 28 '22 12:04 thaJeztah

I'm on Firefox and yeah, it works well

rumpl avatar Apr 28 '22 12:04 rumpl

Arf; didn't post 😂

Back to "on topic"; with your test, could you try if it picks up the size immediately? (I recall the issue when not resizing is that everything "seems" ok, only to discover the size is limited to the default 80-chars, and output getting truncated.

For example;

On my host;

tput cols && tput lines
204
37

And in a container:

docker run -it --rm ubuntu sh -c 'tput cols; tput lines'
80
24

(this is with the current release, so the initial size is still 80x24, but then gets resized if you would run interactively and continue working in that shell

thaJeztah avatar Apr 28 '22 13:04 thaJeztah

Tested with my code above, I changed the image to be ubuntu and added oci.WithProcessArgs("tput", "lines") and it outputs the right number of lines (30), outputing cols works too. So it looks like containerd really does pick this up immediately. On moby/moby it might be a bit trickier (?), I saw somewhere a comment that we attach a dummy terminal. Will have to dig moby/moby a bit more

rumpl avatar Apr 28 '22 14:04 rumpl

Nice. so perhaps "all" that would be needed are the changes I mentioned in https://github.com/docker/cli/pull/3572#issuecomment-1112069795

applying the changes in Moby isn't too difficult (but as described there, the "expectations" on the client side will have to be gated by API version (pre v1.42 -> do the resize loop, API v1.42 or up -> initial size should already be good)

thaJeztah avatar Apr 28 '22 15:04 thaJeztah

Updating this thread; this PR is making the time between retries a bit longer, and changes the logic for the initial resize to try 10 times (instead of 5);

  • https://github.com/docker/cli/pull/3573

Copying my comment (https://github.com/docker/cli/pull/3573#issuecomment-1113959463) from that PR so that it doesn't get lost once that's merged;

Let me comment here, as we were discussing this;

  • First of all, this loop is for the initial resize (docker run -> resize to desired dimensions), and not used for the "monitor for terminal to be resized" case
  • The retry loop is only hit after we tried a resize immediately (line 54 does a resize immediately after attaching, and we only hit the retry loop if that fails)
  • so: incrementing this sleep does not delay the initial resize (which is good)

Thinking about this a bit more;

  1. for a "daemon under load", it may be expected that we need slightly longer (or more retries)
  2. if the resize is taking longer for other (daemon not under load) situations, it may be that there's a regression in performance
  3. so even if we fix the initial size (to be defined when creating the container, and without a "resize" after), the "monitor for resize" would still need to trigger resizes
  4. so: if there's a regression in performance, that "monitor" case would still be affected by the regression (if any), so should still be looked into. (at least, we can keep a tracking ticket to do so)

thaJeztah avatar Apr 30 '22 09:04 thaJeztah

So fixes are being made in https://github.com/moby/moby/pull/43593 and https://github.com/docker/cli/pull/3619 was merged, which also sets the size on docker create (not only on docker run).

Let me copy my comment from a Slack thread with @vvoland

I was digging a bit in the code, and wondering if we could also make it work for docker exec; and it looks like this is where we call out to containerd; https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/libcontainerd/remote/client.go#L307-L310

	p, err = t.Exec(ctx, processID, spec, func(id string) (cio.IO, error) {
		rio, err = c.createIO(fifos, containerID, processID, stdinCloseSync, attachStdio)
		return rio, err
	})

In which spec is a spec *specs.Process, which has a ConsoleSize.

So if I'm right, it would be possible to have docker exec also set the initial size, but would mean we need to add extra options to the API to allow passing the size, but perhaps it's possible 🤔

thaJeztah avatar May 18 '22 15:05 thaJeztah

Is this fully fixed now in v23? https://github.com/moby/moby/pull/43622 and https://github.com/docker/cli/pull/3627 seem to address the recent comment.

markdascher avatar Jul 25 '23 13:07 markdascher

I think it may be yes; @vvoland was there anything left to look into?

thaJeztah avatar Jul 25 '23 13:07 thaJeztah

No, this was fully addressed as far as I know, so I think this can be closed.

@markdascher are you still experiencing any of the issues described here?

vvoland avatar Jul 25 '23 13:07 vvoland

I think my issues were gone after the initial fixes somewhere around v18, so I'm probably not the best judge. (But for what it's worth, I haven't seen any issues with recent versions.)

Mostly I was just poking around to see when all the remaining edge cases would've been fixed, and the answer seems to be v23. But then saw this issue was still open.

markdascher avatar Jul 25 '23 13:07 markdascher

Alright; looks like it's time to close this one then! Thanks for the reminder 🤗

thaJeztah avatar Jul 25 '23 15:07 thaJeztah