api-version negotiation sometimes fails over SSH connection

Open thaJeztah opened this issue 8 months ago • 2 comments

Description

  • related https://github.com/docker/cli/pull/6121#issuecomment-2944340239

API-version negotiation for a context seems to fail sometimes. In CI, this also shows up as an error/warning; in my reproduction steps that message didn't show for some reason, possibly because it was suppressed by the output format.
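
For reference, the outcome expected from negotiation in the logs below is that the client settles on the lower of its own maximum API version and the server's API version (1.50 against 1.43 gives 1.43). A minimal Go sketch of that rule follows. The helper names parseAPIVersion and negotiate are hypothetical and are not the Docker client's actual API, and the sketch ignores the server's minimum version:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseAPIVersion splits a "major.minor" API version such as "1.43".
// Hypothetical helper for illustration only.
func parseAPIVersion(v string) (major, minor int, err error) {
	majStr, minStr, ok := strings.Cut(v, ".")
	if !ok {
		return 0, 0, fmt.Errorf("invalid API version %q", v)
	}
	if major, err = strconv.Atoi(majStr); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(minStr); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// negotiate returns the lower of the client's maximum API version and the
// server's API version. Assumption: this mirrors the "downgraded from"
// behavior seen in the issue's output, not the real implementation.
func negotiate(clientMax, server string) (string, error) {
	cMaj, cMin, err := parseAPIVersion(clientMax)
	if err != nil {
		return "", err
	}
	sMaj, sMin, err := parseAPIVersion(server)
	if err != nil {
		return "", err
	}
	if sMaj < cMaj || (sMaj == cMaj && sMin < cMin) {
		return server, nil
	}
	return clientMax, nil
}

func main() {
	v, err := negotiate("1.50", "1.43")
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // prints "1.43"
}
```

Versions are compared numerically per component rather than as strings, so that "1.9" would sort below "1.43".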

Reproduce

docker context create --docker host=ssh://swarm-test-02 remote
remote
Successfully created context "remote"

The debug log comes from commandconn.New, so I'm not 100% sure yet why it's triggered twice; https://github.com/docker/cli/blob/9e506545fdead79c574345a0414c71a0bd6857d6/cli/connhelper/commandconn/commandconn.go#L34-L73

In this case ssh was started twice, and the client didn't negotiate an API version (see API version: 1.50);

docker context create --docker host=ssh://swarm-test-02 remote
remote
Successfully created context "remote"

docker --debug --context remote version
time="2025-06-10T14:29:31+02:00" level=debug msg="commandconn: starting ssh with [-o ConnectTimeout=30 -T -- swarm-test-02 docker system dial-stdio]"
time="2025-06-10T14:29:33+02:00" level=debug msg="commandconn: starting ssh with [-o ConnectTimeout=30 -T -- swarm-test-02 docker system dial-stdio]"
Client:
 Version:           28.2.2
 API version:       1.50
 Go version:        go1.24.3
 Git commit:        e6534b4
 Built:             Fri May 30 12:07:35 2025
 OS/Arch:           darwin/arm64
 Context:           remote

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:23 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

On a second attempt, ssh was only started once, and the client did negotiate (API version: 1.43 (downgraded from 1.50));

docker --debug --context remote version
time="2025-06-10T14:30:05+02:00" level=debug msg="commandconn: starting ssh with [-o ConnectTimeout=30 -T -- swarm-test-02 docker system dial-stdio]"
Client:
 Version:           28.2.2
 API version:       1.43 (downgraded from 1.50)
 Go version:        go1.24.3
 Git commit:        e6534b4
 Built:             Fri May 30 12:07:35 2025
 OS/Arch:           darwin/arm64
 Context:           remote

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:23 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
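
The two runs above differ in the client's "API version" line: without negotiation it stays above the server's version, with negotiation it matches it. To spot the failing case in a scripted reproduction, the plain-text output could be checked along these lines. This is a rough sketch that assumes the default text layout shown above; the names apiVersions, parseAPIVersions, and clientAhead are hypothetical:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// apiVersions holds the "API version" values found in `docker version` output.
type apiVersions struct {
	Client, Server string
}

// parseAPIVersions extracts the first "API version:" value found under the
// "Client:" and "Server:" sections of plain-text `docker version` output.
func parseAPIVersions(output string) apiVersions {
	var v apiVersions
	section := ""
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "Client:"):
			section = "client"
		case strings.HasPrefix(line, "Server:"):
			section = "server"
		case strings.HasPrefix(line, "API version:"):
			fields := strings.Fields(strings.TrimPrefix(line, "API version:"))
			if len(fields) == 0 {
				continue
			}
			if section == "client" && v.Client == "" {
				v.Client = fields[0]
			}
			if section == "server" && v.Server == "" {
				v.Server = fields[0]
			}
		}
	}
	return v
}

// clientAhead reports whether the client's displayed API version is higher
// than the server's, which is the symptom described in this issue.
// It returns false if either version is missing or cannot be parsed.
func (v apiVersions) clientAhead() bool {
	var cMaj, cMin, sMaj, sMin int
	if _, err := fmt.Sscanf(v.Client, "%d.%d", &cMaj, &cMin); err != nil {
		return false
	}
	if _, err := fmt.Sscanf(v.Server, "%d.%d", &sMaj, &sMin); err != nil {
		return false
	}
	return cMaj > sMaj || (cMaj == sMaj && cMin > sMin)
}

func main() {
	// Excerpt of the first (non-negotiated) output from this issue.
	out := "Client:\n Version:           28.2.2\n API version:       1.50\n\n" +
		"Server: Docker Engine - Community\n Engine:\n  Version:          24.0.5\n" +
		"  API version:      1.43 (minimum version 1.12)\n"
	v := parseAPIVersions(out)
	fmt.Println(v.Client, v.Server, v.clientAhead()) // prints "1.50 1.43 true"
}
```

For the second run, the client line reads "1.43 (downgraded from 1.50)", so both versions parse as 1.43 and clientAhead returns false.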

Expected behavior

No response

docker version

Client:
 Version:           28.2.2
 API version:       1.50
 Go version:        go1.24.3
 Git commit:        e6534b4
 Built:             Fri May 30 12:07:35 2025
 OS/Arch:           darwin/arm64
 Context:           remote

docker info

Likely not relevant

Additional Info

No response

thaJeztah, Jun 10 '25 12:06

Looks like it's related to the SSH host being used for the first time; something possibly caching things?

After creating a context for a different SSH host, the problem showed on the first attempt;

docker --debug --context remote2 version
time="2025-06-10T14:45:38+02:00" level=debug msg="commandconn: starting ssh with [-o ConnectTimeout=30 -T -- swarm-test-03 docker system dial-stdio]"
time="2025-06-10T14:45:40+02:00" level=debug msg="commandconn: starting ssh with [-o ConnectTimeout=30 -T -- swarm-test-03 docker system dial-stdio]"
Client:
 Version:           28.2.2
 API version:       1.50
 Go version:        go1.24.3
 Git commit:        e6534b4
 Built:             Fri May 30 12:07:35 2025
 OS/Arch:           darwin/arm64
 Context:           remote2

The second time, it connected faster and negotiation was successful;

docker --debug --context remote2 version
time="2025-06-10T14:45:55+02:00" level=debug msg="commandconn: starting ssh with [-o ConnectTimeout=30 -T -- swarm-test-03 docker system dial-stdio]"
Client:
 Version:           28.2.2
 API version:       1.42 (downgraded from 1.50)
 Go version:        go1.24.3
 Git commit:        e6534b4
 Built:             Fri May 30 12:07:35 2025
 OS/Arch:           darwin/arm64
 Context:           remote2

Creating a new context for the same host made no difference; the problem doesn't reproduce when the new context points at a host that was already connected to, so possibly SSH is caching things / writing things on the first connection?

thaJeztah, Jun 10 '25 12:06

Also worth noting that while the Client info in the docker version output shows "1.50", the daemon connection succeeded, and shows "1.42".

thaJeztah, Jun 10 '25 12:06