compose
compose copied to clipboard
Regression: 1.28 fails with "kex_exchange_identification: Connection closed by remote host"
Description of the issue
A docker-compose up -d fails with a kex_exchange_identification error when DOCKER_HOST=ssh:// points to a mac. I do have ssh keys provisioned in ~/.ssh/config for password-less login. This worked with the previous 1.27 version.
Context information (for bug reports)
Output of docker-compose version
docker-compose version 1.28.5, build c4eb3a1f
Output of docker version
Client: Docker Engine - Community
Cloud integration: 1.0.9
Version: 20.10.5
API version: 1.41
Go version: go1.13.15
Git commit: 55c4c88
Built: Tue Mar 2 20:13:00 2021
OS/Arch: darwin/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.5
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 363e9a8
Built: Tue Mar 2 20:15:47 2021
OS/Arch: linux/amd64
Experimental: true
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Kubernetes:
Version: v1.19.3
StackAPI: Unknown
Output of docker-compose config
This is my stripped down compose file that still reproduces the error:
# Stripped down docker-compose.yml where up fails due to error:
# kex_exchange_identification: Connection closed by remote host
#
# % docker-compose --log-level DEBUG up -d
# time="2021-03-10T14:41:14-06:00" level=error msg="error while CloseRead (stream to stdout): close unix ->docker.sock: shutdown: socket is not connected"
# time="2021-03-10T14:41:14-06:00" level=error msg="error while CloseRead (stream to stdout): close unix ->docker.sock: shutdown: socket is not connected"
# WARNING: The Docker Engine you're using is running in swarm mode.
#
# Compose does not use swarm mode to deploy services to multiple nodes in a swarm. All containers will be scheduled on the current node.
#
# To deploy your application across the swarm, use `docker stack deploy`.
#
# Creating network "test_default" with driver "bridge"
# Creating volume "test_esdata" with default driver
# ...
# Creating volume "test_syslog-logs" with default driver
#
# kex_exchange_identification: Connection closed by remote host
#
# time="2021-03-10T14:41:24-06:00" level=error msg="error while CloseRead (stream to stdout): close unix ->docker.sock: shutdown: socket is not connected"
# time="2021-03-10T14:41:24-06:00" level=error msg="error while CloseRead (stream to stdout): close unix ->docker.sock: shutdown: socket is not connected"
# ...
#
networks:
default:
attachable: true
driver: bridge
services:
dnsmasq:
cap_add:
- NET_ADMIN
command: --log-facility=- --interface=*
deploy:
resources:
limits:
memory: 256m
expose:
- 53/tcp
image: andyshinn/dnsmasq:2.76
ports:
- protocol: tcp
published: 4253
target: 53
- protocol: udp
published: 4253
target: 53
restart: "no"
etcd:
command: /usr/local/bin/etcd --data-dir=/etcd-data --name node1 --initial-advertise-peer-urls
"http://etcd:2380" --listen-peer-urls http://0.0.0.0:2380 --advertise-client-urls
"http://etcd:2379" --listen-client-urls http://0.0.0.0:2379 --initial-cluster
node1="http://etcd:2380"
deploy:
resources:
limits:
memory: 256m
image: quay.io/coreos/etcd:v3.3.18
ports:
- protocol: tcp
target: 2379
- protocol: tcp
target: 2380
restart: "no"
volumes:
- etcd-data:/etcd-data:rw
loki:
command: -config.file=/etc/loki/local-config.yaml
deploy:
resources:
limits:
memory: 200m
image: grafana/loki:1.6.1
ports:
- published: 4231
target: 3100
restart: "no"
volumes:
- loki-data:/loki:rw
prometheus:
deploy:
resources:
limits:
memory: 1g
expose:
- 9090/tcp
image: prom/prometheus:v2.23.0
ports:
- protocol: tcp
published: 4209
target: 9090
restart: "no"
volumes:
- prometheus-etc:/etc/prometheus:rw
- prometheus-data:/prometheus:rw
promtail:
depends_on:
syslog:
condition: service_started
deploy:
resources:
limits:
memory: 100m
image: grafana/promtail:1.6.1
restart: "no"
volumes:
- syslog-logs:/var/log:rw
socat_etcd:
command: TCP-LISTEN:80,fork,reuseaddr TCP:etcd:2379
depends_on:
etcd:
condition: service_started
deploy:
resources:
limits:
memory: 25m
image: alpine/socat:1.0.5
ports:
- published: 4227
target: 80
syslog:
deploy:
resources:
limits:
memory: 256m
environment:
FOO_DOMAIN: foo
FOO_HOST: foo
FOO_PORT: '80'
expose:
- 514/tcp
- 514/udp
- '8888'
image: balabit/syslog-ng
ports:
- published: 4222
target: 22
- protocol: tcp
published: 4251
target: 514
- protocol: udp
published: 4251
target: 514
- published: 4290
target: 8090
- published: 4288
target: 8888
restart: "no"
version: '3.9'
volumes:
esdata: {}
esleaderdata: {}
etcd-data: {}
grafana-data: {}
grafana-etc: {}
loki-data: {}
prometheus-data: {}
prometheus-etc: {}
syslog-logs: {}
Steps to reproduce the issue
docker-compose --log-level DEBUG up -d- See a lot of harmless
"error while CloseReaderrors (https://github.com/docker/compose/issues/8190) - See the operation fails due to
kex_exchange_identification: Connection closed by remote host
Observed result
kex_exchange_identification: Connection closed by remote host
Expected result
No error, this worked with previous compose (and macOS)
Stacktrace / full error message
kex_exchange_identification: Connection closed by remote host
Additional information
OS version / distribution, docker-compose install method, etc.
This broke when I upgraded my client mac and server mac to Docker Desktop 3.2.1. It was working with previous release. I also just upgraded both macOS to latest patch 11.2.3, I guess that could have introduced the problem too.
You might have to increase the server's sshd MaxSessions -- I've had to do this ever since switching from DOCKER_HOST=tcp://... to DOCKER_HOST=ssh://.
echo 'MaxSessions 500' | sudo tee -a /etc/ssh/sshd_config && sudo launchctl stop com.openssh.sshd
Here is the successful run after downloading docker-compose 1.27.4. The output sure is different. Don't know why it says ssh://localhost:None, but I do have DOCKER_HOST set and exported correctly and it is using my remote docker server.
% echo $DOCKER_HOST
ssh://192.168.1.NN
% /Users/jamshid/Downloads/docker-compose-Darwin-x86_64-2 --version
docker-compose version 1.27.4, build 40524192
% ~/Downloads/docker-compose-Darwin-x86_64-2 --log-level DEBUG up -d
Using configuration files: ./docker-compose.yml
Trying paths: ['/Users/jamshid/.docker/config.json', '/Users/jamshid/.dockercfg']
Found file at path: /Users/jamshid/.docker/config.json
Found 'auths' section
Auth data for https://index.docker.io/v1/ is absent. Client might be using a credentials store instead.
...
Found 'credsStore' section
ssh://localhost:None "GET /version HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/networks/test_default HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/volumes/test_etcd-data HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/volumes/test_loki-data HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/volumes/test_prometheus-etc HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/volumes/test_prometheus-data HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/volumes/test_syslog-logs HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/info HTTP/1.1" 200 None
WARNING: The Docker Engine you're using is running in swarm mode.
Compose does not use swarm mode to deploy services to multiple nodes in a swarm. All containers will be scheduled on the current node.
To deploy your application across the swarm, use `docker stack deploy`.
ssh://localhost:None "GET /v1.41/networks/test_default HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/volumes/test_esdata HTTP/1.1" 200 None
...
ssh://localhost:None "GET /v1.41/volumes/test_syslog-logs HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/containers/json?limit=-1&all=0&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dtest%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 None
...
ssh://localhost:None "GET /v1.41/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dtest%22%2C+%22com.docker.compose.service%3Dpromtail%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/images/andyshinn/dnsmasq:2.76/json HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/images/quay.io/coreos/etcd:v3.3.18/json HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/images/grafana/loki:1.6.1/json HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/images/prom/prometheus:v2.23.0/json HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/images/alpine/socat:1.0.5/json HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/images/balabit/syslog-ng/json HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/images/grafana/promtail:1.6.1/json HTTP/1.1" 200 None
ssh://localhost:None "GET /v1.41/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dtest%22%2C+%22com.docker.compose.service%3Ddnsmasq%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 None
...
Pending: {<Service: socat_etcd>, <Service: loki>, <Service: dnsmasq>, <Service: prometheus>, <Service: promtail>, <Service: syslog>, <Service: etcd>}
Starting producer thread for <Service: loki>
Starting producer thread for <Service: dnsmasq>
Starting producer thread for <Service: prometheus>
Starting producer thread for <Service: syslog>
Starting producer thread for <Service: etcd>
ssh://localhost:None "GET /v1.41/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dtest%22%2C+%22com.docker.compose.service%3Dprometheus%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 None
...
Starting producer thread for ServiceName(project='test', service='prometheus', number=1)
ssh://localhost:None "GET /v1.41/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dtest%22%2C+%22com.docker.compose.service%3Dsyslog%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 None
Creating test_syslog_1 ...
Pending: {ServiceName(project='test', service='syslog', number=1)}
Starting producer thread for ServiceName(project='test', service='syslog', number=1)
Pending: {<Service: socat_etcd>, <Service: promtail>}
Pending: set()
Pending: set()
ssh://localhost:None "GET /v1.41/containers/json?limit=-1&all=1&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Dtest%22%2C+%22com.docker.compose.service%3Detcd%22%2C+%22com.docker.compose.oneoff%3DFalse%22%5D%7D HTTP/1.1" 200 None
...
Creating test_etcd_1 ...
...
ssh://localhost:None "POST /v1.41/containers/f9b8d98dceb0299f53e33ce340ff46007d73d7d13ff45de15b5c7c4df96eed2f/start HTTP/1.1" 204 0
Pending: set()
Finished processing: <Service: promtail>
Pending: set()
Pending: set()
Finished processing: <Service: socat_etcd>
Pending: set()
I seem to be running into this as well after updating to 1.28.5
$ docker-compose version
docker-compose version 1.28.5, build c4eb3a1f
docker-py version: 4.4.4
CPython version: 3.9.0
OpenSSL version: OpenSSL 1.1.1h 22 Sep 2020
$ docker version
Client: Docker Engine - Community
Cloud integration: 1.0.9
Version: 20.10.5
API version: 1.41
Go version: go1.13.15
Git commit: 55c4c88
Built: Tue Mar 2 20:13:00 2021
OS/Arch: darwin/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.5
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: 363e9a8
Built: Tue Mar 2 20:15:47 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$
Still seeing this regression with docker-compose 1.29.0.
For the record COMPOSE_PARAMIKO_SSH=1 environment variable appears to work around this problem (and significantly speeds up the operation...).
Thanks, is that its older ssh implementation? I get this error, probably because I use non-default key names in my ~/.ssh/ (https://github.com/docker/compose/issues/7514).
Traceback (most recent call last):
File "docker-compose", line 3, in <module>
File "compose/cli/main.py", line 81, in main
File "compose/cli/main.py", line 199, in perform_command
File "compose/cli/command.py", line 60, in project_from_options
File "compose/cli/command.py", line 152, in get_project
File "compose/cli/docker_client.py", line 41, in get_client
File "compose/cli/docker_client.py", line 170, in docker_client
File "docker/api/client.py", line 171, in __init__
File "docker/transport/sshconn.py", line 180, in __init__
File "docker/transport/sshconn.py", line 225, in _connect
File "paramiko/client.py", line 435, in connect
File "paramiko/client.py", line 765, in _auth
paramiko.ssh_exception.SSHException: No authentication methods available
[13042] Failed to execute script docker-compose
Yes, COMPOSE_PARAMIKO_SSH forces docker-compose back to the 1.27 and earlier SSH implementation. Honestly for me it's SO much faster (30 seconds vs 4+ minutes) I'm not sure I want to move forward... (but that should probably be a different ticket)
The problem does still reproduce with our use-case with version 1.29.2. We are still unsure how to reliably re-produce the error - some systems do have the problems, other don't. It appears to be related to versions of involved system libraries.
Edit: We do experience the problem when running docker-compose on a Linux machine to deploy to another Linux machine. So Mac on its own does not seem to be the problem here.
Seeing the same problem compose 1.29.2 in windows docker desktop 3.5.1 talking to Docker running on UBUNTU-20.04 Setting COMPOSE_PARAMIKO_SSH=1 seems to fix this.
I am having the same issue linux to linux -- docker-compose 1.29.2 (both machines) Local OS - Pop os 21.10 Remote OS - Ubuntu 20.04
Trying to start quite a few containers on a remote server with docker-compose and docker context (ssh) when running into the same problem - the peer drops the connection about 80% of the way through starting or building.
Not sure if I am on the right path but I figured I'd share after sitting here for a few hours....
On the remote server, I ran tail -f /var/log/auth.log to view the auth log. On my host I ran docker-compose --context remote up to start my docker services on the remote server.
I noticed hundreds of ssh login sessions. It consistently closes the ssh connection when it hits about 150 connections. I tested this 3 times. Are there supposed to be so many SSH sessions starting when running the docker-compose up command with context?
Either way - easy solution but not sure if it is the right way to fix this -- I updated /etc/ssh/sshd_config (on the remote server) to have MaxStartups 500 (some crazy high number). Restart the ssh service with service sshd restart and run the up command again from the local machine. This time it starts just fine!
Ah interesting... Thank you @NickisyourFan! I had run into an issue with MaxSessions pre-1.28 (the Paramiko SSH implementation), so I experimented with increasing that, but I didn't try MaxStartups. Since there doesn't seem to be much motivation to fix this bug, I'll experiment with those settings as a potential work around.
Why is this workaround needed? I had this issue with a remote docker engine on CentOS and a local WSL ubuntu docker-compose 1.29.2 client and would like to see this work without workarounds. Setting COMPOSE_PARAMIKO_SSH=1 does fix it though
I can confirm that increasing MaxSessions worked for me too. 👍
Too bad I didn't get here sooner, I had to find out the hard way 🥲
FWIW I'm not sure MaxSessions (or MaxStartups) on the remote server actually helps.
I was still getting this error using the docker-compose V2.15 client on ubuntu 20 to DOCKER_HOST=ssh://ubuntu22-server, latest docker 23.0 installed.
Run with docker --debug compose --timeout 120 ... to see the ssh connections being failed or denied. Btw note COMPOSE_HTTP_TIMEOUT no longer seems to be used, docs say it's deprecated.
Adding this to the user's ~/.ssh/config does seem to finally fix the stubborn, intermittent kex_exchange_identification failures.
The settings are documented in https://docs.docker.com/engine/security/protect-access/ but I restrict it to known docker hosts to avoid it causing ssh problems for other servers and git.
# Avoids `kex_exchange_identification: Connection closed by remote host` failures
# on docker-compose V2 operations when using
# DOCKER_HOST=ssh://remote-docker.example.com style connections.
Host remote-docker.example.com
ControlMaster auto
ControlPath ~/.ssh/control-%C
ControlPersist yes
I'm closing this issue as obsolete: Docker Compose V1 has reached end-of-life and we are not accepting any more changes. Please try and reproduce your issue with Compose V2 and create a new issue or PR with the relevant Compose V2 information.