[BUG] Overlay network not found on worker node
Description
Issue:
Swarm worker hosts fail to attach to overlay networks created on the manager node unless a container has been manually started and attached to the network using docker run --network swarm-overlay.
Expected Behavior: The worker should attach to the overlay network automatically, and the network should be visible in the output of docker network ls.
$> docker network ls
8e3c351af333 bridge bridge local
0cbc0420c111 docker_gwbridge bridge local
x8gb7mz6s222 swarm-overlay overlay swarm
c09ad17a7321 host host local
keth4xuub123 ingress overlay swarm
d8baa27f3654 none null local
Workaround:
The only solution I have found is to downgrade docker-compose-plugin to an earlier version (2.21.0-1):
sudo apt list -a docker-compose-plugin
sudo apt install docker-compose-plugin=2.21.0-1~debian.11~bullseye
I believe this is the same issue as https://github.com/docker/compose/issues/11387, but I couldn't find any open bugs with the same issue.
Thanks for any help with this!
Steps To Reproduce
I created a custom overlay network on the swarm manager node.
...
  service:
    image: service-image
    container_name: service
    networks:
      - swarm-overlay
    restart: unless-stopped
...
networks:
  swarm-overlay:
    attachable: true
    driver: overlay
This correctly created the network and attached the relevant container to it.
I then joined a worker host to the swarm and attempted to connect a container to the overlay network.
...
  worker-service:
    image: worker-image
    container_name: worker-service
    networks:
      swarm-overlay:
        aliases:
          - host1-worker-service
    restart: unless-stopped
...
networks:
  swarm-overlay:
    external: true
    driver: overlay
docker compose up -d worker-service
This errors with:
Error response from daemon: network swarm-overlay not found
Compose Version
docker-compose-plugin/bullseye 2.27.1-1~debian.11~bullseye
Docker Compose version v2.27.1
Docker Environment
Client: Docker Engine - Community
Version: 26.1.4
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.14.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.27.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 12
Running: 5
Paused: 0
Stopped: 7
Images: 31
Server Version: 26.1.4
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: 2brhg9vzj8m47oyo40ie5yj0u
Is Manager: false
Node Address: 1.2.3.4
Manager Addresses:
4.3.2.1:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d2d58213f83a351ca8f528a95fbd145f5654e957
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.10.0-28-cloud-amd64
Operating System: Debian GNU/Linux 11 (bullseye)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 13.42GiB
Name: cloud-machine
ID: 6c0ae974-1ba3-450a-ab03-d31b31c6097f
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
This isn't the same issue as #11387: here it is the Docker engine that reports the error: Error response from daemon: network swarm-overlay not found
Can you please confirm that you can use docker run --network swarm-overlay ... to run an equivalent container on the worker node with this swarm setup?
I'm running into this exact same issue using Docker Compose 2.27.0. I can confirm that docker run -it --name alpine1 --network test-net alpine from the official documentation works. I walked through the entire "Use an overlay network for standalone containers" section of the documentation and it worked as expected.
However, with compose files, docker compose up -d fails with the message Error response from daemon: network <my network name here> not found.
I am having the exact same issue with Docker Compose version v2.27.1. @ndeloof docker run --network swarm-overlay works and compose doesn't.
btw is the downgrade workaround needed for both leader and worker node?
@inql I have not tested this as our scripts set versions for all nodes.
Hey there, also affected by this bug.
If you don't want to downgrade, another workaround is to create a container and attach it to the network. The network then appears in the list and docker compose no longer complains:
docker run -dit --name keep-alive --network <network_name> --restart=always alpine
Adding --restart=always will ensure that it survives restarts of the docker daemon, etc.
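The keep-alive workaround can be wrapped in a small helper so that the extra container is only started when the worker cannot yet see the overlay network. This is a minimal sketch, not an official fix: the function name ensure_overlay_visible and the container name keep-alive-<network> are made up for illustration, and it relies only on docker network inspect exiting non-zero when the local engine does not know the network.

```shell
# Hypothetical helper (the name is illustrative, not part of Docker).
# Starts a long-lived alpine container on the given attachable overlay
# network, but only if this node cannot already see that network.
ensure_overlay_visible() {
    net="$1"
    if [ -z "$net" ]; then
        echo "usage: ensure_overlay_visible <network_name>" >&2
        return 2
    fi

    # `docker network inspect` exits non-zero when the network is not
    # known to the local engine.
    if docker network inspect "$net" >/dev/null 2>&1; then
        echo "network $net already visible"
        return 0
    fi

    # Same flags as the workaround above; the container name is an assumption.
    docker run -dit --name "keep-alive-$net" --network "$net" \
        --restart=always alpine >/dev/null || return 1
    echo "started keep-alive-$net"
}
```

On a worker you could call ensure_overlay_visible <network_name> before docker compose up -d.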
My versions in case it is useful:
docker version
Client: Docker Engine - Community
Version: 27.0.3
API version: 1.46
Go version: go1.21.11
Git commit: 7d4bcd8
Built: Sat Jun 29 00:02:50 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.21.11
Git commit: 662f78c
Built: Sat Jun 29 00:02:50 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.7.18
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker compose version
Docker Compose version v2.28.1
As above; sorry, I did not realise that @michaelmcandrew had also mentioned this, but at least this comment confirms his findings: https://github.com/docker/compose/issues/11894#issuecomment-2206522846
I tested this issue and noticed that if a running container is already connected to the external overlay network (started with docker run ..., so that the network is visible in docker network ls), then compose is able to connect to the external overlay network.
So, without knowing anything about the internals, the problem might be that compose does not check for available external overlay networks and instead checks only the networks visible locally with docker network ls.
So as an additional workaround, it is possible to first start a "dummy" container on the workers, for example:
$ docker compose up -d
Error response from daemon: network <overlay-network> not found
$ docker run -dit --rm --name dummy-network-container --network <overlay-network> alpine
43924b1b25ac73373aac9120b55ac46fc1de3435ce26485682e11d6c06671936
$ docker compose up -d
[+] Running 1/0
✔ Container worker-service Started
$ _
I also checked downgrading, and on Ubuntu 22.04 it worked, so I think I will use the downgraded version myself for now.
sudo apt-get remove docker-compose-plugin && sudo apt-get install docker-compose-plugin=2.21.0-1~ubuntu.22.04~jammy
$ docker version
Client: Docker Engine - Community
Version: 27.0.3
API version: 1.46
Go version: go1.21.11
Git commit: 7d4bcd8
Built: Sat Jun 29 00:02:33 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.21.11
Git commit: 662f78c
Built: Sat Jun 29 00:02:33 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.7.18
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ docker compose version
Docker Compose version v2.28.1
@kulpsin docker network ls indeed does not list overlay networks created on another swarm node until they are used by some container on this node (I'm not sure about the reason, but that's what we get from the engine API). So Docker Compose can't check that the network exists, but it should detect that swarm is enabled and ignore the error (assuming container creation will fail if the network is actually missing). See https://github.com/docker/compose/blob/11d5ecdc75ab96214f35db4cdc0361ee080d1c07/pkg/compose/create.go#L1334-L1340
I'm not sure why this doesn't work as expected; I need to set up a test environment and try to reproduce this bug.
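The behaviour described above (look the external network up, and if nothing matches but swarm is active, carry on and let container creation report a genuinely missing network) can be approximated from the shell. This is a rough sketch of that decision, not the Compose implementation: the function name check_external_network and the three result strings are invented here, and it assumes docker info --format '{{.Swarm.LocalNodeState}}' prints active on a node that has joined a swarm.

```shell
# Hypothetical sketch of the validation decision; names and result
# strings are illustrative and do not come from Docker Compose.
check_external_network() {
    net="$1"

    # Case 1: the local engine already knows the network.
    if docker network inspect "$net" >/dev/null 2>&1; then
        echo "found"
        return 0
    fi

    # Case 2: not visible locally, but this node is part of a swarm, so
    # the network may exist on another node. Defer to container creation.
    state="$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)" || state=""
    if [ "$state" = "active" ]; then
        echo "deferred"
        return 0
    fi

    # Case 3: not visible and no swarm, so treat the network as missing.
    echo "missing"
    return 1
}
```

Only the third case is treated as a hard failure here; on an unattached swarm worker the function prints deferred rather than reporting the network as not found.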
With the original compose.yml, compose would generate a network named swarm-netword-overlay_swarm-overlay (prefixed with the project name), and the worker would then not be able to find the external network swarm-overlay, as expected.
Adding name: swarm-overlay to the network made it work for me with version v2.28.1:
docker compose up -d
...
  service:
    image: service-image
    container_name: service
    networks:
      - swarm-overlay
    restart: unless-stopped
...
networks:
  swarm-overlay:
    name: swarm-overlay   # <----
    attachable: true
    driver: overlay
After this, docker network ls lists the network under the name swarm-overlay, and the worker now references the right network.
To flesh out my steps to reproduce a bit more, since they are slightly different from the ones mentioned above, I created a swarm network on the lead node with docker network create --driver overlay test --attachable.
This network was not visible on the worker node (expected I think because nothing was connected).
However, I was not able to connect to it with the below networks section in a compose.yaml on the worker node.
networks:
  test:
    external: true
I created the following container on the worker node: docker run -dit --name keep-alive --network test --restart=always alpine
I was then able to connect using the above networks section in a compose.yaml on the worker node.
Hope that helps with the reproduction!
I created the following container on the worker node
docker run -dit --name keep-alive --network test --restart=always alpine
Thanks this worked for me.
Is this a bug in compose? I would expect somewhat feature parity between docker and docker compose.
@tuxthepenguin84 docker compose does some client-side validation before running containers, and as such looks for target network to exist. docker run will just fail if not found, without preliminary validation.
Can you please confirm whether the issue persists with the latest version? AFAIK we had a fix for it.
The issue still persists, at least for my use case.
Docker Compose version v2.29.7
Client: Docker Engine - Community
Version: 27.3.1
API version: 1.47
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:41:00 2024
OS/Arch: linux/amd64
Context: default
[+] Running 3/3
✔ Container proxy2-nginx-exporter Removed 0.5s
✔ Container proxy2 Removed 1.8s
✔ Network proxy_default Removed 0.4s
[+] Running 2/3
✔ Network proxy_default Created 0.8s
⠸ Container proxy2 Starting 2.3s
✔ Container proxy2-nginx-exporter Started 2.0s
Error response from daemon: could not find a network matching network mode jf5y7525s7qqt0333lfolwruk: network jf5y7525s7qqt0333lfolwruk not found
[
{
"Name": "ai",
"Id": "jf5y7525s7qqt0333lfolwruk",
"Created": "2024-10-06T20:26:15.848600039Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.3.0/24",
"Gateway": "10.0.3.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": null,
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4099"
},
"Labels": null
}
]
The network is there.
services:
  proxy2:
    image: nginx:latest
    container_name: proxy2
    restart: unless-stopped
    networks: ['ai', 'collaboration', 'core', 'garage', 'health', 'iot', 'olivetin', 'media', 'metrics', 'proxy', 'security', 'sprinklers']
    ports:
      - 443:443
    volumes:
      - /containers/proxy/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - /containers/proxy/nginx/conf.d:/etc/nginx/conf.d:ro
      - /containers/proxy/dhparams.pem:/etc/ssl/dhparams.pem:ro
      - /certs/delchampsio/fullchain.pem:/etc/ssl/delchampsio/fullchain.pem:ro
      - /certs/delchampsio/privkey.pem:/etc/ssl/delchampsio/privkey.pem:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
  proxy2-nginx-exporter:
    image: nginx/nginx-prometheus-exporter:latest
    container_name: proxy2-nginx-exporter
    restart: unless-stopped
    ports:
      - 9113:9113
    command:
      - --nginx.scrape-uri=http://proxy2:8080/nginx_status
networks:
  ai:
    name: ai
    driver: overlay
    external: true
  collaboration:
    name: collaboration
    driver: overlay
    external: true
  core:
    name: core
    driver: overlay
    external: true
  garage:
    name: garage
    driver: overlay
    external: true
  health:
    name: health
    driver: overlay
    external: true
  iot:
    name: iot
    driver: overlay
    external: true
  olivetin:
    name: olivetin
    driver: overlay
    external: true
  media:
    name: media
    driver: overlay
    external: true
  metrics:
    name: metrics
    driver: overlay
    external: true
  proxy:
    name: proxy
    driver: overlay
    external: true
  security:
    name: security
    driver: overlay
    external: true
  sprinklers:
    name: sprinklers
    driver: overlay
    external: true
If I run the following and get a container up and running on that "missing" network, I can get the container started with compose
docker run -dit --rm --name dummy-network-container --network ai alpine
Let me know if you need more info or want me to try something, I'm happy to help out and work on getting this fixed.
@tuxthepenguin84 could you please give the binary from https://github.com/docker/compose/pull/12233 a try (binaries are available at the bottom of https://github.com/docker/compose/actions/runs/11513518822)?
This adds some debug logging to the network resolution logic that will help diagnose this issue.
Run it as docker compose --verbose --progress=plain up
Thanks I'll try that out and report back.
@ndeloof I have the issue with the compose plugin version v2.27.0 running on Ubuntu Server 24.04 with ARM Arch
Here is the output of testing the binary from #12233
/etc/salt/docker/test # /etc/salt/docker/docker-compose-linux-aarch64 --verbose --progress=plain up -d
DEBU[0000] search network "axel5" by name returned: 0
DEBU[0000] search network "axel5" by ID succeeded
DEBU[0000] networks matching name "axel5" after strict filtering: 0
DEBU[0000] no match, swarm is enabled: true
Container test-dummy-1 Recreate
DEBU[0005] otel error error="<nil>"
Container test-dummy-1 Recreated
Container test-dummy-1 Starting
Container test-dummy-1 Started
DEBU[0010] otel error error="<nil>"
DEBU[0010] otel error error="<nil>"
This version properly creates the network
Here is my docker info output
/etc/salt/docker/test # docker info
Client:
Version: 26.1.5
Context: default
Debug Mode: false
Plugins:
compose: Docker Compose (Docker Inc.)
Version: v2.27.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 11
Running: 6
Paused: 0
Stopped: 5
Images: 13
Server Version: 27.3.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: mi4aclsip2vfc0fmdk0lizvoi
Is Manager: false
Node Address: 172.31.41.5
Manager Addresses:
172.31.45.225:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 57f17b0a6295a39009d861b89e3b3b87b005ca27
runc version: v1.1.14-0-g2c9f560
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.0-1016-aws
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 7.582GiB
Name: ip-172-31-41-5
ID: aebad7d3-d242-435a-a215-9e10a8a1a6b1
Docker Root Dir: /var/lib/docker
Debug Mode: false
Labels:
salt-minion=dd6de55b-6f41-4cfd-924f-1231ed03995b
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
will try with the latest and report
My issue was that I have 2 versions of docker compose:
- version 2.29 in Ubuntu Server on the host
- version 2.27 in Alpine Linux, in a container with docker.sock bind mounted
I run my compose commands inside the Alpine container, with the compose CLI version 2.27, because that's the version that ships with Alpine 3.20.
I fixed it by installing the latest from edge like this:
apk add docker-cli docker-cli-compose --repository=https://dl-cdn.alpinelinux.org/alpine/edge/community
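Because the host and a container can carry different Compose versions, it can help to compare the version actually being invoked against a minimum before running up. The sketch below is an illustration only: version_ge is a hypothetical helper, it assumes plain numeric MAJOR.MINOR.PATCH strings such as those printed by docker compose version --short, and the threshold is something you choose yourself (the thread does not state which release contains the fix).

```shell
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes numeric MAJOR.MINOR.PATCH, with an optional leading "v".
version_ge() {
    have="${1#v}"
    want="${2#v}"

    # Split both versions on dots; missing components default to 0.
    old_ifs="$IFS"
    IFS=.
    set -- $have
    h1="${1:-0}"; h2="${2:-0}"; h3="${3:-0}"
    set -- $want
    w1="${1:-0}"; w2="${2:-0}"; w3="${3:-0}"
    IFS="$old_ifs"

    # Compare major, then minor, then patch.
    [ "$h1" -gt "$w1" ] && return 0
    [ "$h1" -lt "$w1" ] && return 1
    [ "$h2" -gt "$w2" ] && return 0
    [ "$h2" -lt "$w2" ] && return 1
    [ "$h3" -ge "$w3" ]
}
```

For example, version_ge "$(docker compose version --short)" "$MIN_COMPOSE_VERSION" || echo "compose is older than expected" >&2, where MIN_COMPOSE_VERSION is a variable you set yourself.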
I'm having the same problem with Docker 27.4.1 (compose v2.32.1) on Ubuntu 24.04. I have the same version on both servers, manager and worker. The workaround with docker run works. Docker info:
root@slave1:~# docker info
Client: Docker Engine - Community
Version: 27.4.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.19.3
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.32.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 6
Running: 5
Paused: 0
Stopped: 1
Images: 3
Server Version: 27.4.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: vstjpmppd48cw2x1c89zbnc83
Is Manager: false
Node Address: 10.0.0.2
Manager Addresses:
10.0.0.3:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 88bf19b2105c8b17560993bee28a01ddc2f97182
runc version: v1.2.2-0-g7cb3632
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.0-51-generic
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.25GiB
Name: slave1
ID: 03d3b31a-0c9d-4f1d-8123-71cb83b32caf
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Tested with latest release:
Ran DinD "nodes" to simulate a swarm cluster:
$ docker run -it --privileged --rm -e DOCKER_TLS_CERTDIR="" -p 50000:2375 --name swarm1 docker:dind
..
$ docker run -it --privileged --rm -e DOCKER_TLS_CERTDIR="" -p 50001:2375 --name swarm2 docker:dind
..
$ DOCKER_HOST=tcp://localhost:50000 docker swarm init
Swarm initialized: current node (wic07oj5hn3q57rsqzlv6uwqw) is now a manager.
..
$ DOCKER_HOST=tcp://localhost:50001 docker swarm join --token SWMTKN-1-010d661apb452us6eps7o2w8dchc3id12vjvgdj4yrkqx6ada2-dkby32mvvpjoe6v3kh5td3rxy 172.17.0.2:2377
This node joined a swarm as a worker.
Created a test overlay network:
$ DOCKER_HOST=tcp://localhost:50000 docker network create --driver=overlay --attachable test
Then used a minimal compose file to test network discoverability:
services:
  test:
    image: nginx
    networks:
      - test
networks:
  test:
    external: true
I was able to run this compose app on the worker node, even though the worker does not know about the overlay network until a container attaches:
$ DOCKER_HOST=tcp://localhost:50001 docker network inspect test
[]
Error response from daemon: network test not found
$ DOCKER_HOST=tcp://localhost:50001 docker compose up
[+] Running 8/8
✔ test Pulled 11.0s
✔ f6eaf43e06b3 Pull complete 9.1s
[+] Running 1/1
✔ Container truc-test-1 Created 0.2s
Attaching to test-1
test-1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
test-1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
...
I'm closing this issue as "fixed". If needed, please open a fresh new issue with a reproduction example