[BUG] sporadic failure in the setup of container networking: overlay network not found during container initialization
Description
When defining two networks, one of which is an overlay network (the host is initialized as a swarm manager), and assigning both to a service in the Docker Compose file, startup is sporadically aborted with the following error message:
Error response from daemon: failed to set up container networking: could not find a network matching network mode <overlay-network-name>: network <overlay-network-name> not found
Expected behaviour: the service and network definitions are created every time without error.
Steps To Reproduce
Using the following minimal working example, the error can be reproduced every once in a while (note: I don't know whether driver_opts is necessary, but it is what we use in the production environment where we noticed the error):
services:
  nginx:
    image: nginx:latest
    networks:
      - net
      - second-net

networks:
  net:
    driver: overlay
    attachable: true
    name: net
    external: false
    driver_opts:
      encrypted: "true"
  second-net:
    name: second-net
Executing docker compose up will result in the following error once in a while:
[+] Running 2/3
✔ Network net Created 0.0s
✔ Network second-net Created 0.1s
⠸ Container debug-nginx-1 Starting 0.3s
Error response from daemon: failed to set up container networking: could not find a network matching network mode net: network net not found
Because the error appears only sporadically, I wrote a simple script that repeats the same actions in a loop:
#!/usr/bin/env bash
# Enter your advertise-addr here
ADVERTISE_ADDR="x.x.x.x"

# Start from a freshly initialized single-node swarm
docker swarm leave --force >/dev/null 2>&1
docker swarm init --advertise-addr "$ADVERTISE_ADDR" >/dev/null 2>&1

while true; do
    # "down" runs twice because the overlay network is sometimes still
    # reported as in use right after its container has been removed
    docker compose down -v >/dev/null 2>&1
    docker compose down -v >/dev/null 2>&1
    output=$(docker compose up -d --force-recreate 2>&1)
    if error_output=$(echo "$output" | grep "Error"); then
        echo
        echo "$error_output"
        echo
    else
        echo
        echo "Everything OK"
        echo
    fi
done
This produces output like the following:
Everything OK
Error response from daemon: failed to set up container networking: could not find a network matching network mode net: network net not found
Everything OK
Everything OK
Everything OK
Error response from daemon: failed to set up container networking: could not find a network matching network mode net: network net not found
Everything OK
Error response from daemon: failed to set up container networking: could not find a network matching network mode net: network net not found
Everything OK
Compose Version
Docker Compose version 2.36.0
Docker Environment
Client:
 Version: 28.1.1
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: 0.23.0
    Path: /usr/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version: 2.36.0
    Path: /usr/lib/docker/cli-plugins/docker-compose
Server:
 Containers: 6
  Running: 4
  Paused: 0
  Stopped: 2
 Images: 17
 Server Version: 28.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: wdkuxge4ny157zriyaq1r0c8i
  Is Manager: true
  ClusterID: vcsfsuzdr2xqe2w89p0skqty5
  Managers: 1
  Nodes: 1
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.178.126
  Manager Addresses:
   192.168.178.126:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 061792f0ecf3684fb30a3a0eb006799b8c6638a7.m
 runc version:
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.14.6-arch1-1
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.24GiB
 Name: YST
 ID: BETC:KIM3:OQXZ:CPL5:5KAO:FVML:5XOD:TDAA:KJLA:4MAV:DYE6:F5SL
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: ysautter
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false
Anything else?
The issue also occurs with Docker Compose version 2.35.1.
Can you reproduce this when Docker swarm is disabled?
No, because I cannot create an overlay network when the host is not a swarm manager. Starting the Compose project with swarm disabled fails with the following error:
Network net Error failed to create network net: Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.
Noticed something weird:
$ docker compose -f overlay.yaml down -v
[+] Running 3/3
✔ Container truc-nginx-1 Removed 0.2s
✔ Network second-net Removed 0.2s
! Network net Resource is still in use 0.0s
# Let's try again
$ docker compose -f overlay.yaml down -v
[+] Running 1/1
✔ Network net Removed
This demonstrates that, in swarm mode, there is a delay between the container being removed and the overlay network being considered unused. I assume some asynchronous processing takes place while the swarm manager replicates state between nodes.
Your issue is a comparable one: an overlay network is created and a container is attached to it very shortly afterwards, which randomly triggers the "network net not found" error.
Docker Compose can't manage such unpredictable behavior. The Docker engine's NetworkCreate should not require the client to "wait a few" before a container can actually be attached to the network. Please open an issue on github.com/moby/moby
I suspect this is because of the way swarm-managed networks are deployed dynamically, but @robmry may be able to fill me in.
Non-swarm networks are created on the node where the command is executed. For swarm networks, creating a network only creates the "definition" of the network; it doesn't create the actual network on all nodes in the cluster. The actual network is created when a service "task" (a container backing a swarm service) is scheduled to be deployed on a specific node. By default, such networks cannot be used by non-swarm containers (this was an initial security constraint to allow only managed services to access the network, as swarm cluster nodes (workers) are designed with least privilege). The --attachable option was added to allow access to the network from non-swarm containers (e.g., for debugging, to run a one-off container on a node that connects to the network), but that feature still depends on the network being rolled out by swarm.
@ndeloof, @thaJeztah ... that all sounds plausible - @corhere, I think you're looking at issues in this area at the moment?
The delay before the overlay network can be removed is also why my script runs docker compose down -v twice.
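Rather than running down a fixed number of times, the reproduction loop could poll until the overlay network has actually disappeared. The following is a minimal POSIX sh sketch, assuming that docker network inspect exits with a non-zero status once the network no longer exists; the helper name down_until_network_removed and its arguments are hypothetical and not part of Docker or Compose.

```sh
#!/bin/sh
# Hypothetical helper, not part of Docker or Compose: run "docker compose down -v"
# until the named network no longer exists, giving up after MAX_ATTEMPTS tries.
# Usage: down_until_network_removed NETWORK MAX_ATTEMPTS DELAY_SECONDS
down_until_network_removed() {
    local network="$1" max_attempts="$2" delay="$3" attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        docker compose down -v >/dev/null 2>&1
        # Any failure of "docker network inspect" (normally: the network is
        # gone) is treated as successful removal.
        if ! docker network inspect "$network" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    echo "network $network still exists after $max_attempts attempts" >&2
    return 1
}
```

In the reproduction script, the two down calls could then be replaced by something like down_until_network_removed net 10 1.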
Unfortunately, I cannot meet my requirements with Docker swarm services alone and needed a more flexible solution for orchestrating distributed containers. The Docker swarm overlay network seemed like the perfect fit, as containers can communicate even if they are not running on the same host. Simply retrying the attachment of the container to the overlay network until the network has been created seems to work, but feels rather hacky.
Although the documentation states that, without the --attachable flag, non-swarm containers cannot join an overlay network, I was not aware until now that the --attachable flag is mainly meant for debugging. On the other hand, I would argue that the swarm overlay network would be a powerful tool for non-swarm containers if its behavior were more predictable.
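The workaround of repeating the attachment until it succeeds can at least be narrowed so that only this specific race is retried. The following is a minimal POSIX sh sketch; the function name retry_if_network_missing, its argument order, and the matched pattern 'network .* not found' (taken from the error message in this issue) are illustrative assumptions. Retrying only masks the race on the client side; it does not fix it in the engine.

```sh
#!/bin/sh
# Hypothetical helper, not part of Docker or Compose: run a command and, if it
# fails with a "network ... not found" error, retry it a limited number of times.
# Usage: retry_if_network_missing MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_if_network_missing() {
    local max_attempts="$1" delay="$2" attempt=1 output status
    shift 2
    while :; do
        output=$("$@" 2>&1)
        status=$?
        if [ "$status" -eq 0 ]; then
            printf '%s\n' "$output"
            return 0
        fi
        # Only the sporadic race from this issue is retried; any other error,
        # or running out of attempts, is reported and returned immediately.
        if ! printf '%s\n' "$output" | grep -q 'network .* not found' ||
            [ "$attempt" -ge "$max_attempts" ]; then
            printf '%s\n' "$output" >&2
            return "$status"
        fi
        echo "attempt $attempt/$max_attempts: network not found, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, retry_if_network_missing 5 2 docker compose up -d would try to start the project up to five times, waiting two seconds between attempts, and would fail immediately on any unrelated error.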
I seem to be having the same issue. Has anyone found any resolution or made progress?
I seem to have found a workaround. I had also initialized my swarm with --advertise-addr [manager-IP].
If I then join with docker swarm join --token [XXXXXX] [manager-IP]:2377 --advertise-addr [node-ip] --listen-addr [node-ip], things seem to be working now.