libnetwork
Failure during overlay endpoints restore: restore network sandbox failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory
I use Consul to create an overlay network. When I reboot the slave Docker host, the containers can't start. The log is:
Jan 10 17:27:38 docker systemd: Stopped Docker Application Container Engine.
Jan 10 17:27:38 docker systemd: Closed Docker Socket for the API.
Jan 10 17:28:35 docker systemd: Starting Docker Socket for the API.
Jan 10 17:28:35 docker systemd: Listening on Docker Socket for the API.
Jan 10 17:28:35 docker systemd: Starting Docker Application Container Engine...
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.851796952+08:00" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857206117+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857232538+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857298978+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857310322+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857410658+08:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857437934+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857432691+08:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857481142+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857542797+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4201a5d50, CONNECTING" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857548281+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420866390, CONNECTING" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.857808502+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4201a5d50, READY" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.859888418+08:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420866390, READY" module=grpc
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.869570799+08:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.892104466+08:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.892138947+08:00" level=info msg="Initializing discovery without TLS"
Jan 10 17:28:35 docker dockerd: time="2019-01-10T17:28:35.893062919+08:00" level=info msg="Loading containers: start."
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.202705906+08:00" level=warning msg="Failure during overlay endpoints restore: restore network sandbox failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.412281152+08:00" level=info msg="2019/01/10 17:28:36 [INFO] serf: EventMemberJoin: docker.huitu 10.0.0.214\n"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.413599205+08:00" level=info msg="2019/01/10 17:28:36 [INFO] serf: EventMemberJoin: ywzx-h 10.0.0.213\n"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.465316150+08:00" level=error msg="getNetworkFromStore for nid 40bef7593c70ff24c510ea511e773c4123381ba229ed5f3b2043e9aca470c281 failed while trying to build sandbox for cleanup: network 40bef7593c70ff24c510ea511e773c4123381ba229ed5f3b2043e9aca470c281 not found"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.466916720+08:00" level=error msg="getEndpointFromStore for eid ad95c56226a60412db1382c4e17bbe07813ce0be502a63da9539a1246dfe0a20 failed while trying to build sandbox for cleanup: could not find endpoint ad95c56226a60412db1382c4e17bbe07813ce0be502a63da9539a1246dfe0a20: []"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.466999665+08:00" level=info msg="Removing stale sandbox cf0585e050bfcca6a58795877fc6c616e28517fda0ce6997e0ea62ea689eb5ff (d4f740a7bc7bb44d60dc35d2e49bc185d81b46a8daad2d9c49509b5423bfc4df)"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.467376450+08:00" level=warning msg="Failed getting network for ep 0112b3ab98b1d8e020f9d955dbe441a9647f603f934ef6b5e72d6b13c0a63779 during sandbox cf0585e050bfcca6a58795877fc6c616e28517fda0ce6997e0ea62ea689eb5ff delete: network 40bef7593c70ff24c510ea511e773c4123381ba229ed5f3b2043e9aca470c281 not found"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.468508461+08:00" level=warning msg="Failed deleting endpoint ad95c56226a60412db1382c4e17bbe07813ce0be502a63da9539a1246dfe0a20: failed to get endpoint from store during Delete: could not find endpoint ad95c56226a60412db1382c4e17bbe07813ce0be502a63da9539a1246dfe0a20: []\n"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.468556810+08:00" level=error msg="Failed to delete sandbox cf0585e050bfcca6a58795877fc6c616e28517fda0ce6997e0ea62ea689eb5ff while trying to cleanup: could not cleanup all the endpoints in container d4f740a7bc7bb44d60dc35d2e49bc185d81b46a8daad2d9c49509b5423bfc4df / sandbox cf0585e050bfcca6a58795877fc6c616e28517fda0ce6997e0ea62ea689eb5ff"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.539422395+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.754866274+08:00" level=error msg="0027a2ca3ae6c74aa64e98276d92f4929246b671578d4a456a5e1789d8861dda cleanup: failed to delete container from containerd: no such container"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.754903084+08:00" level=error msg="Failed to start container 0027a2ca3ae6c74aa64e98276d92f4929246b671578d4a456a5e1789d8861dda: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.757931202+08:00" level=error msg="c158fdacf7b090efbc490ef797c4180997a769f79fdd5172e20c3309298ed718 cleanup: failed to delete container from containerd: no such container"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.757960363+08:00" level=error msg="Failed to start container c158fdacf7b090efbc490ef797c4180997a769f79fdd5172e20c3309298ed718: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.793220555+08:00" level=error msg="7efe5e326effc3a1a2d54958b20a59f162d45260f8c3968391efc6614d4016dd cleanup: failed to delete container from containerd: no such container"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.793262289+08:00" level=error msg="Failed to start container 7efe5e326effc3a1a2d54958b20a59f162d45260f8c3968391efc6614d4016dd: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.803230494+08:00" level=error msg="24cecebf52d2962cc388ab3dca47b3e6a9229a304f19e32a60c8404ffade64b9 cleanup: failed to delete container from containerd: no such container"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.803259685+08:00" level=error msg="Failed to start container 24cecebf52d2962cc388ab3dca47b3e6a9229a304f19e32a60c8404ffade64b9: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.811000316+08:00" level=error msg="2aebb2cb08b7bcf1554c665d8425c2507ff76eb98a85dce5cfee4ffb9ef377aa cleanup: failed to delete container from containerd: no such container"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.811033651+08:00" level=error msg="Failed to start container 2aebb2cb08b7bcf1554c665d8425c2507ff76eb98a85dce5cfee4ffb9ef377aa: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.811069988+08:00" level=info msg="Loading containers: done."
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.861053493+08:00" level=info msg="Docker daemon" commit=bca0068 graphdriver(s)=overlay2 version=18.09.1-rc1
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.861179167+08:00" level=info msg="Daemon has completed initialization"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.864091530+08:00" level=warning msg="Could not register builder git source: failed to find git binary: exec: \"git\": executable file not found in $PATH"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.870439152+08:00" level=info msg="API listen on [::]:2375"
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.870500757+08:00" level=info msg="API listen on /var/run/docker.sock"
Jan 10 17:28:36 docker systemd: Started Docker Application Container Engine.
Jan 10 17:28:36 docker dockerd: time="2019-01-10T17:28:36.871330446+08:00" level=info msg="API listen on /var/run/docker.sock"
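To see at a glance which containers were hit, the "Failed to start container" lines can be pulled out of a log like the one above. This is only a rough Python sketch: the function name is made up for illustration, and the pattern assumes the message wording stays exactly as it appears in this log.

```python
import re

# Matches the dockerd error line shown in the log above.
FAILED_START = re.compile(
    r"Failed to start container (?P<id>[0-9a-f]+): network sandbox join failed"
)


def failed_containers(log_lines):
    """Return the container IDs that failed to start, in order, without repeats."""
    ids = []
    for line in log_lines:
        match = FAILED_START.search(line)
        if match and match.group("id") not in ids:
            ids.append(match.group("id"))
    return ids
```

Feed it the daemon log lines from wherever your system keeps them, for example `failed_containers(open(path))` with `path` pointing at the log file.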
I got the same problem with Docker 18.04, version from 27 Feb. I cannot connect my containers to the overlay network.
network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace "": no such file or directory
Not very nice. It's a production environment; 36 customers were disconnected.
It happened after the provider restarted the server without a normal shutdown.
Error response from daemon: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace "": no such file or directory
Error: failed to start containers: ew-engine-XXX
Same issue with 18.09.7, using an overlay network after a restart:
docker: Error response from daemon: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace "": no such file or directory.
Same issue with 19.03.1. After a normal restart of the server.
A workaround for me: first remove cluster-store and cluster-advertise from daemon.json and restart the docker service. Then add them back and restart docker again. It works for me now.
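The remove/restart/add-back/restart steps above can be scripted. Below is a minimal Python sketch, not an official procedure. It assumes the config is at /etc/docker/daemon.json (the usual Linux location), that Docker runs under systemd as in the log in this issue, and that the script runs as root. The function names and the backup file path are invented for illustration.

```python
import json
import subprocess
from pathlib import Path

# Assumed locations; adjust for your host.
DAEMON_JSON = Path("/etc/docker/daemon.json")
SAVED_KEYS = Path("/etc/docker/cluster-keys.saved.json")  # hypothetical backup file

CLUSTER_KEYS = ("cluster-store", "cluster-advertise")


def strip_cluster_keys(config):
    """Return (config without the cluster keys, the removed key/value pairs)."""
    removed = {k: config[k] for k in CLUSTER_KEYS if k in config}
    stripped = {k: v for k, v in config.items() if k not in CLUSTER_KEYS}
    return stripped, removed


def restore_cluster_keys(config, removed):
    """Return a copy of config with the previously removed keys added back."""
    merged = dict(config)
    merged.update(removed)
    return merged


def restart_docker():
    # Assumes a systemd host.
    subprocess.run(["systemctl", "restart", "docker"], check=True)


def toggle_cluster_settings():
    original = json.loads(DAEMON_JSON.read_text())
    stripped, removed = strip_cluster_keys(original)
    # Keep a copy of the removed settings in case the script is interrupted.
    SAVED_KEYS.write_text(json.dumps(removed, indent=2))

    DAEMON_JSON.write_text(json.dumps(stripped, indent=2))
    restart_docker()  # first restart: daemon starts without the cluster settings

    restored = restore_cluster_keys(stripped, removed)
    DAEMON_JSON.write_text(json.dumps(restored, indent=2))
    restart_docker()  # second restart: cluster settings are back
```

Calling `toggle_cluster_settings()` performs both restarts, so containers on that host go down twice; the two helper functions can be tried on a plain dict first without touching the daemon.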
We found that there are different scenarios in which this situation occurs. All of them involve engines that were not shut down gracefully, or that were stopped for a long time while the virtual network changed.
Our latest solution is to first disconnect the engine from the overlay networks, then start it and try to reconnect it to the network.
docker network disconnect NETWORKNAME ENGINENAME
docker start ENGINENAME
docker network connect NETWORKNAME ENGINENAME
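For anyone who has to do this for many containers, here is a small Python sketch of the same three commands. It is only an illustration: the function names are made up, and it simply stops at the first step that fails and hands back Docker's error text.

```python
import subprocess


def reconnect_commands(network, container):
    """Build the disconnect / start / connect sequence as argv lists."""
    return [
        ["docker", "network", "disconnect", network, container],
        ["docker", "start", container],
        ["docker", "network", "connect", network, container],
    ]


def reconnect(network, container):
    """Run the sequence; return (True, "") or (False, stderr of the failing step)."""
    for argv in reconnect_commands(network, container):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            return False, result.stderr.strip()
    return True, ""
```

`reconnect("NETWORKNAME", "ENGINENAME")` runs the three steps in order; a `False` result on the last step would correspond to the 'cannot connect' case.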
This works most of the time. Only sometimes does it say 'cannot connect', which in fact means that the virtual network configuration is somehow broken (the driver for the virtual network, vnet, and docker).
We tried for many hours to fix the network problem in a running system without disturbing our users too much. In the end we found a workaround: using multiple parallel overlay networks.
Whenever we get a problem with one overlay network, we keep the existing overlay networks (with the existing engines in them), create a new overlay network, and join new engines to it. Our main proxy, which is responsible for forwarding requests to the engines, is simply part of all networks.
To clean up the system we can plan a maintenance cycle in which we shut down the complete system, clean all Consul databases, and restart the complete Docker environment on all machines. This takes 2 hours, so we do it maybe once a year.
Using multiple overlay networks has relaxed the situation: we always have a running system and can easily move machines between the networks.
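As a rough illustration of the parallel-network approach, this Python sketch builds the commands for adding one more overlay network and attaching the proxy and some new engines to it. The names "overlay-2", "main-proxy" and "engine-a" are placeholders, not names from our setup, and it assumes the daemon is already configured so that `docker network create --driver overlay` works on the host.

```python
import shlex


def new_overlay_commands(network, proxy, engines):
    """Commands to create a fresh overlay network and attach the proxy and engines."""
    commands = [["docker", "network", "create", "--driver", "overlay", network]]
    # The proxy joins every network so it can still reach engines on the old ones.
    commands.append(["docker", "network", "connect", network, proxy])
    for engine in engines:
        commands.append(["docker", "network", "connect", network, engine])
    return commands


def as_shell(commands):
    """Render the argv lists as shell lines for review before running them."""
    return "\n".join(shlex.join(argv) for argv in commands)
```

Printing `as_shell(new_overlay_commands("overlay-2", "main-proxy", ["engine-a"]))` gives the lines to review or paste; the existing networks are left untouched.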
The three technologies (virtual network, Docker, and Consul) are not fully stable when the system is changing:
- one server of the cluster is turned off (by accident, by a system error, whatever)
- docker engines are stopped for a longer period and restarted afterwards (the system tries to re-establish connections and fails).
I hope this information helps others who are facing the problem to keep a Docker system running :-). I'm definitely a fan of that architecture, but you are lost when it comes to virtual networks (on the other hand, it is so amazingly easy to use).
Regards, André
Same error.
5 Docker nodes with etcd and an overlay network. After restarting 2 of the Docker node machines, we get the following error.
dockerd[556]: time="2019-09-21T13:50:57.472551212+08:00" level=error msg="Handler for POST /v1.40/containers/f6d1aa3c3aec407677928d2ab2bb4609fd1e42c0bc0726ae04fd13fa5e851b0f/start returned error: network sandbox join failed: network sandbox join failed: could not get network sandbox (oper true): failed get network namespace \"\": no such file or directory"
Client: Docker Engine - Community
Version: 19.03.2
API version: 1.40
Go version: go1.12.8
Git commit: 6a30dfc
Built: Thu Aug 29 05:28:55 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.2
API version: 1.40 (minimum version 1.12)
Go version: go1.12.8
Git commit: 6a30dfc
Built: Thu Aug 29 05:27:34 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
A workaround for me: first remove cluster-store and cluster-advertise from daemon.json and restart the docker service. Then add them back and restart docker again. It works for me now.
This solution worked for me.
worked for me
worked for me.
Docker team, is this issue fixed as part of some other issue? We recently migrated from Docker 17 to Docker 19.03.12. This is causing us a lot of trouble in cloud deployments, where host evacuation is pretty common.
Docker team, can you point to the latest Docker version where this issue doesn't exist?
After a node reboot I was facing the same issue. The workaround from @himacro worked for me as well.
Does anyone have more insight into this issue?