swarm not compatible with latest docker

Open GabyCT opened this issue 8 years ago • 6 comments

It is not possible to retrieve the hostname of a swarm replica using cc-runtime 3.0.4 with docker 17.06-ce, because the replicas are left in the "Created" state.

$ docker swarm init
Swarm initialized: current node (cl0wp2btl2e2gt0yaeh33m2og) is now a manager.

$ docker service create --name testswarm --replicas 1 --publish 8080:80 mcastelino/nginx /bin/bash -c "hostname > /usr/share/nginx/html/hostname; nginx -g \"daemon off;\"" 2> /dev/null
10oehfg3hn2byxv2emb3z8ukg

$ docker ps -a
CONTAINER ID  IMAGE                    COMMAND                 CREATED        STATUS   PORTS  NAMES
b017d21f121c  mcastelino/nginx:latest  "/bin/bash -c 'hostna"  2 minutes ago  Created         testswarm.1.3jjt16h597dko31tjgb8pfegy

Setup:
cc-runtime: 3.0.4
commit: 28bd75d
OCI specs: 1.0.0-rc5

GabyCT avatar Oct 27 '17 20:10 GabyCT

However, this behaviour does not occur when using docker-engine 1.12.1.

GabyCT avatar Oct 27 '17 20:10 GabyCT

@amshinde @GabyCT @jodh-intel just some early debug info.

The docker swarm networking implementation has changed quite a bit. The service IP assignment has moved from the container interface to the loopback interface. Also in my case I could not get some of the tests to work even with runc. I will look at all the docker swarm changes in detail and come up with a set of changes. However this will involve some work from our end.

So, until all the issues are fixed, we should once again qualify which versions of swarm we support.

mcastelino avatar Oct 30 '17 16:10 mcastelino

For the record, I see the following curl behaviour on Ubuntu 17.04:

  • docker 17.05.0-ce: hangs.
  • docker 1.12.6-0ubuntu4: connection refused.
  • docker 1.12.1-0ubuntu15: works.

jodh-intel avatar Nov 01 '17 13:11 jodh-intel

Although with 1.12.1 the curl works, qemu and cc-proxy processes are left behind.

fuentess@test-swarm:~/go/src/github.com/clearcontainers/tests$ ps -ef | grep -c qemu
17
fuentess@test-swarm:~/go/src/github.com/clearcontainers/tests$ ps -ef | grep -c cc-pr
18
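A note on the counts above: `ps -ef | grep -c qemu` can also count the `grep` process itself, since its own command line contains the pattern, so each figure may be one higher than the real number of leftover processes. A minimal sketch of a count that avoids this by matching on the command name only is shown below; `count_procs` is a hypothetical helper name, not something from the tests repository.

```shell
# Hypothetical helper: count processes whose command name matches a pattern.
# "ps -e -o comm=" prints only the command name of every process, so the
# grep process (command name "grep") is not matched by a pattern like "qemu".
count_procs() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the helper usable under "set -e".
    ps -e -o comm= | grep -c -- "$1" || true
}

count_procs qemu
count_procs cc-proxy
```

Matching on the command name is a design choice for this sketch. It will miss processes that only mention the pattern in their arguments, which is usually what is wanted when looking for leftover qemu and cc-proxy instances.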

Also cc-runtime list stops working:

fuentess@test-swarm:~/go/src/github.com/clearcontainers/tests$ sudo cc-runtime list
stat /var/lib/docker/overlay2/beb512b966cf76f45e8a6d36a49c5d9ae9278ac7dfcdf548bf9db52e0e4875bf/merged: no such file or directory

In the cc-runtime log:

Dec 28 18:10:20 test-swarm cc-runtime[86681]: time="2017-12-28T18:10:20Z" level=error msg="Container still running, should be stopped" source=runtime

In the cc-proxy log:

Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.033990165Z" level=error msg="error serving client: write unix /run/virtcontainers/pods/4aca4486137fca02d7b02ea73cc3d0261b34e9ed8000af51564c4a6713a8905f/proxy.sock->@: write: broken pipe" client=2 name=cc-proxy pid=86113 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.034216762Z" level=info msg="connection closed" client=2 name=cc-proxy pid=86113 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.050816354Z" level=info msg="client connected" client=5 name=cc-proxy pid=86113 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.051177949Z" level=info msg="AttachVM(containerId=4aca4486137fca02d7b02ea73cc3d0261b34e9ed8000af51564c4a6713a8905f)" client=5 name=cc-proxy pid=86113 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.052367934Z" level=info msg="hyper(cmd=\\\"killcontainer\\\", data=\\\"{\\\\\\\"container\\\\\\\":\\\\\\\"4aca4486137fca02d7b02ea73cc3d0261b34e9ed8000af51564c4a6713a8905f\\\\\\\",\\\\\\\"signal\\\\\\\":9,\\\\\\\"allProcesses\\\\\\\":true}\\\")" client=5 name=cc-proxy pid=86113 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.078413208Z" level=error msg="error writing I/O data to client: write unix /run/virtcontainers/pods/4aca4486137fca02d7b02ea73cc3d0261b34e9ed8000af51564c4a6713a8905f/proxy.sock->@: use of closed network connection" name=cc-proxy pid=86113 section=io source=proxy vm=4aca4486137fca02d7b02ea73cc3d0261b34e9ed8000af51564c4a6713a8905f
Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.084839327Z" level=info msg="hyper(cmd=\\\"removecontainer\\\", data=\\\"{\\\\\\\"container\\\\\\\":\\\\\\\"4aca4486137fca02d7b02ea73cc3d0261b34e9ed8000af51564c4a6713a8905f\\\\\\\"}\\\")" client=5 name=cc-proxy pid=86113 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86113]: time="2017-12-28T18:10:20.189627114Z" level=info msg="connection closed" client=5 name=cc-proxy pid=86113 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.514057147Z" level=info msg="Signal(killed,0,0)" client=2 name=cc-proxy pid=86094 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.514138646Z" level=info msg="waiting for runtime to execute the process for token hLxW8hejRP1Us5w0UB0kJRowv6VZrtN0XaERKi3V4LE= (timeout 30s)" name=cc-proxy pid=86094 section=io source=proxy vm=ef34fe9a2798081739c102ee232e557cb8a8929b7b320310d897ad13759a8529
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.519005485Z" level=error msg="error serving client: write unix /run/virtcontainers/pods/ef34fe9a2798081739c102ee232e557cb8a8929b7b320310d897ad13759a8529/proxy.sock->@: write: broken pipe" client=2 name=cc-proxy pid=86094 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.519236982Z" level=info msg="connection closed" client=2 name=cc-proxy pid=86094 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.543561977Z" level=info msg="client connected" client=5 name=cc-proxy pid=86094 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.543722675Z" level=info msg="AttachVM(containerId=ef34fe9a2798081739c102ee232e557cb8a8929b7b320310d897ad13759a8529)" client=5 name=cc-proxy pid=86094 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.544061971Z" level=info msg="hyper(cmd=\\\"killcontainer\\\", data=\\\"{\\\\\\\"container\\\\\\\":\\\\\\\"ef34fe9a2798081739c102ee232e557cb8a8929b7b320310d897ad13759a8529\\\\\\\",\\\\\\\"signal\\\\\\\":9,\\\\\\\"allProcesses\\\\\\\":true}\\\")" client=5 name=cc-proxy pid=86094 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.559092982Z" level=info msg="hyper(cmd=\\\"removecontainer\\\", data=\\\"{\\\\\\\"container\\\\\\\":\\\\\\\"ef34fe9a2798081739c102ee232e557cb8a8929b7b320310d897ad13759a8529\\\\\\\"}\\\")" client=5 name=cc-proxy pid=86094 source=proxy
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.563611626Z" level=error msg="error writing I/O data to client: write unix /run/virtcontainers/pods/ef34fe9a2798081739c102ee232e557cb8a8929b7b320310d897ad13759a8529/proxy.sock->@: use of closed network connection" name=cc-proxy pid=86094 section=io source=proxy vm=ef34fe9a2798081739c102ee232e557cb8a8929b7b320310d897ad13759a8529
Dec 28 18:10:20 test-swarm cc-proxy[86094]: time="2017-12-28T18:10:20.661964793Z" level=info msg="connection closed" client=5 name=cc-proxy pid=86094 source=proxy

chavafg avatar Dec 28 '17 18:12 chavafg

This sometimes leads to not being able to start the replicas:

ok 2 # skip (This is not working (https://github.com/clearcontainers/tests/issues/694)) check_service_ip_among_the_replicas
not ok 3 obtain hostname of the replicas
# (in test file integration/swarm/swarm.bats, line 83)
#   `REPLICAS[$i]="$(curl --connect-timeout $timeout --retry $number_of_retries $url)"' failed with status 7
# Swarm initialized: current node (8t65ureewbkxty8s06r26urnv) is now a manager.
# 
# To add a worker to this swarm, run the following command:
# 
#     docker swarm join \
#     --token SWMTKN-1-39nw7h9kj2uldqdixzn3fmw2n1irbxox9gi85b1xipjhcpcq5z-4trwiu6und6sktoy3cgo2nouc \
#     10.1.6.4:2377
# 
# To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
# 
# 6uat0eob3dpqau6lqkg4ltijn
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
# ID            NAME       REPLICAS  IMAGE         COMMAND
# 6uat0eob3dpq  testswarm  0/4       gabyct/nginx  sh -c hostname > /usr/share/nginx/html/hostname; nginx -g "daemon off;"
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0curl: (7) Failed to connect to 127.0.0.1 port 8080: No route to host
# testswarm
# Node left the swarm.
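
In the output above, the service never gets past `0/4` replicas, so the curl fails with status 7 (could not connect). A rough sketch of how a test could wait for the replicas before calling curl follows. The helper names `replicas_ready`, `extract_replicas` and `wait_for_service` are hypothetical and not taken from swarm.bats, and the column scan assumes the `current/desired` REPLICAS format shown in the `docker service ls` output above.

```shell
# Hypothetical helper: succeed only when a "current/desired" value
# such as "4/4" shows every requested replica running.
replicas_ready() {
    current="${1%%/*}"
    desired="${1##*/}"
    [ "$desired" -gt 0 ] 2>/dev/null && [ "$current" -eq "$desired" ] 2>/dev/null
}

# Hypothetical helper: read "docker service ls" output on stdin and print
# the REPLICAS field for the named service. It scans for the first field
# of the form N/M rather than relying on a fixed column position.
extract_replicas() {
    awk -v n="$1" '$2 == n {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+\/[0-9]+$/) { print $i; exit }
    }'
}

# Hypothetical helper: poll until the service is fully up, or give up.
# Arguments: service name, number of attempts (default 10).
wait_for_service() {
    tries="${2:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if replicas_ready "$(docker service ls | extract_replicas "$1")"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

A test could then call `wait_for_service testswarm 10` and report that the replicas never started, instead of surfacing a curl connection error.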

chavafg avatar Dec 28 '17 18:12 chavafg

@mcastelino - A belated update to this issue... We do now specify the version of Docker Swarm we support here:

  • https://github.com/clearcontainers/runtime/blob/master/docs/limitations.md#docker-swarm-support

jodh-intel avatar Jan 02 '18 12:01 jodh-intel