
No healthy node available in the cluster. -- Issue with swarm

Open gaurav-dalvi opened this issue 7 years ago • 9 comments

We are using swarm 1.2.0.

Steps to reproduce:

1. Get 2 or 3 baremetal nodes running CentOS 7.
2. Run the net_demo_installer script.
3. Use this document to trigger our system tests: https://github.com/contiv/netplugin/blob/master/test/systemtests/How-to-Run.md

Error log:

INFO[0428] Starting netmaster on swarm-baremetal-node1
INFO[0430] Starting netmaster on swarm-baremetal-node2
INFO[0437] Starting a container running "sleep 60m" on swarm-baremetal-node2
INFO[0437] Starting a container running "sleep 60m" on swarm-baremetal-node1
INFO[0437] cmd "docker run -itd --name=private-srv0-0-1 --net=private-srv0-0   contiv/alpine sleep 60m" failed: output below
INFO[0437] docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.

INFO[0437] cmd "docker run -itd --name=private-srv0-1-0 --net=private-srv0-1   contiv/alpine sleep 60m" failed: output below
INFO[0437] docker: Error response from daemon: No healthy node available in the cluster.
See 'docker run --help'.

ERRO[0437] Container id "docker: Error response from daemon: No healthy node available in the cluster.\nSee 'docker run --help'." is invalid
ERRO[0437] Container id "docker: Error response from daemon: No healthy node available in the cluster.\nSee 'docker run --help'." is invalid
INFO[0439] ============================= systemtestSuite.TestPolicyBasicVXLAN completed ==========================

----------------------------------------------------------------------
FAIL: policy_test.go:13: systemtestSuite.TestPolicyBasicVXLAN

policy_test.go:14:
    s.testPolicyBasic(c, "vxlan")
policy_test.go:89:
    c.Assert(err, IsNil)
... value *ssh.ExitError = &ssh.ExitError{Waitmsg:ssh.Waitmsg{status:127, signal:"", msg:"", lang:""}} ("Process exited with: 127. Reason was:  ()")

INFO[0439] Cleaning up containers on swarm-baremetal-node1
INFO[0439] Cleaning up containers on swarm-baremetal-node2
INFO[0440] Checking for errors on swarm-baremetal-node1
ERRO[0440] Errors in logfiles on swarm-baremetal-node1:

grep: /tmp/net*: No such file or directory

==========================
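
The `Container id "docker: Error response from daemon: ..." is invalid` lines in the log suggest that the text printed by the failed `docker run` was carried forward as if it were a container ID. A minimal sketch of checking the output before using it, written in Python rather than the Go used by the system tests. The function name `parse_container_id` and the check itself are assumptions for illustration, not netplugin code; the only fact relied on is that a detached `docker run` prints the full 64-character hexadecimal container ID.

```python
import re

# A detached `docker run` prints the full container ID:
# 64 lowercase hexadecimal characters.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def parse_container_id(output: str) -> str:
    """Return the container ID from `docker run -d` output.

    Raises ValueError when the last non-empty line is not a container ID,
    for example when the daemon returned an error message instead.
    """
    lines = output.strip().splitlines()
    candidate = lines[-1].strip() if lines else ""
    if not _CONTAINER_ID.fullmatch(candidate):
        raise ValueError(
            f"docker run did not return a container ID: {output.strip()!r}"
        )
    return candidate
```

With a check like this, the failure would surface as the daemon's own message ("No healthy node available in the cluster") at the point where the container is started, instead of as an invalid-ID error later on.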

gaurav-dalvi avatar Dec 15 '16 01:12 gaurav-dalvi

Netplugin Log : https://gist.github.com/gaurav-dalvi/7389da18f09677949707f825e3e17216

Netmaster Log : https://gist.github.com/gaurav-dalvi/d12556cd1a0aa145b323d5f7f6edd085

gaurav-dalvi avatar Dec 15 '16 01:12 gaurav-dalvi

Duplicate of https://github.com/contiv/netplugin/issues/652?

jojimt avatar Dec 19 '16 03:12 jojimt

https://gist.github.com/gaurav-dalvi/e308414fad9b29baeae5fd0bd21d4ab3

@jojimt @gkvijay: Could you please take a look? I have been seeing this issue for a long time now. It happens only when testing on baremetal nodes or VMs, not on Vagrant VMs.

Docker swarm 1.2.5

gaurav-dalvi avatar Jan 24 '17 06:01 gaurav-dalvi

https://gist.github.com/gaurav-dalvi/5d51641ba4e43aad7d1ae1002ed8c3d4 netmaster logs

gaurav-dalvi avatar Jan 24 '17 06:01 gaurav-dalvi

Please give the docker version and the output of 'docker info' from swarm.
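
A rough sketch of how that `docker info` output could be summarized when checking for healthy nodes. It assumes the Swarm manager's `docker info` lists each node with a line containing `Status: <state>` (for example `Healthy`); that layout is an assumption here and may differ between Swarm versions. The helper name `node_statuses` is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: each node entry in the Swarm manager's `docker info`
# output has a line containing "Status: <state>", e.g. "Status: Healthy".
_STATUS_LINE = re.compile(r"Status:\s*(\w+)")


def node_statuses(info_text: str) -> Counter:
    """Count nodes per reported status in `docker info` text."""
    counts = Counter()
    for line in info_text.splitlines():
        match = _STATUS_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

If the count for `Healthy` is zero, the scheduler has no node to place a container on, which would be consistent with the error reported in this issue.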

gkvijay avatar Jan 24 '17 07:01 gkvijay

It's Docker 1.11.

The Docker swarm output seemed to be fine. I don't have that testbed anymore, so I can't give it to you.

gaurav-dalvi avatar Jan 24 '17 20:01 gaurav-dalvi

Any update on this one, @gkvijay?

gaurav-dalvi avatar Feb 02 '17 18:02 gaurav-dalvi

@gaurav-dalvi Please close this issue if you are no longer seeing it.

gkvijay avatar Jun 12 '17 18:06 gkvijay

I was getting the same issue until I redeployed the swarm and increased the number of agents to 2. After that it worked for me!

Rockyjee avatar Feb 28 '18 00:02 Rockyjee