Agents stuck in restart loop on Docker for Windows
I'm attempting to run a Linux container in Docker for Windows with a Kubernetes ArangoDB cluster. It starts up, but the agents get stuck in an endless loop of starting in error, terminating, and then re-initializing. I also noticed that the load balancer I have is set to localhost instead of an external IP for reaching the pods from outside the cluster. I'm not quite sure what I'm doing wrong; running the same commands on a Linux machine works fine. Any help would be appreciated, and let me know if you need more info. Logs and screenshots are below.
Here is the YAML to deploy the cluster:
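A trimmed-down sketch of it (the name, Cluster mode, and LoadBalancer external access are the relevant parts; the apiVersion may differ depending on the operator version):

```yaml
apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: "arango-cluster"
spec:
  mode: Cluster
  externalAccess:
    type: LoadBalancer
```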
Here is a screenshot of the termination:
Here they are re-initializing:
The load balancer service:
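For reference, the service can be listed with something like this (assuming the operator's usual <deployment>-ea naming for the external-access service; this is where the external IP shows up as localhost):

```
kubectl get svc arango-cluster-ea -o wide
```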
And here is the output of the describe command on the deployment:
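That is, roughly the following (the resource name may need the full arangodeployments.database.arangodb.com form depending on the kubectl and CRD setup):

```
kubectl describe arangodeployment arango-cluster
```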
```
Name:       arango-cluster
Namespace:  default
Labels:
Events:
  Type    Reason                    Age                From                                          Message
  ----    ------                    ----               ----                                          -------
  Normal  New Coordinator Added     36h                arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-fgfmyrni added to deployment
  Normal  New Agent Added           36h                arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-co8m3gqw added to deployment
  Normal  New Agent Added           36h                arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-z8htk4xo added to deployment
  Normal  New Dbserver Added        36h                arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-32xzf09n added to deployment
  Normal  New Dbserver Added        36h                arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-cdvjutsj added to deployment
  Normal  New Dbserver Added        36h                arango-deployment-operator-8579f476cc-fspdz  New dbserver PRMR-gdxuutwg added to deployment
  Normal  New Coordinator Added     36h                arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-sxgd2w6y added to deployment
  Normal  New Coordinator Added     36h                arango-deployment-operator-8579f476cc-fspdz  New coordinator CRDN-to1bm0c3 added to deployment
  Normal  New Agent Added           36h                arango-deployment-operator-8579f476cc-fspdz  New agent AGNT-bnq8tiee added to deployment
  Normal  Pod Of Dbserver Created   36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-32xzf09n-128dfb of member dbserver is created
  Normal  Pod Of Dbserver Created   36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-cdvjutsj-128dfb of member dbserver is created
  Normal  Pod Of Dbserver Created   36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-prmr-gdxuutwg-128dfb of member dbserver is created
  Normal  Pod Of Coordinator Created  36h              arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-fgfmyrni-128dfb of member coordinator is created
  Normal  Pod Of Coordinator Created  36h              arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-sxgd2w6y-128dfb of member coordinator is created
  Normal  Pod Of Coordinator Created  36h              arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-crdn-to1bm0c3-128dfb of member coordinator is created
  Normal  Pod Of Agent Created      36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is created
  Normal  Pod Of Agent Gone         36h                arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is gone
  Normal  Pod Of Agent Created      36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is created
  Normal  Pod Of Agent Created      36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is created
  Normal  Pod Of Agent Gone         36h (x2 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is gone
  Normal  Pod Of Agent Gone         36h (x6 over 36h)  arango-deployment-operator-8579f476cc-fspdz  Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is gone
```
If possible, please provide the logs of a terminating agent.
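Something like the following should work, using one of the agent pod names from the describe output above; note that --previous only applies if the container restarted inside the same pod, otherwise the logs have to be captured while the pod is terminating:

```
kubectl logs arango-cluster-agnt-bnq8tiee-128dfb
kubectl logs arango-cluster-agnt-bnq8tiee-128dfb --previous
```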
I have a similar problem. I created a cluster on k8s (3 worker nodes), and one agent is stuck in a restart loop.

Operator logs:
```
2019-11-08T15:26:48Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:49Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:50Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:50Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:51Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:52Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:52Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:52Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:53Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:54Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:55Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:55Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
```
Logs from the restarting pod:
```
2019-11-08T16:02:17Z [1] INFO [e52b0] ArangoDB 3.5.0 [linux] 64bit, using jemalloc, build tags/v3.5.0-0-gc42dbe8547, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.0k 28 May 2019
2019-11-08T16:02:17Z [1] INFO [75ddc] detected operating system: Linux version 3.10.0-1062.4.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Oct 18 17:15:30 UTC 2019
2019-11-08T16:02:17Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 262144, which seems too low. it is recommended to set it to at least 512000
2019-11-08T16:02:17Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/enabled is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2019-11-08T16:02:17Z [1] DEBUG [63a7a] host ASLR is in use for shared libraries, stack, mmap, VDSO, heap and memory managed through brk()
2019-11-08T16:02:17Z [1] DEBUG [713c0] {authentication} Not creating user manager
2019-11-08T16:02:17Z [1] DEBUG [71a76] {authentication} Setting jwt secret of size 64
2019-11-08T16:02:17Z [1] INFO [144fe] using storage engine rocksdb
2019-11-08T16:02:17Z [1] INFO [3bb7d] {cluster} Starting up with role AGENT
2019-11-08T16:02:17Z [1] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:17Z [1] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:23Z [1] INFO [e6460] created base application directory '/var/lib/arangodb3-apps/_db'
2019-11-08T16:02:23Z [1] INFO [6ea38] using endpoint 'http+tcp://[::]:8529' for non-encrypted requests
2019-11-08T16:02:23Z [1] DEBUG [dc45a] bound to endpoint 'http+tcp://[::]:8529'
2019-11-08T16:02:23Z [1] INFO [cf3f4] ArangoDB (version 3.5.0 [linux]) is ready for business. Have fun!
2019-11-08T16:02:23Z [1] INFO [d7476] {agency} Restarting agent from persistence ...
2019-11-08T16:02:23Z [1] INFO [d96f6] {agency} Found active RAFTing agency lead by AGNT-axody2ec. Finishing startup sequence.
2019-11-08T16:02:23Z [1] INFO [fe299] {agency} Constituent::update: setting _leaderID to 'AGNT-axody2ec' in term 9
2019-11-08T16:02:23Z [1] INFO [79fd7] {agency} Activating agent.
2019-11-08T16:02:23Z [1] INFO [29175] {agency} Setting role to follower in term 9
2019-11-08T16:02:29Z [1] INFO [aefab] {agency} AGNT-abtwb2ag: candidating in term 9
2019-11-08T16:02:29Z [1] DEBUG [74339] accept failed: Operation canceled
2019-11-08T16:02:30Z [1] INFO [4bcb9] ArangoDB has been shut down
```
The memory limit for the agent pod is set to 2Gi.
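For reference, that limit is set in the agents section of the ArangoDeployment spec, roughly like this (the layout follows the operator's spec.agents.resources convention):

```yaml
spec:
  agents:
    resources:
      limits:
        memory: "2Gi"
```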
Same issue. Has anyone found a solution for it?
It started when I tried to set dbservers.count to 0, and now it's stuck in a restart loop; no matter what I do, I can't get it to stop.
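For reference, the change was just this field in the deployment spec (the value shown is what I set it to):

```yaml
spec:
  dbservers:
    count: 0
```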