
Pod crashes after trying to join the cluster

Open nonken opened this issue 7 years ago • 4 comments

First of all, awesome work with this!

I have been trying to get this running on GKE and two of the three pods pretty much consistently start running. The third pod never succeeds in joining the cluster and then crashes.

Here are the logs:

2018-01-07 09:53:46.843+0000 INFO  ======== Neo4j 3.3.1 ========
2018-01-07 09:53:46.950+0000 INFO  Starting...
2018-01-07 09:53:52.194+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2018-01-07 09:53:52.248+0000 INFO  Initiating metrics...
2018-01-07 09:53:52.725+0000 INFO  Resolved initial host 'neo4j.default.svc.cluster.local:5000' to [10.28.0.79:5000, 10.28.0.78:5000, 10.28.0.77:5000]
2018-01-07 09:53:52.824+0000 INFO  My connection info: [
	Discovery:   listen=0.0.0.0:5000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:5000,
	Transaction: listen=0.0.0.0:6000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:6000, 
	Raft:        listen=0.0.0.0:7000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:7000, 
	Client Connector Addresses: bolt://neo4j-core-0.neo4j.default.svc.cluster.local:7687,http://neo4j-core-0.neo4j.default.svc.cluster.local:7474,https://neo4j-core-0.neo4j.default.svc.cluster.local:7473
]
2018-01-07 09:53:52.827+0000 INFO  Discovering cluster with initial members: [neo4j.default.svc.cluster.local:5000]
2018-01-07 09:53:52.830+0000 INFO  Attempting to connect to the other cluster members before continuing...
Exception in thread "HZ Starting Thread" java.lang.IllegalStateException: Node failed to start!
	at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:132)
	at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:218)
	at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:176)
	at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:126)
	at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:58)
	at org.neo4j.causalclustering.discovery.HazelcastCoreTopologyService.createHazelcastInstance(HazelcastCoreTopologyService.java:263)
	at org.neo4j.causalclustering.discovery.HazelcastCoreTopologyService.lambda$start$0(HazelcastCoreTopologyService.java:137)
	at java.lang.Thread.run(Thread.java:748)

Has anyone run into this, or does anyone have hints on how to debug it further?
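One way to narrow this down, assuming the crash is caused by the discovery port being unreachable (nothing in this thread confirms that), is to check whether each address from the "Resolved initial host" log line accepts TCP connections on port 5000. The sketch below is only an illustration: `parse_resolved_addresses` and `check_reachable` are hypothetical helper names, not part of Neo4j or Kubernetes, and the sample line is copied from the log above.

```python
import re
import socket

# Sample line copied from the log above.
LOG_LINE = (
    "2018-01-07 09:53:52.725+0000 INFO  Resolved initial host "
    "'neo4j.default.svc.cluster.local:5000' to "
    "[10.28.0.79:5000, 10.28.0.78:5000, 10.28.0.77:5000]"
)


def parse_resolved_addresses(line):
    """Return [(ip, port), ...] from a 'Resolved initial host' log line.

    Hypothetical helper: returns an empty list if the line does not match.
    """
    match = re.search(r"Resolved initial host '[^']+' to \[([^\]]*)\]", line)
    if match is None:
        return []
    addresses = []
    for entry in match.group(1).split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the port is separated from the host.
        host, _, port = entry.rpartition(":")
        addresses.append((host, int(port)))
    return addresses


def check_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: this only tests TCP reachability, not whether
    Hazelcast discovery itself would succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Parsing needs no network access; the reachability check has to be run
# from inside the cluster (see the note below).
print(parse_resolved_addresses(LOG_LINE))
```

To use it, run something like `for host, port in parse_resolved_addresses(line): print(host, port, check_reachable(host, port))` from a pod in the same namespace. An address that resolves but refuses connections would point at the network or at pod readiness rather than at DNS.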

nonken, Jan 07 '18 10:01

I'm having a similar issue when trying to reproduce this on Minikube, following the instructions in the README. My crash logs show a "failed to publish" exception rather than the one above, but the symptom is the same: only 1 or 2 of the 3 instances ever run correctly. I have the following addons enabled: storage-provisioner, kube-dns, default-storageclass, dashboard (I also tried enabling coredns, with no luck). Has anyone had any luck with this?

Jiropole, Feb 02 '18 15:02

I upgraded to 3.3.4 and just started seeing this again.

nonken, Apr 02 '18 20:04

@nonken Were you able to figure out how to solve this?

rakeshpatri, Sep 10 '18 09:09

@rakeshpatri I moved away from Kubernetes; running on plain EC2 instances seemed simpler.

nonken, Sep 24 '18 20:09