kubernetes-neo4j Pod crashes after trying to join the cluster

First of all, awesome work with this!

I have been trying to get this running on GKE. Two of the three pods almost always start successfully, but the third pod never manages to join the cluster and then crashes.

Here are the logs:

2018-01-07 09:53:46.843+0000 INFO  ======== Neo4j 3.3.1 ========
2018-01-07 09:53:46.950+0000 INFO  Starting...
2018-01-07 09:53:52.194+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2018-01-07 09:53:52.248+0000 INFO  Initiating metrics...
2018-01-07 09:53:52.725+0000 INFO  Resolved initial host 'neo4j.default.svc.cluster.local:5000' to [10.28.0.79:5000, 10.28.0.78:5000, 10.28.0.77:5000]
2018-01-07 09:53:52.824+0000 INFO  My connection info: [
	Discovery:   listen=0.0.0.0:5000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:5000,
	Transaction: listen=0.0.0.0:6000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:6000, 
	Raft:        listen=0.0.0.0:7000, advertised=neo4j-core-0.neo4j.default.svc.cluster.local:7000, 
	Client Connector Addresses: bolt://neo4j-core-0.neo4j.default.svc.cluster.local:7687,http://neo4j-core-0.neo4j.default.svc.cluster.local:7474,https://neo4j-core-0.neo4j.default.svc.cluster.local:7473
]
2018-01-07 09:53:52.827+0000 INFO  Discovering cluster with initial members: [neo4j.default.svc.cluster.local:5000]
2018-01-07 09:53:52.830+0000 INFO  Attempting to connect to the other cluster members before continuing...
Exception in thread "HZ Starting Thread" java.lang.IllegalStateException: Node failed to start!
	at com.hazelcast.instance.HazelcastInstanceImpl.<init>(HazelcastInstanceImpl.java:132)
	at com.hazelcast.instance.HazelcastInstanceFactory.constructHazelcastInstance(HazelcastInstanceFactory.java:218)
	at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:176)
	at com.hazelcast.instance.HazelcastInstanceFactory.newHazelcastInstance(HazelcastInstanceFactory.java:126)
	at com.hazelcast.core.Hazelcast.newHazelcastInstance(Hazelcast.java:58)
	at org.neo4j.causalclustering.discovery.HazelcastCoreTopologyService.createHazelcastInstance(HazelcastCoreTopologyService.java:263)
	at org.neo4j.causalclustering.discovery.HazelcastCoreTopologyService.lambda$start$0(HazelcastCoreTopologyService.java:137)
	at java.lang.Thread.run(Thread.java:748)

Has anyone run into this, or does anyone have hints on how to debug it further?
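The log shows discovery resolving 'neo4j.default.svc.cluster.local:5000' to three pod IPs and attempting to connect to the other members before Hazelcast reports "Node failed to start!". One way to narrow this down is to check, from a pod inside the cluster, what that name resolves to and whether each resolved address accepts a TCP connection on port 5000. The following is a minimal sketch using only Python's standard `socket` module. The function name `probe_members` and the 2-second timeout are hypothetical choices, not part of kubernetes-neo4j or Neo4j. A successful connect only shows that the port is open; it does not prove that the Neo4j/Hazelcast member on the other side is healthy.

```python
import socket


def probe_members(host: str, port: int, timeout: float = 2.0) -> dict:
    """Resolve `host` and try a TCP connect to every resolved address on `port`.

    Hypothetical debugging helper: returns {"ip:port": True/False}, where True
    means the TCP handshake succeeded within `timeout` seconds.
    Raises socket.gaierror if the name cannot be resolved at all.
    """
    results = {}
    # One getaddrinfo entry per resolved address (SOCK_STREAM = TCP only).
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        key = f"{sockaddr[0]}:{port}"
        if key in results:  # skip duplicate addresses
            continue
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)  # plain TCP handshake, no protocol traffic
            results[key] = True
        except OSError:  # refused, timed out, unreachable, ...
            results[key] = False
    return results
```

For example, running `probe_members("neo4j.default.svc.cluster.local", 5000)` from inside one of the pods (hostname and port taken from the log above) would show whether the service name resolves to all three pod IPs and which of them are accepting connections at that moment.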

Jan 07 '18 10:01 nonken

I'm having a similar issue when trying to reproduce this on Minikube, following the instructions in the README. While my crash logs show a "failed to publish" exception, the symptom is the same: only one or two of the three instances ever run correctly. I have the following addons enabled: storage-provisioner, kube-dns, default-storageclass, dashboard (I also tried enabling coredns, with no luck). Has anyone had any luck with this?

Feb 02 '18 15:02 Jiropole

I upgraded to 3.3.4 and just started seeing this again.

Apr 02 '18 20:04 nonken

@nonken Were you able to figure out how to solve this?

Sep 10 '18 09:09 rakeshpatri

@rakeshpatri I moved away from Kubernetes; using bare-bones EC2 seemed simpler.

Sep 24 '18 20:09 nonken
