kubernetes-neo4j
ERROR Failed to start Neo4j
Running in the neo4j namespace, using storageClassName: rook-ceph-block:
$ kubectl -n neo4j get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
datadir-neo4j-core-0 Bound pvc-0912d181-d8a7-11e8-926a-069c6b71a13e 25Gi RWO rook-ceph-block 18m
datadir-neo4j-core-1 Bound pvc-119a3765-d8a7-11e8-926a-069c6b71a13e 25Gi RWO rook-ceph-block 18m
datadir-neo4j-core-2 Bound pvc-1da11b36-d8a7-11e8-926a-069c6b71a13e 25Gi RWO rook-ceph-block 18m
$ kubectl -n neo4j get pods
NAME READY STATUS RESTARTS AGE
neo4j-core-0 2/2 Running 2 12m
neo4j-core-1 2/2 Running 2 12m
neo4j-core-2 2/2 Running 2 12m
Starting Neo4j.
2018-10-25 22:53:15.784+0000 INFO ======== Neo4j 3.3.6 ========
2018-10-25 22:53:15.817+0000 INFO Starting...
2018-10-25 22:53:17.325+0000 INFO Bolt enabled on 0.0.0.0:7687.
2018-10-25 22:53:17.335+0000 INFO Initiating metrics...
2018-10-25 22:53:17.493+0000 INFO Resolved initial host 'neo4j.default.svc.cluster.local:5000' to []
2018-10-25 22:53:17.521+0000 INFO My connection info: [
Discovery: listen=0.0.0.0:5000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:5000,
Transaction: listen=0.0.0.0:6000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:6000,
Raft: listen=0.0.0.0:7000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:7000,
Client Connector Addresses: bolt://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7687,http://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7474,https://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7473
]
2018-10-25 22:53:17.522+0000 INFO Discovering cluster with initial members: [neo4j.default.svc.cluster.local:5000]
2018-10-25 22:53:17.522+0000 INFO Attempting to connect to the other cluster members before continuing...
2018-10-25 22:58:49.904+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
at com.neo4j.server.enterprise.CommercialEntryPoint.main(CommercialEntryPoint.java:22)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /var/lib/neo4j/data/databases/graph.db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:35)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:26)
at com.neo4j.server.enterprise.CommercialNeoServer.lambda$static$0(CommercialNeoServer.java:29)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.causalclustering.core.state.CoreLife@56e07a08' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:207)
... 10 more
Caused by: java.util.concurrent.TimeoutException: Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.
at org.neo4j.causalclustering.identity.ClusterBinder.bindToCluster(ClusterBinder.java:110)
at org.neo4j.causalclustering.core.state.CoreLife.start0(CoreLife.java:70)
at org.neo4j.kernel.lifecycle.SafeLifecycle.transition(SafeLifecycle.java:124)
at org.neo4j.kernel.lifecycle.SafeLifecycle.start(SafeLifecycle.java:138)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 12 more
2018-10-25 22:58:49.910+0000 INFO Neo4j Server shutdown initiated by request
The line "Resolved initial host 'neo4j.default.svc.cluster.local:5000' to []" appears to be the problem; it should be 'neo4j.neo4j.svc.cluster.local:5000'.
Changed value: "neo4j.default.svc.cluster.local:5000" -> value: "neo4j.neo4j.svc.cluster.local:5000"
It now resolves:
2018-10-25 23:06:00.908+0000 INFO Resolved initial host 'neo4j.neo4j.svc.cluster.local:5000' to [100.96.5.28:5000, 100.96.4.41:5000, 100.96.2.24:5000]
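The namespace fix follows from how in-cluster service names are built: <service>.<namespace>.svc.<cluster-domain>. A minimal Python sketch of that rule, assuming the cluster domain is cluster.local as in the logs; the function names are made up for illustration and are not part of any chart:

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def pod_fqdn(pod: str, service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Return the DNS name of a StatefulSet pod behind a headless Service."""
    return f"{pod}.{service_fqdn(service, namespace, cluster_domain)}"


if __name__ == "__main__":
    # Service "neo4j" deployed in namespace "neo4j", discovery port 5000.
    print(service_fqdn("neo4j", "neo4j") + ":5000")
    print(pod_fqdn("neo4j-core-2", "neo4j", "neo4j") + ":5000")
```

A manifest written for the default namespace hardcodes "default" in the second position, which is why the original address resolved to an empty list here.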
Still a no go:
Starting Neo4j.
2018-10-25 23:05:59.322+0000 INFO ======== Neo4j 3.3.6 ========
2018-10-25 23:05:59.355+0000 INFO Starting...
2018-10-25 23:06:00.763+0000 INFO Bolt enabled on 0.0.0.0:7687.
2018-10-25 23:06:00.772+0000 INFO Initiating metrics...
2018-10-25 23:06:00.908+0000 INFO Resolved initial host 'neo4j.neo4j.svc.cluster.local:5000' to [100.96.5.28:5000, 100.96.4.41:5000, 100.96.2.24:5000]
2018-10-25 23:06:00.935+0000 INFO My connection info: [
Discovery: listen=0.0.0.0:5000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:5000,
Transaction: listen=0.0.0.0:6000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:6000,
Raft: listen=0.0.0.0:7000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:7000,
Client Connector Addresses: bolt://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7687,http://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7474,https://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7473
]
2018-10-25 23:06:00.936+0000 INFO Discovering cluster with initial members: [neo4j.neo4j.svc.cluster.local:5000]
2018-10-25 23:06:00.936+0000 INFO Attempting to connect to the other cluster members before continuing...
2018-10-25 23:11:33.326+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
at com.neo4j.server.enterprise.CommercialEntryPoint.main(CommercialEntryPoint.java:22)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /var/lib/neo4j/data/databases/graph.db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:35)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.<init>(CommercialCoreGraphDatabase.java:26)
at com.neo4j.server.enterprise.CommercialNeoServer.lambda$static$0(CommercialNeoServer.java:29)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.causalclustering.core.state.CoreLife@56e07a08' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:207)
... 10 more
Caused by: java.util.concurrent.TimeoutException: Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.
at org.neo4j.causalclustering.identity.ClusterBinder.bindToCluster(ClusterBinder.java:110)
at org.neo4j.causalclustering.core.state.CoreLife.start0(CoreLife.java:70)
at org.neo4j.kernel.lifecycle.SafeLifecycle.transition(SafeLifecycle.java:124)
at org.neo4j.kernel.lifecycle.SafeLifecycle.start(SafeLifecycle.java:138)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 12 more
2018-10-25 23:11:33.332+0000 INFO Neo4j Server shutdown initiated by request
This is mostly a duplicate of #7, however I have no clue what/where this refers to:
"I did forget to replace neo4j-core-0.neo4j.default.svc.cluster.local by neo4j-core-0.neo4j.neo4j.svc.cluster.local"
Line 28 of statefulset.yaml contains value: "neo4j.default.svc.cluster.local:5000", and I replaced the namespace default with neo4j.
2018-10-25 23:22:38.654+0000 INFO ======== Neo4j 3.3.6 ========
2018-10-25 23:22:38.687+0000 INFO Starting...
2018-10-25 23:22:40.154+0000 INFO Bolt enabled on 0.0.0.0:7687.
2018-10-25 23:22:40.164+0000 INFO Initiating metrics...
2018-10-25 23:22:40.291+0000 INFO Resolved initial host 'neo4j-core-0.neo4j.neo4j.svc.cluster.local:5000' to [100.96.4.43:5000]
2018-10-25 23:22:40.317+0000 INFO My connection info: [
Discovery: listen=0.0.0.0:5000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:5000,
Transaction: listen=0.0.0.0:6000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:6000,
Raft: listen=0.0.0.0:7000, advertised=neo4j-core-2.neo4j.neo4j.svc.cluster.local:7000,
Client Connector Addresses: bolt://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7687,http://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7474,https://neo4j-core-2.neo4j.neo4j.svc.cluster.local:7473
]
2018-10-25 23:22:40.318+0000 INFO Discovering cluster with initial members: [neo4j-core-0.neo4j.neo4j.svc.cluster.local:5000]
2018-10-25 23:22:40.318+0000 INFO Attempting to connect to the other cluster members before continuing...
2018-10-25 23:28:12.707+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@53499d85' was successfully initialized, but failed to start. Please see the attached cause exception "Failed to join a cluster with members {clusterId=null, bootstrappable=false, coreMembers={}}. Another member should have published a clusterId but none was detected. Please restart the cluster.".
Deleted the PVCs in case the old data was messing it up:
$ kubectl -n neo4j delete pvc datadir-neo4j-core-0
persistentvolumeclaim "datadir-neo4j-core-0" deleted
$ kubectl -n neo4j delete pvc datadir-neo4j-core-1
persistentvolumeclaim "datadir-neo4j-core-1" deleted
$ kubectl -n neo4j delete pvc datadir-neo4j-core-2
persistentvolumeclaim "datadir-neo4j-core-2" deleted
That wasn't it...
I suspect I'm running into an Istio routing issue. I appended a prefix of http- to all the port names... Still a no go.
$ kubectl exec -it neo4j-core-0 bash -n neo4j
Defaulting container name to neo4j.
bash-4.4# ping neo4j-core-0.neo4j.neo4j.svc.cluster.local
PING neo4j-core-0.neo4j.neo4j.svc.cluster.local (100.96.2.29): 56 data bytes
bash-4.4# ping neo4j-core-1.neo4j.neo4j.svc.cluster.local
PING neo4j-core-1.neo4j.neo4j.svc.cluster.local (100.96.4.46): 56 data bytes
bash-4.4# ping neo4j-core-2.neo4j.neo4j.svc.cluster.local
PING neo4j-core-2.neo4j.neo4j.svc.cluster.local (100.96.5.33): 56 data bytes
Added an Istio egress rule for dl-cdn.alpinelinux.org and installed curl on neo4j-core-0:
bash-4.4# curl 127.0.0.1:5000
curl: (7) Failed to connect to 127.0.0.1 port 5000: Connection refused
bash-4.4# curl neo4j-core-2.neo4j.neo4j.svc.cluster.local:5000
curl: (56) Recv failure: Connection reset by peer
bash-4.4# curl neo4j-core-1.neo4j.neo4j.svc.cluster.local:5000
curl: (56) Recv failure: Connection reset by peer
bash-4.4# curl neo4j-core-0.neo4j.neo4j.svc.cluster.local:5000
curl: (56) Recv failure: Connection reset by peer
I pretty much followed https://neo4j.com/developer/kb/a-light-weight-approach-to-validating-network-port-connectivity/ - the containers can reach each other on port 5000, but they don't sync.
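For anyone who cannot install curl in the container, the same reachability check can be sketched in Python with only the standard library. This is an illustrative helper, not something from the KB article; the host names and port 5000 are the ones used in this thread. Note that a successful connect only shows that something accepted the TCP connection; with a sidecar proxy in the path that may not be Neo4j itself.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


if __name__ == "__main__":
    # Check the discovery port on each core member named in this thread.
    for i in range(3):
        host = f"neo4j-core-{i}.neo4j.neo4j.svc.cluster.local"
        print(host, "open" if port_open(host, 5000) else "closed")
```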
It appears that NEO4J_causal__clustering_discovery__type is the old style; it has been replaced with NEO4J_causalClustering_initialDiscoveryMembers. Setting:
- name: NEO4J_causalClustering_initialDiscoveryMembers
value: neo4j-core-0.neo4j.neo4j.svc.cluster.local:5000, neo4j-core-1.neo4j.neo4j.svc.cluster.local:5000, neo4j-core-2.neo4j.neo4j.svc.cluster.local:5000
I get this in the log file of neo4j-core-2:
2018-10-26 22:06:57.188+0000 INFO Resolved initial host 'neo4j-core-0.neo4j.neo4j.svc.cluster.local:5000' to []
2018-10-26 22:06:57.189+0000 INFO Resolved initial host 'neo4j-core-1.neo4j.neo4j.svc.cluster.local:5000' to []
2018-10-26 22:06:57.189+0000 INFO Resolved initial host 'neo4j-core-2.neo4j.neo4j.svc.cluster.local:5000' to [100.96.2.51:5000]
If I restart neo4j-core-2 after the other two are up:
2018-10-26 22:09:37.355+0000 INFO Resolved initial host 'neo4j-core-0.neo4j.neo4j.svc.cluster.local:5000' to [100.96.4.53:5000]
2018-10-26 22:09:37.356+0000 INFO Resolved initial host 'neo4j-core-1.neo4j.neo4j.svc.cluster.local:5000' to [100.96.5.52:5000]
2018-10-26 22:09:37.356+0000 INFO Resolved initial host 'neo4j-core-2.neo4j.neo4j.svc.cluster.local:5000' to [100.96.2.52:5000]
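The two log excerpts suggest a timing effect: the peer names resolve only once the other pods are up. A rough Python sketch of waiting until every member name resolves before proceeding; wait_for_hosts and its parameters are hypothetical, and this is not how Neo4j itself performs discovery:

```python
import socket
import time
from typing import Callable, Dict, Iterable, List


def resolve_ipv4(host: str) -> List[str]:
    """Return the IPv4 addresses for host, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def wait_for_hosts(
    hosts: Iterable[str],
    resolve: Callable[[str], List[str]] = resolve_ipv4,
    attempts: int = 30,
    delay: float = 2.0,
) -> Dict[str, List[str]]:
    """Poll until every host resolves to at least one address.

    Returns the last lookup result per host; hosts that never resolved
    map to an empty list.
    """
    hosts = list(hosts)
    result: Dict[str, List[str]] = {}
    for attempt in range(attempts):
        result = {h: resolve(h) for h in hosts}
        if all(result.values()):
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return result
```

Calling wait_for_hosts with the three neo4j-core-N names from inside a pod would show whether the empty lists above are only a startup race or a persistent DNS problem.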
So it looks like I'm using a newer version of neo4j (3.3.6), which has all different env var names, i.e. NEO4J_causal__clustering_initial__discovery__members -> NEO4J_causalClustering_initialDiscoveryMembers. Can someone make a version of this conf that works for newer neo4j versions? It is nice to see names with double underscores get fixed, that was really dumb.
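For reference, the double-underscore names quoted above match the convention the official Neo4j Docker image documents for turning neo4j.conf settings into environment variables: prefix with NEO4J_, write each underscore as two underscores, and write each dot as a single underscore. A small Python sketch of that mapping; the helper name is made up, and the two setting names are inferred from the env vars in this thread:

```python
def to_docker_env(setting: str) -> str:
    """Map a neo4j.conf setting name to the Docker image's env var name.

    Underscores are doubled first, then dots become single underscores.
    """
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


if __name__ == "__main__":
    for name in (
        "causal_clustering.discovery_type",
        "causal_clustering.initial_discovery_members",
    ):
        print(to_docker_env(name))
```

The order of the two replacements matters: doubling underscores after converting dots would also double the separators that came from dots.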
Apparently CORE mode doesn't work in kubernetes. It's working just fine in SINGLE mode.
I'm having the same problem using the official neo4j helm chart. No success... isn't the helm chart tested somehow?
Cluster discovery type and initial discovery members are both needed in order to start the DB in mode=CORE (causal clustering).
The discovery type tells Neo4j how to discover the other nodes; the initial discovery members setting is the address used for that discovery.
See here for an example of configuration from a different repo which uses a variant of this chart: https://github.com/neo-technology/neo4j-google-k8s-marketplace/blob/3.5/chart/templates/core-statefulset.yaml#L39
You'll notice that discovery type is set to DNS, meaning that neo4j expects to find a DNS name that resolves to multiple A records. The initial discovery members setting is a service address that points to the core stateful set. This service address resolves to multiple A records, one per pod launched.
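To see what DNS-based discovery would find, you can resolve the service address yourself and list one entry per A record. A minimal Python sketch using only the standard library; discovery_members is a hypothetical helper, not part of Neo4j or the chart:

```python
import socket
from typing import List


def discovery_members(service_address: str) -> List[str]:
    """Resolve 'host:port' and return one 'ip:port' entry per A record.

    Raises socket.gaierror if the host does not resolve at all.
    """
    host, _, port = service_address.rpartition(":")
    infos = socket.getaddrinfo(host, int(port), socket.AF_INET, socket.SOCK_STREAM)
    # A set removes duplicates; sorting keeps the output stable.
    return sorted({f"{info[4][0]}:{info[4][1]}" for info in infos})


if __name__ == "__main__":
    # Service address from this thread; run from inside a pod in the cluster.
    print(discovery_members("neo4j.neo4j.svc.cluster.local:5000"))
```

With three core pods running, the list should have three entries, comparable to the "Resolved initial host ... to [...]" log line earlier in the thread.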
This repo is unfortunately a bit out of date. A generic helm chart, which has to go through various tests for acceptance, is here: https://github.com/helm/charts/tree/master/stable/neo4j and the google kubernetes marketplace repo is here: https://github.com/neo-technology/neo4j-google-k8s-marketplace/
Windows 10, neo4j-community-5.10.0-windows
Start command: neo4j console
Starting Neo4j.
Error occurred during initialization of VM
Too small maximum heap
Neo4j web server failed to start. See log for more info.
Run with '--verbose' for a more detailed error message.