hazelcast-kubernetes
1.5.1: Cluster never forming
Running 1.5.1
My cluster never forms: no node ever joins itself, and the bootstrap process just goes on forever.
All nodes can see each other, as reported by LowestAddressJoinDecider. All the logged "discovered" IPs/ports are correct. The plugin appears to be talking to the k8s master fine and getting back the correct list of pods/IPs given my configured selectors.
From inside each pod/container I can manually curl each listed node at 8552/bootstrap/seed-nodes and get back the following:
{"seedNodes":[],"selfNode":"akka.tcp://[email protected]:2552"}
This just goes on and on, and none of these 4 nodes ever joins itself, which they should from what I understand. Shouldn't at least one of the nodes join itself?
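The manual curl check above can be repeated for every discovered pod with a small program. This is only a minimal sketch, assuming Java 11+ (for java.net.http) and that the pod IPs are passed on the command line; the class name ContactPointProbe and its helper methods are hypothetical and not part of akka-management. Only the port 8552 and the /bootstrap/seed-nodes path come from the report.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of akka-management.
class ContactPointProbe {

    // Builds the seed-nodes URL for one contact point (path and port taken from the report).
    static URI seedNodesUri(String podIp, int port) {
        return URI.create("http://" + podIp + ":" + port + "/bootstrap/seed-nodes");
    }

    // Returns "<status> <body>" on success, or a short failure description
    // (for example when the connection is refused).
    static String probe(HttpClient client, String podIp, int port) {
        HttpRequest request = HttpRequest.newBuilder(seedNodesUri(podIp, port))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FAILED: interrupted";
        } catch (Exception e) {
            return "FAILED: " + e;
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        // Each argument is assumed to be a pod IP returned by discovery.
        for (String podIp : args) {
            System.out.println(podIp + " -> " + probe(client, podIp, 8552));
        }
    }
}
```

Run from inside a pod with the discovered IPs as arguments, this lists which contact points answer and which refuse the connection.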
I also notice this:
Exceeded stable margins but missing seed node information from some contact points
I assume this means that a few of the nodes cannot be contacted (there are 6 nodes total, 2 of which intentionally don't expose 8552), yet I have required-contact-point-nr: 1
...
My cluster config is below. Am I missing something?
" akka.management {\n" +
" cluster.bootstrap {\n" +
" contact-point-discovery {\n" +
" discovery-method = kubernetes-api\n" +
" required-contact-point-nr: 1\n"+
" resolve-timeout = 10 seconds\n"+
" interval = 2 second\n"+
" stable-margin = 5 seconds\n"+
" exponential-backoff-random-factor = 0.5\n"+
" }\n"+
" contact-point {\n"+
" probing-failure-timeout: 5 seconds\n"+
" probe-interval = 2 second\n"+
" probe-interval-jitter = 0.5\n"+
" }\n"+
" }\n" +
" }\n" +
" akka.discovery {\n" +
" kubernetes-api {\n" +
" class = akka.discovery.kubernetes.KubernetesApiServiceDiscovery\n" +
" pod-namespace = \""+k8Namespace+"\"\n" +
" pod-label-selector = \""+k8PodSelector+"\"\n" +
" }\n" +
" }\n";
So this seems to be a bug to me:
- I have 6 pods, all with the same pod labels referenced by hazelcast-kubernetes for its pod-label-selector, but only 4 of those pods run akka (2552/8552)
- I have required-contact-point-nr: 1
- All 6 nodes' IPs are properly discovered by kubernetes-api via the k8s master
- 4 of the nodes respond fine to http://podip:8552/bootstrap/seed-nodes and return no seed nodes. The other 2 nodes refuse the connection.
- Yet despite required-contact-point-nr: 1, no node joins itself; it just goes on forever.
It seems to me that even if the discovery mechanism is aware of 6 potential seed node endpoints and some subset of them is unreachable, as long as a majority of them are reachable and required-contact-point-nr is below the number of reachable nodes, one of the nodes should still join itself so things can move forward.
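To make the disagreement concrete, here is a toy model of two possible join rules. It is a sketch under stated assumptions, not the actual LowestAddressJoinDecider code: expectedRule encodes what this report expects (enough contact points answered and this node has the lowest address among them), while strictRule adds the condition that the "missing seed node information from some contact points" log line seems to hint at, namely that every discovered contact point must have answered. The class name, method names, and pod addresses are hypothetical, and addresses are compared as plain strings for simplicity.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model only; not the akka-management implementation.
class JoinDecisionSketch {

    // Rule expected in this report: enough contact points answered (none reporting
    // seed nodes), and this node has the lowest address among those that answered.
    static boolean expectedRule(Set<String> responded, int requiredContactPoints, String self) {
        return !responded.isEmpty()
                && responded.size() >= requiredContactPoints
                && new TreeSet<>(responded).first().equals(self);
    }

    // Assumed stricter rule: additionally, every discovered contact point must have answered.
    static boolean strictRule(Set<String> discovered, Set<String> responded,
                              int requiredContactPoints, String self) {
        return responded.containsAll(discovered)
                && expectedRule(responded, requiredContactPoints, self);
    }

    public static void main(String[] args) {
        // Hypothetical addresses: 6 pods discovered, only 4 expose port 8552.
        Set<String> discovered = Set.of("10.0.0.1", "10.0.0.2", "10.0.0.3",
                                        "10.0.0.4", "10.0.0.5", "10.0.0.6");
        Set<String> responded = Set.of("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4");

        System.out.println("expected rule: " + expectedRule(responded, 1, "10.0.0.1"));
        System.out.println("strict rule:   " + strictRule(discovered, responded, 1, "10.0.0.1"));
    }
}
```

Under the strict rule, the two pods that refuse connections on 8552 would block the self-join for as long as they match the label selector, which would be consistent with the endless loop described above. Whether the library actually behaves this way is for the maintainers to confirm.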
What is this akka.discovery.kubernetes.KubernetesApiServiceDiscovery class? Could you write the steps to reproduce without any frameworks? That would make the issue simpler to analyze.
What is it? It's a class provided by akka: https://github.com/akka/akka-management/blob/master/discovery-kubernetes-api/src/main/resources/reference.conf#L10