hazelcast-kubernetes

1.5.1: Cluster never forming

Open bitsofinfo opened this issue 5 years ago • 3 comments

Running 1.5.1

My cluster never forms: no node ever joins itself, and the bootstrap process just goes on forever.

All nodes can see each other as reported by LowestAddressJoinDecider. All the logged "discovered" IPs/ports are correct. The plugin appears to be talking to the k8s master fine AND getting back the correct list of pods/ips given my configured selectors.

From inside each pod/container I can manually curl each listed node @ 8552/bootstrap/seed-nodes and get back the following:

{"seedNodes":[],"selfNode":"akka.tcp://[email protected]:2552"}
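The manual curl check above can be scripted. This is a minimal sketch, assuming Java 11+ (for `java.net.http`) and that the pod IPs are passed on the command line. The class and method names are hypothetical; only port 8552 and the `/bootstrap/seed-nodes` path come from the post.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Pattern;

// Hypothetical helper: probes each pod's bootstrap endpoint, like the manual curl.
final class SeedNodeProbe {

    // Matches an empty "seedNodes" array in the response body, tolerating whitespace.
    private static final Pattern EMPTY_SEED_NODES =
            Pattern.compile("\"seedNodes\"\\s*:\\s*\\[\\s*\\]");

    // Builds the URL from the post: http://<podIp>:<port>/bootstrap/seed-nodes
    static URI seedNodesUri(String podIp, int managementPort) {
        return URI.create("http://" + podIp + ":" + managementPort + "/bootstrap/seed-nodes");
    }

    // True when the body reports an empty seed-node list.
    static boolean reportsNoSeedNodes(String responseBody) {
        return EMPTY_SEED_NODES.matcher(responseBody).find();
    }

    // Returns a one-line, human-readable result for a single contact point.
    static String probe(HttpClient client, String podIp, int managementPort) {
        HttpRequest request = HttpRequest.newBuilder(seedNodesUri(podIp, managementPort))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                return "reachable, HTTP status " + response.statusCode();
            }
            return reportsNoSeedNodes(response.body())
                    ? "reachable, no seed nodes"
                    : "reachable, seed nodes present";
        } catch (IOException e) {
            // Connection refused and timeouts end up here.
            return "unreachable: " + e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        for (String podIp : args) {
            System.out.println(podIp + " -> " + probe(client, podIp, 8552));
        }
    }
}
```

Run from inside a pod with the discovered IPs as arguments, this gives one line per contact point, which makes it easy to see which ones answer with an empty list and which refuse the connection.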

This just goes on and on and on, and not one of these 4 nodes ever joins itself, which they should from what I understand. It would seem at least one of the nodes would join itself, no?

I also notice this: Exceeded stable margins but missing seed node information from some contact points

I assume this means that a few of the nodes cannot be contacted (there are 6 nodes total, 2 of which intentionally don't expose 8552), yet required-contact-point-nr is 1...

My cluster config looks like this. Am I missing something?

"akka.management {\n" +
"  cluster.bootstrap {\n" +
"    contact-point-discovery {\n" +
"      discovery-method = kubernetes-api\n" +
"      required-contact-point-nr: 1\n" +
"      resolve-timeout = 10 seconds\n" +
"      interval = 2 second\n" +
"      stable-margin = 5 seconds\n" +
"      exponential-backoff-random-factor = 0.5\n" +
"    }\n" +
"    contact-point {\n" +
"      probing-failure-timeout: 5 seconds\n" +
"      probe-interval = 2 second\n" +
"      probe-interval-jitter = 0.5\n" +
"    }\n" +
"  }\n" +
"}\n" +

"akka.discovery {\n" +
"  kubernetes-api {\n" +
"    class = akka.discovery.kubernetes.KubernetesApiServiceDiscovery\n" +
"    pod-namespace = \"" + k8Namespace + "\"\n" +
"    pod-label-selector = \"" + k8PodSelector + "\"\n" +
"  }\n" +
"}\n";

bitsofinfo avatar Aug 02 '19 14:08 bitsofinfo
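The concatenated config string in the post above is hard to read with its mixed indentation. Here is a sketch of the same settings built one HOCON line per element, so the nesting is visible. The class name and the example argument values are hypothetical; the keys and values are copied from the post, with `=` used throughout since HOCON treats `:` and `=` the same way.

```java
// Hypothetical helper that assembles the bootstrap settings quoted in the post.
final class BootstrapConfigSketch {

    // Returns the HOCON text; k8Namespace and k8PodSelector are inserted as quoted strings.
    static String hocon(String k8Namespace, String k8PodSelector) {
        return String.join("\n",
            "akka.management {",
            "  cluster.bootstrap {",
            "    contact-point-discovery {",
            "      discovery-method = kubernetes-api",
            "      required-contact-point-nr = 1",
            "      resolve-timeout = 10 seconds",
            "      interval = 2 second",
            "      stable-margin = 5 seconds",
            "      exponential-backoff-random-factor = 0.5",
            "    }",
            "    contact-point {",
            "      probing-failure-timeout = 5 seconds",
            "      probe-interval = 2 second",
            "      probe-interval-jitter = 0.5",
            "    }",
            "  }",
            "}",
            "akka.discovery {",
            "  kubernetes-api {",
            "    class = akka.discovery.kubernetes.KubernetesApiServiceDiscovery",
            "    pod-namespace = \"" + k8Namespace + "\"",
            "    pod-label-selector = \"" + k8PodSelector + "\"",
            "  }",
            "}",
            "");
    }

    public static void main(String[] args) {
        // "default" and "app=demo" are placeholder values, not taken from the post.
        System.out.println(hocon("default", "app=demo"));
    }
}
```

Printing the result before handing it to the config loader is a cheap way to confirm the braces balance and the namespace and selector were substituted as intended.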

So this seems to be a bug to me:

  1. I have 6 pods, all have the same pod labels referenced by hazelcast-kubernetes for its pod-label-selector, but only 4 of those pods run akka (2552/8552)

  2. I have required-contact-point-nr: 1

  3. All 6 nodes' IPs are properly discovered by kubernetes-api via the k8s master

  4. 4 of the nodes respond fine to http://podip:8552/bootstrap/seed-nodes and return no seed nodes. The other 2 nodes refuse the connection.

  5. Yet despite required-contact-point-nr: 1, no nodes join themselves, it just goes on and on forever.

It seems to me that even if the discovery mechanism knows of 6 potential seed-node endpoints and some of them are unreachable, as long as a majority of them are reachable AND required-contact-point-nr is lower than the number of reachable nodes, one of the nodes should still join itself so things can move forward.
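To make the expectation concrete, here is a sketch of the rule being argued for. It is not Akka's actual LowestAddressJoinDecider logic; every name in it is hypothetical. It assumes "enough" means at least required-contact-point-nr reachable contact points, that only the node with the lowest address self-joins, and it leaves out the majority condition.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the behaviour expected in this issue, not Akka's implementation.
final class ExpectedSelfJoinRule {

    // What one discovered contact point answered when probed.
    enum Probe { NO_SEED_NODES, HAS_SEED_NODES, UNREACHABLE }

    // A node should self-join when the stable margin has passed, it has the lowest
    // address, no contact point reports an existing cluster, and enough contact
    // points answered. Unreachable contact points do not block the decision.
    static boolean shouldSelfJoin(List<Probe> probes,
                                  int requiredContactPointNr,
                                  boolean stableMarginExceeded,
                                  boolean hasLowestAddress) {
        long reachable = probes.stream().filter(p -> p != Probe.UNREACHABLE).count();
        boolean clusterAlreadyExists =
                probes.stream().anyMatch(p -> p == Probe.HAS_SEED_NODES);
        return stableMarginExceeded
                && hasLowestAddress
                && !clusterAlreadyExists
                && reachable >= requiredContactPointNr;
    }

    public static void main(String[] args) {
        // The situation from the issue: 4 pods answer with no seed nodes, 2 refuse.
        List<Probe> probes = Arrays.asList(
                Probe.NO_SEED_NODES, Probe.NO_SEED_NODES,
                Probe.NO_SEED_NODES, Probe.NO_SEED_NODES,
                Probe.UNREACHABLE, Probe.UNREACHABLE);
        System.out.println(shouldSelfJoin(probes, 1, true, true)); // prints true
    }
}
```

Under this reading, the 4-reachable, 2-refused situation with required-contact-point-nr: 1 would let the lowest-address node join itself, which is the opposite of what is observed.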

bitsofinfo avatar Aug 02 '19 15:08 bitsofinfo

What is this akka.discovery.kubernetes.KubernetesApiServiceDiscovery class? Could you write the steps to reproduce without any frameworks? That would make the issue simpler to analyze.

leszko avatar Feb 04 '20 14:02 leszko

What is it? It's a class provided by akka... https://github.com/akka/akka-management/blob/master/discovery-kubernetes-api/src/main/resources/reference.conf#L10

bitsofinfo avatar Feb 06 '20 13:02 bitsofinfo