
valkey-cluster Readiness probe failed: cluster_state:fail - nodes don't join the cluster

Open arpan57 opened this issue 6 months ago • 1 comments

Name and Version

bitnami/valkey-cluster

What architecture are you using?

None

What steps will reproduce the bug?

  1. On a MacBook Pro (Apple silicon), after setting up the Helm repo, I am trying to run the Valkey chart (valkey-cluster-0.1.8) on minikube, following the README.
  2. Command used to install the Helm chart: helm install my-release oci://registry-1.docker.io/bitnamicharts/valkey-cluster
  3. It spawned 6 pods.

The pods look like this

k get pods
NAME                          READY   STATUS    RESTARTS      AGE
my-release-valkey-cluster-0   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-1   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-2   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-3   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-4   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-5   0/1     Running   1 (21h ago)   21h

The pod description/events look like the following:

❯ k describe pod my-release-valkey-cluster-0
Name:             my-release-valkey-cluster-0

....
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  2m51s (x3257 over 21h)  kubelet  Readiness probe failed: cluster_state:fail
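
The probe message appears to echo the `cluster_state` field that `CLUSTER INFO` reports, so the same field can be read directly from a pod. A minimal sketch, assuming `kubectl` points at the minikube cluster and `valkey-cli` connects without extra flags, as in the session further down; the helper names `cluster_state_of` and `pod_cluster_state` are hypothetical:

```shell
#!/bin/sh
# Extract the cluster_state value from `CLUSTER INFO` output read on stdin.
# The server terminates each line with CRLF, so strip carriage returns first.
cluster_state_of() {
  tr -d '\r' | awk -F: '$1 == "cluster_state" { print $2 }'
}

# Hypothetical helper: print the state reported by a single pod.
# Assumes the pod name exists in the current namespace.
pod_cluster_state() {
  kubectl exec "$1" -- valkey-cli CLUSTER INFO | cluster_state_of
}
```

For example, `pod_cluster_state my-release-valkey-cluster-0` prints the state that pod reports, which should match what the probe sees.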


When I connect to a pod using valkey-cli, I notice that it shows only one node (itself) as part of the cluster.

❯ kubectl exec -it my-release-valkey-cluster-1 -- valkey-cli
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> CLUSTER nodes
c2104e2cd9da1efb779c1c1a82ee40c588fa6a0f 10.244.0.41:6379@16379 myself,master - 0 0 0 connected
127.0.0.1:6379>
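
To see whether every pod is isolated in the same way, the check above can be repeated across all six pods. A rough sketch, assuming the default release name `my-release` and the pod naming shown in the `get pods` output; `count_known_nodes` and `report_known_nodes` are hypothetical helper names:

```shell
#!/bin/sh
# Count the node entries in `CLUSTER NODES` output read on stdin
# (one non-empty line per node known to the queried instance).
count_known_nodes() {
  tr -d '\r' | awk 'NF > 0 { n++ } END { print n + 0 }'
}

# Hypothetical helper: report how many nodes each of the six pods knows about.
# Assumes kubectl targets the right cluster and namespace.
report_known_nodes() {
  release="${1:-my-release}"
  for i in 0 1 2 3 4 5; do
    pod="${release}-valkey-cluster-${i}"
    n=$(kubectl exec "$pod" -- valkey-cli CLUSTER NODES | count_known_nodes)
    echo "$pod knows $n node(s)"
  done
}
```

Running `report_known_nodes` prints one line per pod; a count of 1 on a pod means it only knows about itself.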

The pod logs look like this:

valkey-cluster 15:27:46.85 INFO  ==> ** Starting Valkey setup **
valkey-cluster 15:27:46.90 INFO  ==> Initializing Valkey
valkey-cluster 15:27:46.95 INFO  ==> Setting Valkey config file
valkey-cluster 15:27:47.15 INFO  ==> Changing old IP 10.244.0.40 by the new one 10.244.0.40
valkey-cluster 15:27:47.20 INFO  ==> Changing old IP 10.244.0.41 by the new one 10.244.0.41
valkey-cluster 15:27:47.30 INFO  ==> Changing old IP 10.244.0.39 by the new one 10.244.0.39
valkey-cluster 15:27:47.40 INFO  ==> Changing old IP 10.244.0.43 by the new one 10.244.0.43
valkey-cluster 15:27:47.45 INFO  ==> Changing old IP 10.244.0.42 by the new one 10.244.0.42
valkey-cluster 15:27:47.50 INFO  ==> Changing old IP 10.244.0.38 by the new one 10.244.0.38
valkey-cluster 15:27:47.50 INFO  ==> ** Valkey setup finished! **
1:C 06 Aug 2024 15:27:47.612 # WARNING: Changing databases number from 16 to 1 since we are in cluster mode
1:C 06 Aug 2024 15:27:47.654 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
1:C 06 Aug 2024 15:27:47.654 * Valkey version=7.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 06 Aug 2024 15:27:47.654 * Configuration loaded
1:M 06 Aug 2024 15:27:47.654 * monotonic clock: POSIX clock_gettime
1:M 06 Aug 2024 15:27:47.655 * Node configuration loaded, I'm d9827f7db0ee609373fa6b0d43bc525246c57021
1:M 06 Aug 2024 15:27:47.656 * Server initialized
1:M 06 Aug 2024 15:27:47.656 * Reading RDB base file on AOF loading...
1:M 06 Aug 2024 15:27:47.656 * Loading RDB produced by valkey version 7.2.6
1:M 06 Aug 2024 15:27:47.656 * RDB age 596 seconds
1:M 06 Aug 2024 15:27:47.656 * RDB memory usage when created 1.56 Mb
1:M 06 Aug 2024 15:27:47.656 * RDB is base AOF
1:M 06 Aug 2024 15:27:47.656 * Done loading RDB, keys loaded: 0, keys expired: 0.
1:M 06 Aug 2024 15:27:47.656 * DB loaded from base file appendonly.aof.1.base.rdb: 0.000 seconds
1:M 06 Aug 2024 15:27:47.656 * DB loaded from append only file: 0.000 seconds
1:M 06 Aug 2024 15:27:47.656 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
1:M 06 Aug 2024 15:27:47.656 * Ready to accept connections tcp

What am I missing? Any guidelines on debugging further?

Thanks.

Are you using any custom parameters or values?

No custom parameters. I only ran helm install my-release oci://registry-1.docker.io/bitnamicharts/valkey-cluster

What is the expected behavior?

valkey-cluster should be up and the pods should be running with ready state 1/1. Using valkey-cli, we should be able to list all the nodes.

What do you see instead?

k get pods
NAME                          READY   STATUS    RESTARTS      AGE
my-release-valkey-cluster-0   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-1   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-2   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-3   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-4   0/1     Running   1 (21h ago)   21h
my-release-valkey-cluster-5   0/1     Running   1 (21h ago)   21h
❯ kubectl exec -it my-release-valkey-cluster-1 -- valkey-cli
127.0.0.1:6379> CLUSTER nodes
c2104e2cd9da1efb779c1c1a82ee40c588fa6a0f 10.244.0.41:6379@16379 myself,master - 0 0 0 connected

Additional information

No response

arpan57, Aug 07 '24 13:08