charts [bitnami/redis-cluster] Redis-Cluster in Kubernetes (GKE): Readiness probe failed after pod restart. Pod remains in state "running"

Name and Version

bitnami/redis-cluster:8.6.11

What architecture are you using?

None

What steps will reproduce the bug?

In GKE, after successfully deploying the Helm chart (redis-cluster:8.6.11 with image bitnami/redis-cluster:7.2.4-debian-11-r0), just restart one pod. The Redis pod remains in the state "running" although the restart appears to have succeeded (the pod log says: Ready to accept connections tcp). GKE only reports: Readiness probe failed: cluster_state:fail.

Are you using any custom parameters or values?

We generally use the chart's default values. We only adapted a few settings related to security and resource quotas. This is our values file:

containerSecurityContext:
  enabled: true
  runAsUser: 1001
  runAsNonRoot: true
  privileged: false
  capabilities:
    drop:
      - ALL
  allowPrivilegeEscalation: false
  seccompProfile:
    type: RuntimeDefault

redis:
  resources:
    requests:
      cpu: 100m
      memory: 500Mi
      ephemeral-storage: 1Gi
    limits:
      cpu: 200m
      memory: 750Mi
      ephemeral-storage: 2Gi
persistence:
  enabled: false

What is the expected behavior?

The Redis cluster should be resilient to pod restarts, since pods in Kubernetes may be restarted for various reasons.

What do you see instead?

The Redis pod then remains in the state "running" and is unreachable, although the restart appears to have succeeded (the pod log says: Ready to accept connections tcp). GKE only reports: Readiness probe failed: cluster_state:fail.

Additional information

This is the log of the pod after the restart.

redis-cluster 09:17:25.32 INFO  ==> 
redis-cluster 09:17:25.33 INFO  ==> Welcome to the Bitnami redis-cluster container
redis-cluster 09:17:25.33 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
redis-cluster 09:17:25.33 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
redis-cluster 09:17:25.33 INFO  ==> 
redis-cluster 09:17:25.33 INFO  ==> ** Starting Redis setup **
redis-cluster 09:17:25.51 INFO  ==> Initializing Redis
redis-cluster 09:17:25.53 INFO  ==> Setting Redis config file
Storing map with hostnames and IPs
redis-cluster 09:17:31.25 INFO  ==> ** Redis setup finished! **

WARNING: Changing databases number from 16 to 1 since we are in cluster mode
oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 15 Feb 2024 09:17:31.423 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=1, just started
Configuration loaded
1:M 15 Feb 2024 09:17:31.423 * monotonic clock: POSIX clock_gettime
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 7.2.4 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                  
 (    '      ,       .-`  | `,    )     Running in cluster mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           https://redis.io       
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

1:M 15 Feb 2024 09:17:31.424 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:M 15 Feb 2024 09:17:31.424 * No cluster configuration found, I'm f9b36a66aac595e0d54f6d075bc1fc7dc3b9b99c
1:M 15 Feb 2024 09:17:31.428 * Server initialized
1:M 15 Feb 2024 09:17:31.430 * Creating AOF base file appendonly.aof.1.base.rdb on server start
1:M 15 Feb 2024 09:17:31.433 * Creating AOF incr file appendonly.aof.1.incr.aof on server start
Ready to accept connections tcp
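
One detail in this log may be relevant: "No cluster configuration found" is the message Redis prints when it cannot find its cluster state file (nodes.conf), so the restarted node comes up with a fresh node ID instead of its previous identity. With persistence.enabled set to false in the values above, that would be consistent with the data directory not surviving the restart. A minimal sketch of a values change that would test this, assuming a default StorageClass is available in the cluster and that the chart's persistence.size key applies (the size is a placeholder):

```yaml
persistence:
  # Keep the Redis data directory (including nodes.conf) on a volume
  enabled: true
  # Placeholder size, to be adjusted to the namespace storage quota
  size: 1Gi
```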

Feb 15 '24 11:02 bakahoui

Hi!

To understand where the issue is, could you try removing the resources section? That would tell us whether the problem lies with the resources and the readiness probe timeout.
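
If the probe timeout is the suspect, it could also be relaxed on its own. A sketch, assuming the chart exposes the probe settings under redis.readinessProbe (please check the chart's values.yaml for the exact keys and defaults; the numbers are placeholders):

```yaml
redis:
  readinessProbe:
    # Placeholder values, chosen only to give the probe more slack
    timeoutSeconds: 5
    failureThreshold: 10
```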

Feb 16 '24 09:02 javsalgar

Thanks @javsalgar for the prompt answer. Unfortunately, removing the resources section (with the resource quotas) is not possible because it is mandatory.

As expected, I got an error during deployment due to the missing requests and limits:

FailedCreate (15) | create Pod holmes-cache-redis-cluster-0 in StatefulSet holmes-cache-redis-cluster failed error: pods "holmes-cache-redis-cluster-0" is forbidden: failed quota: default-t2qz8: must specify limits.cpu for: holmes-cache-redis-cluster; limits.memory for: holmes-cache-redis-cluster; requests.cpu for: holmes-cache-redis-cluster; requests.memory

Feb 16 '24 11:02 bakahoui

Then my advice would be to increase the limits to see if there's a point where the issue does not appear.
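
For instance, a values override along these lines keeps the original requests and only raises the limits (the new limits are arbitrary placeholders, not recommendations, and they must still fit within the namespace quota):

```yaml
redis:
  resources:
    requests:
      cpu: 100m
      memory: 500Mi
      ephemeral-storage: 1Gi
    limits:
      # Raised from 200m / 750Mi as an experiment
      cpu: 500m
      memory: 1Gi
      ephemeral-storage: 2Gi
```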

Feb 20 '24 08:02 javsalgar

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

Mar 07 '24 01:03 github-actions[bot]

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

Mar 12 '24 01:03 github-actions[bot]