charts icon indicating copy to clipboard operation
charts copied to clipboard

[bitnami/redis] Fix redis sentinel timeout

Open nobiit opened this issue 1 year ago • 7 comments

Description of the change

Fixed redis sentinel waiting until timeout and exiting

Benefits

Some CNI will keep connection to service until 1 pods are available

Possible drawbacks

None

Applicable issues

On some CNI (Cilium) it will keep the connection to the Service until timeout, it makes redis sentinel boot longer than usual and is killed by kubernetes

Additional information

On 1 test machine (Created with kind)

$ kubectl logs redis-node-0 sentinel 
 02:43:22.99 INFO  ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.default.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
Could not connect to Redis at redis.default.svc.cluster.local:26379: Connection refused
1:X 15 Jul 2022 02:43:25.074 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 15 Jul 2022 02:43:25.074 # Redis version=7.0.2, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 15 Jul 2022 02:43:25.074 # Configuration loaded
1:X 15 Jul 2022 02:43:25.075 * monotonic clock: POSIX clock_gettime
1:X 15 Jul 2022 02:43:25.076 * Running mode=sentinel, port=26379.
1:X 15 Jul 2022 02:43:25.076 # Sentinel ID is 2a09ba7abbb41ee71e79087310d75f9809c3c815
1:X 15 Jul 2022 02:43:25.076 # +monitor master mymaster redis-node-0.redis-headless.default.svc.cluster.local 6379 quorum 2

On my machine (using CNI Cilium)

$ kubectl logs -f redis-node-0 sentinel 
 08:08:14.43 INFO  ==> Found previous master redis-node-0.redis-headless.db.svc.cluster.local in /opt/bitnami/redis-sentinel/etc/sentinel.conf
 08:08:14.43 INFO  ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
$ kubectl describe -n db pods redis-node-0 
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  12m                  default-scheduler  Successfully assigned db/redis-node-0 to k8s-01
  Normal   Pulled     11m                  kubelet            Container image "[docker.io/bitnami/bitnami-shell:11-debian-11-r14]" already present on machine
  Normal   Created    11m                  kubelet            Created container volume-permissions
  Normal   Started    11m                  kubelet            Started container volume-permissions
  Normal   Pulled     11m                  kubelet            Container image "[docker.io/bitnami/redis:7.0.3-debian-11-r0]" already present on machine
  Normal   Created    11m                  kubelet            Created container redis
  Normal   Started    11m                  kubelet            Started container redis
  Normal   Pulled     11m                  kubelet            Container image "[docker.io/bitnami/redis-sentinel:7.0.2-debian-11-r9]" already present on machine
  Normal   Created    11m                  kubelet            Created container sentinel
  Normal   Started    11m                  kubelet            Started container sentinel
  Warning  Unhealthy  10m (x8 over 11m)    kubelet            Startup probe failed: dial tcp 10.192.0.122:6379: connect: connection refused
  Warning  Unhealthy  112s (x52 over 11m)  kubelet            Startup probe failed: dial tcp 10.192.0.122:26379: connect: connection refused

Checklist

  • [x] Chart version bumped in Chart.yaml according to semver. This is not necessary when the changes only affect README.md files.
  • [x] Variables are documented in the values.yaml and added to the README.md using readme-generator-for-helm
  • [x] Title of the pull request follows this pattern [bitnami/<name_of_the_chart>] Descriptive title
  • [x] All commits signed off and in agreement of Developer Certificate of Origin (DCO)

nobiit avatar Jul 15 '22 08:07 nobiit

Please accept this PR if this PR is accepted https://github.com/bitnami/bitnami-docker-redis-sentinel/pull/51

nobiit avatar Jul 15 '22 08:07 nobiit

Thanks @nobiit, I'll pass this PR to one of our engineers to follow up, but there is something it looks weird to me, we are not providing netcat in our images. How did you test it?

fmulero avatar Jul 15 '22 08:07 fmulero

Hey @fmulero , I created 1 PR to docker redis to add netcat (https://github.com/bitnami/bitnami-docker-redis-sentinel/pull/51). I can't find any other way in the current image

nobiit avatar Jul 15 '22 10:07 nobiit

Please don't do that, netcat is the swiss army knife to open undesired connections. Please have a look to libnet.sh or use something like:

$ timeout 1 bash -c 'cat < /dev/null > /dev/tcp/google.com/80'

fmulero avatar Jul 15 '22 14:07 fmulero

Thanks @fmulero for the suggestion, I've been looking for such a command and can't find it. I will update the PR again tomorrow

nobiit avatar Jul 15 '22 15:07 nobiit

Hey @fmulero, I updated the PR and tested again in my environment. It works pretty well, thank you Please review the PR, or wait for someone else to have the same problem if you have any doubts

$ kubectl logs -f -n db redis-node-0 sentinel 
 02:56:34.37 INFO  ==> Found previous master redis-node-0.redis-headless.db.svc.cluster.local in /opt/bitnami/redis-sentinel/etc/sentinel.conf
 02:56:35.38 WARN  ==> Timeout when connecting to redis.db.svc.cluster.local, will use redis-headless.db.svc.cluster.local
 02:56:35.38 INFO  ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis-headless.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
Could not connect to Redis at redis-headless.db.svc.cluster.local:26379: Connection refused
1:X 16 Jul 2022 02:56:35.480 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 16 Jul 2022 02:56:35.480 # Redis version=7.0.2, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 16 Jul 2022 02:56:35.480 # Configuration loaded
1:X 16 Jul 2022 02:56:35.481 * monotonic clock: POSIX clock_gettime
1:X 16 Jul 2022 02:56:35.481 * Running mode=sentinel, port=26379.
1:X 16 Jul 2022 02:56:35.482 # Sentinel ID is 2a09ba7abbb41ee71e79087310d75f9809c3c815
1:X 16 Jul 2022 02:56:35.482 # +monitor master mymaster redis-node-0.redis-headless.db.svc.cluster.local 6379 quorum 2
$ kubectl logs -f -n db redis-node-1 sentinel 
 03:01:01.17 INFO  ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
redis-node-0.redis-headless.db.svc.cluster.local
6379
 03:01:01.18 INFO  ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
 03:01:01.18 INFO  ==> printing REDIS_SENTINEL_INFO=(redis-node-0.redis-headless.db.svc.cluster.local,6379)
1:X 16 Jul 2022 03:01:01.270 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 16 Jul 2022 03:01:01.270 # Redis version=7.0.2, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 16 Jul 2022 03:01:01.270 # Configuration loaded
1:X 16 Jul 2022 03:01:01.270 * monotonic clock: POSIX clock_gettime
1:X 16 Jul 2022 03:01:01.271 * Running mode=sentinel, port=26379.
1:X 16 Jul 2022 03:01:01.272 # Sentinel ID is 33535e4e17bf8f9f9ff9ce8f9ddf609e558ff4f2
1:X 16 Jul 2022 03:01:01.272 # +monitor master mymaster redis-node-0.redis-headless.db.svc.cluster.local 6379 quorum 2
1:X 16 Jul 2022 03:01:01.274 * +slave slave redis-node-1.redis-headless.db.svc.cluster.local:6379 redis-node-1.redis-headless.db.svc.cluster.local 6379 @ mymaster redis-node-0.redis-headless.db.svc.cluster.local 6379
1:X 16 Jul 2022 03:01:01.367 * Sentinel new configuration saved on disk
$ kubectl logs -f -n db redis-node-0 redis 
 02:56:33.89 INFO  ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD timeout 220 redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
 03:00:13.89 INFO  ==> Found previous master redis-node-0.redis-headless.db.svc.cluster.local:6379 in /opt/bitnami/redis-sentinel/etc/sentinel.conf
 03:00:13.90 INFO  ==> Configuring the node as master
1:C 16 Jul 2022 03:00:13.910 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 16 Jul 2022 03:00:13.910 # Redis version=7.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 16 Jul 2022 03:00:13.910 # Configuration loaded
1:M 16 Jul 2022 03:00:13.910 * monotonic clock: POSIX clock_gettime
1:M 16 Jul 2022 03:00:13.911 * Running mode=standalone, port=6379.
1:M 16 Jul 2022 03:00:13.911 # Server initialized
1:M 16 Jul 2022 03:00:13.916 * Reading RDB base file on AOF loading...
1:M 16 Jul 2022 03:00:13.916 * Loading RDB produced by version 7.0.3
1:M 16 Jul 2022 03:00:13.916 * RDB age 87889 seconds
1:M 16 Jul 2022 03:00:13.916 * RDB memory usage when created 0.82 Mb
1:M 16 Jul 2022 03:00:13.916 * RDB is base AOF
1:M 16 Jul 2022 03:00:13.916 * Done loading RDB, keys loaded: 0, keys expired: 0.
1:M 16 Jul 2022 03:00:13.916 * DB loaded from base file appendonly.aof.1.base.rdb: 0.001 seconds
1:M 16 Jul 2022 03:00:13.916 * DB loaded from append only file: 0.001 seconds
1:M 16 Jul 2022 03:00:13.916 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
1:M 16 Jul 2022 03:00:13.916 * Ready to accept connections
1:M 16 Jul 2022 03:01:00.773 * Replica redis-node-1.redis-headless.db.svc.cluster.local:6379 asks for synchronization
1:M 16 Jul 2022 03:01:00.773 * Full resync requested by replica redis-node-1.redis-headless.db.svc.cluster.local:6379
1:M 16 Jul 2022 03:01:00.773 * Replication backlog created, my new replication IDs are '26c60dedd154473cc80e75ee6a3dd125bc28374f' and '0000000000000000000000000000000000000000'
1:M 16 Jul 2022 03:01:00.773 * Delay next BGSAVE for diskless SYNC
1:M 16 Jul 2022 03:01:05.006 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 16 Jul 2022 03:01:05.006 * Background RDB transfer started by pid 243
243:C 16 Jul 2022 03:01:05.008 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 16 Jul 2022 03:01:05.008 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
1:M 16 Jul 2022 03:01:05.107 * Background RDB transfer terminated with success
1:M 16 Jul 2022 03:01:05.107 * Streamed RDB transfer with replica redis-node-1.redis-headless.db.svc.cluster.local:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 16 Jul 2022 03:01:05.107 * Synchronization with replica redis-node-1.redis-headless.db.svc.cluster.local:6379 succeeded

nobiit avatar Jul 16 '22 03:07 nobiit

A workaround until this is fixed is to set sentinel.getMasterTimeout to a lower value such as 60. The pod will hang for that value before starting. The default is 220 which results in the pod restarting due to failing readiness checks. I suppose alternatively one could set the check delay to 220, but that's a long time to wait for the pod to start.

bufke avatar Jul 25 '22 20:07 bufke

This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

bitnami-bot avatar Aug 24 '22 01:08 bitnami-bot

This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

bitnami-bot avatar Sep 10 '22 01:09 bitnami-bot

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary.

bitnami-bot avatar Sep 15 '22 01:09 bitnami-bot