[bitnami/redis] Fix redis sentinel timeout
Description of the change
Fixes Redis Sentinel hanging until the timeout expires and then exiting.
Benefits
Some CNIs keep the connection to the Service hanging until at least one pod is available, instead of refusing it immediately.
Possible drawbacks
None
Applicable issues
With some CNIs (e.g. Cilium), the connection to the Service hangs until the timeout expires; this makes Redis Sentinel take longer than usual to boot, so it is killed by Kubernetes. A simplified sketch of the fix follows.
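The gist of the change, as a simplified sketch (not the exact chart script; hostnames, namespace, and port are illustrative and taken from the logs below): probe the regular Service with a short timeout, and only fall back to the headless Service when the connection hangs rather than being refused.

# Probe the regular Service first. Some CNIs (e.g. Cilium) black-hole the
# connection instead of refusing it, so we need to tell "hung" from "refused".
host="redis.db.svc.cluster.local"
timeout 1 bash -c "cat < /dev/null > /dev/tcp/${host}/26379" 2>/dev/null
if [ $? -eq 124 ]; then
    # Exit code 124 means timeout(1) killed the probe: the connection hung.
    # Fall back to the headless Service, which resolves to pod IPs directly.
    host="redis-headless.db.svc.cluster.local"
fi
REDISCLI_AUTH="$REDIS_PASSWORD" redis-cli -h "$host" -p 26379 sentinel get-master-addr-by-name mymaster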
Additional information
On a test machine (created with kind):
$ kubectl logs redis-node-0 sentinel
02:43:22.99 INFO ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.default.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
Could not connect to Redis at redis.default.svc.cluster.local:26379: Connection refused
1:X 15 Jul 2022 02:43:25.074 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 15 Jul 2022 02:43:25.074 # Redis version=7.0.2, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 15 Jul 2022 02:43:25.074 # Configuration loaded
1:X 15 Jul 2022 02:43:25.075 * monotonic clock: POSIX clock_gettime
1:X 15 Jul 2022 02:43:25.076 * Running mode=sentinel, port=26379.
1:X 15 Jul 2022 02:43:25.076 # Sentinel ID is 2a09ba7abbb41ee71e79087310d75f9809c3c815
1:X 15 Jul 2022 02:43:25.076 # +monitor master mymaster redis-node-0.redis-headless.default.svc.cluster.local 6379 quorum 2
On my machine (using the Cilium CNI):
$ kubectl logs -f redis-node-0 sentinel
08:08:14.43 INFO ==> Found previous master redis-node-0.redis-headless.db.svc.cluster.local in /opt/bitnami/redis-sentinel/etc/sentinel.conf
08:08:14.43 INFO ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
$ kubectl describe -n db pods redis-node-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned db/redis-node-0 to k8s-01
Normal Pulled 11m kubelet Container image "docker.io/bitnami/bitnami-shell:11-debian-11-r14" already present on machine
Normal Created 11m kubelet Created container volume-permissions
Normal Started 11m kubelet Started container volume-permissions
Normal Pulled 11m kubelet Container image "docker.io/bitnami/redis:7.0.3-debian-11-r0" already present on machine
Normal Created 11m kubelet Created container redis
Normal Started 11m kubelet Started container redis
Normal Pulled 11m kubelet Container image "docker.io/bitnami/redis-sentinel:7.0.2-debian-11-r9" already present on machine
Normal Created 11m kubelet Created container sentinel
Normal Started 11m kubelet Started container sentinel
Warning Unhealthy 10m (x8 over 11m) kubelet Startup probe failed: dial tcp 10.192.0.122:6379: connect: connection refused
Warning Unhealthy 112s (x52 over 11m) kubelet Startup probe failed: dial tcp 10.192.0.122:26379: connect: connection refused
Checklist
- [x] Chart version bumped in Chart.yaml according to semver. This is not necessary when the changes only affect README.md files.
- [x] Variables are documented in values.yaml and added to README.md using readme-generator-for-helm
- [x] Title of the pull request follows this pattern: [bitnami/<name_of_the_chart>] Descriptive title
- [x] All commits signed off and in agreement of Developer Certificate of Origin (DCO)
Please accept this PR once https://github.com/bitnami/bitnami-docker-redis-sentinel/pull/51 is accepted.
Thanks @nobiit, I'll pass this PR to one of our engineers to follow up, but something looks weird to me: we don't provide netcat in our images. How did you test it?
Hey @fmulero, I created a PR against the Redis Sentinel image to add netcat (https://github.com/bitnami/bitnami-docker-redis-sentinel/pull/51). I couldn't find any other way to do it with the current image.
Please don't do that; netcat is a Swiss Army knife for opening undesired connections. Please have a look at libnet.sh, or use something like:
$ timeout 1 bash -c 'cat < /dev/null > /dev/tcp/google.com/80'
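For what it's worth, the exit status of that probe distinguishes the failure modes, which is exactly what this fix needs (host and port below are illustrative):

$ timeout 1 bash -c 'cat < /dev/null > /dev/tcp/redis.db.svc.cluster.local/26379'
# exit 0   -> port is open
# exit 1   -> connection refused (bash could not open /dev/tcp)
# exit 124 -> timeout(1) killed the probe: the connection hung (the Cilium case)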
Thanks @fmulero for the suggestion; I had been looking for a command like that and couldn't find one. I will update the PR tomorrow.
Hey @fmulero, I updated the PR and tested it again in my environment. It works pretty well, thank you. Please review the PR, or if you have any doubts, wait for someone else to run into the same problem.
$ kubectl logs -f -n db redis-node-0 sentinel
02:56:34.37 INFO ==> Found previous master redis-node-0.redis-headless.db.svc.cluster.local in /opt/bitnami/redis-sentinel/etc/sentinel.conf
02:56:35.38 WARN ==> Timeout when connecting to redis.db.svc.cluster.local, will use redis-headless.db.svc.cluster.local
02:56:35.38 INFO ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis-headless.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
Could not connect to Redis at redis-headless.db.svc.cluster.local:26379: Connection refused
1:X 16 Jul 2022 02:56:35.480 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 16 Jul 2022 02:56:35.480 # Redis version=7.0.2, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 16 Jul 2022 02:56:35.480 # Configuration loaded
1:X 16 Jul 2022 02:56:35.481 * monotonic clock: POSIX clock_gettime
1:X 16 Jul 2022 02:56:35.481 * Running mode=sentinel, port=26379.
1:X 16 Jul 2022 02:56:35.482 # Sentinel ID is 2a09ba7abbb41ee71e79087310d75f9809c3c815
1:X 16 Jul 2022 02:56:35.482 # +monitor master mymaster redis-node-0.redis-headless.db.svc.cluster.local 6379 quorum 2
$ kubectl logs -f -n db redis-node-1 sentinel
03:01:01.17 INFO ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
redis-node-0.redis-headless.db.svc.cluster.local
6379
03:01:01.18 INFO ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
03:01:01.18 INFO ==> printing REDIS_SENTINEL_INFO=(redis-node-0.redis-headless.db.svc.cluster.local,6379)
1:X 16 Jul 2022 03:01:01.270 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 16 Jul 2022 03:01:01.270 # Redis version=7.0.2, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 16 Jul 2022 03:01:01.270 # Configuration loaded
1:X 16 Jul 2022 03:01:01.270 * monotonic clock: POSIX clock_gettime
1:X 16 Jul 2022 03:01:01.271 * Running mode=sentinel, port=26379.
1:X 16 Jul 2022 03:01:01.272 # Sentinel ID is 33535e4e17bf8f9f9ff9ce8f9ddf609e558ff4f2
1:X 16 Jul 2022 03:01:01.272 # +monitor master mymaster redis-node-0.redis-headless.db.svc.cluster.local 6379 quorum 2
1:X 16 Jul 2022 03:01:01.274 * +slave slave redis-node-1.redis-headless.db.svc.cluster.local:6379 redis-node-1.redis-headless.db.svc.cluster.local 6379 @ mymaster redis-node-0.redis-headless.db.svc.cluster.local 6379
1:X 16 Jul 2022 03:01:01.367 * Sentinel new configuration saved on disk
$ kubectl logs -f -n db redis-node-0 redis
02:56:33.89 INFO ==> about to run the command: REDISCLI_AUTH=$REDIS_PASSWORD timeout 220 redis-cli -h redis.db.svc.cluster.local -p 26379 sentinel get-master-addr-by-name mymaster
03:00:13.89 INFO ==> Found previous master redis-node-0.redis-headless.db.svc.cluster.local:6379 in /opt/bitnami/redis-sentinel/etc/sentinel.conf
03:00:13.90 INFO ==> Configuring the node as master
1:C 16 Jul 2022 03:00:13.910 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 16 Jul 2022 03:00:13.910 # Redis version=7.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 16 Jul 2022 03:00:13.910 # Configuration loaded
1:M 16 Jul 2022 03:00:13.910 * monotonic clock: POSIX clock_gettime
1:M 16 Jul 2022 03:00:13.911 * Running mode=standalone, port=6379.
1:M 16 Jul 2022 03:00:13.911 # Server initialized
1:M 16 Jul 2022 03:00:13.916 * Reading RDB base file on AOF loading...
1:M 16 Jul 2022 03:00:13.916 * Loading RDB produced by version 7.0.3
1:M 16 Jul 2022 03:00:13.916 * RDB age 87889 seconds
1:M 16 Jul 2022 03:00:13.916 * RDB memory usage when created 0.82 Mb
1:M 16 Jul 2022 03:00:13.916 * RDB is base AOF
1:M 16 Jul 2022 03:00:13.916 * Done loading RDB, keys loaded: 0, keys expired: 0.
1:M 16 Jul 2022 03:00:13.916 * DB loaded from base file appendonly.aof.1.base.rdb: 0.001 seconds
1:M 16 Jul 2022 03:00:13.916 * DB loaded from append only file: 0.001 seconds
1:M 16 Jul 2022 03:00:13.916 * Opening AOF incr file appendonly.aof.1.incr.aof on server start
1:M 16 Jul 2022 03:00:13.916 * Ready to accept connections
1:M 16 Jul 2022 03:01:00.773 * Replica redis-node-1.redis-headless.db.svc.cluster.local:6379 asks for synchronization
1:M 16 Jul 2022 03:01:00.773 * Full resync requested by replica redis-node-1.redis-headless.db.svc.cluster.local:6379
1:M 16 Jul 2022 03:01:00.773 * Replication backlog created, my new replication IDs are '26c60dedd154473cc80e75ee6a3dd125bc28374f' and '0000000000000000000000000000000000000000'
1:M 16 Jul 2022 03:01:00.773 * Delay next BGSAVE for diskless SYNC
1:M 16 Jul 2022 03:01:05.006 * Starting BGSAVE for SYNC with target: replicas sockets
1:M 16 Jul 2022 03:01:05.006 * Background RDB transfer started by pid 243
243:C 16 Jul 2022 03:01:05.008 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 16 Jul 2022 03:01:05.008 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
1:M 16 Jul 2022 03:01:05.107 * Background RDB transfer terminated with success
1:M 16 Jul 2022 03:01:05.107 * Streamed RDB transfer with replica redis-node-1.redis-headless.db.svc.cluster.local:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
1:M 16 Jul 2022 03:01:05.107 * Synchronization with replica redis-node-1.redis-headless.db.svc.cluster.local:6379 succeeded
A workaround until this is fixed is to set sentinel.getMasterTimeout to a lower value such as 60. The pod will hang for that many seconds before starting. The default is 220, which results in the pod restarting due to failing readiness checks. I suppose one could alternatively set the check delay to 220, but that's a long time to wait for the pod to start.
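For reference, that workaround can be applied at install or upgrade time like this (release name and namespace are placeholders; the parameter name is the one mentioned above):

$ helm upgrade --install redis bitnami/redis \
    --namespace db \
    --set sentinel.enabled=true \
    --set sentinel.getMasterTimeout=60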
This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary.