redis-ha-4.27.0 - split brain
Describe the bug I deployed the chart with default values. During its operation we hit a condition where redis-0 and redis-2 were replicas of redis-1, while redis-1 was a replica of redis-0. The split-brain-fix container wasn't able to fix the problem.
172.20.75.109 - redis-0 172.20.181.236 - redis-1 172.20.198.17 - redis-2
redis-0:
| | 2024-06-18 18:23:36.849 | 1:S 18 Jun 2024 15:23:36.849 * Connecting to MASTER 172.20.181.236:6379 |
| | 2024-06-18 18:23:36.849 | 1:S 18 Jun 2024 15:23:36.849 * MASTER <-> REPLICA sync started |
| | 2024-06-18 18:23:36.850 | 1:S 18 Jun 2024 15:23:36.850 # Error condition on socket for SYNC: Connection refused |
| | 2024-06-18 18:23:37.852 | 1:S 18 Jun 2024 15:23:37.852 * Connecting to MASTER 172.20.181.236:6379 |
| | 2024-06-18 18:23:37.852 | 1:S 18 Jun 2024 15:23:37.852 * MASTER <-> REPLICA sync started
redis-1 (sentinel tries to restart it):
| | 2024-06-18 18:26:55.109 | 1:S 18 Jun 2024 15:26:55.109 * Ready to accept connections tcp |
| | 2024-06-18 18:26:55.109 | 1:S 18 Jun 2024 15:26:55.109 * Connecting to MASTER 172.20.75.109:6379 |
| | 2024-06-18 18:26:55.110 | 1:S 18 Jun 2024 15:26:55.109 * MASTER <-> REPLICA sync started |
| | 2024-06-18 18:26:55.110 | 1:S 18 Jun 2024 15:26:55.110 * Non blocking connect for SYNC fired the event. |
| | 2024-06-18 18:26:55.111 | 1:S 18 Jun 2024 15:26:55.111 * Master replied to PING, replication can continue... |
| | 2024-06-18 18:26:55.113 | 1:S 18 Jun 2024 15:26:55.112 * Trying a partial resynchronization (request 8605e4e1a74e2a74a8ad3742efb5784ad4b0ce41:1). |
| | 2024-06-18 18:26:55.113 | 1:S 18 Jun 2024 15:26:55.113 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master |
| | 2024-06-18 18:26:56.114 | 1:S 18 Jun 2024 15:26:56.113 * Connecting to MASTER 172.20.75.109:6379 |
| | 2024-06-18 18:26:56.114 | 1:S 18 Jun 2024 15:26:56.114 * MASTER <-> REPLICA sync started
sentinel-1 (leader):
| | 2024-06-18 18:26:55.883 | 1:X 18 Jun 2024 15:26:55.883 * +reboot master mymaster 172.20.181.236 6379 |
| | 2024-06-18 18:28:09.960 | 1:X 18 Jun 2024 15:28:09.960 # +new-epoch 21 |
| | 2024-06-18 18:28:09.960 | 1:X 18 Jun 2024 15:28:09.960 # +try-failover master mymaster 172.20.181.236 6379 |
| | 2024-06-18 18:28:09.963 | 1:X 18 Jun 2024 15:28:09.963 * Sentinel new configuration saved on disk |
| | 2024-06-18 18:28:09.963 | 1:X 18 Jun 2024 15:28:09.963 # +vote-for-leader aa33680947f52ae19df761ea8f26a4285d4910c1 21 |
| | 2024-06-18 18:28:09.969 | 1:X 18 Jun 2024 15:28:09.969 * d4ca60ac0fa2353d3c6a5684df1401f8faccf6ef voted for aa33680947f52ae19df761ea8f26a4285d4910c1 21 |
| | 2024-06-18 18:28:09.969 | 1:X 18 Jun 2024 15:28:09.969 * d21ee95d5d45a94a9deb59bd2b2797a4bddedf53 voted for aa33680947f52ae19df761ea8f26a4285d4910c1 21 |
| | 2024-06-18 18:28:10.039 | 1:X 18 Jun 2024 15:28:10.039 # +elected-leader master mymaster 172.20.181.236 6379 |
| | 2024-06-18 18:28:10.039 | 1:X 18 Jun 2024 15:28:10.039 # +failover-state-select-slave master mymaster 172.20.181.236 6379 |
| | 2024-06-18 18:28:10.116 | 1:X 18 Jun 2024 15:28:10.115 # -failover-abort-no-good-slave master mymaster 172.20.181.236 6379 |
| | 2024-06-18 18:28:10.187 | 1:X 18 Jun 2024 15:28:10.187 * Next failover delay: I will not start a failover before Tue Jun 18 15:34:10 2024 |
| | 2024-06-18 18:32:53.938 | 1:X 18 Jun 2024 15:32:53.936 * +reboot master mymaster 172.20.181.236 6379
split-brain-fix-1:
| | 2024-06-18 18:20:30.025 | Could not connect to Redis at 127.0.0.1:6379: Connection refused |
| | 2024-06-18 18:20:30.025 | Could not connect to Redis at 127.0.0.1:6379: Connection refused |
| | 2024-06-18 18:21:30.027 | Identifying redis master (get-master-addr-by-name).. |
| | 2024-06-18 18:21:30.027 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |
| | 2024-06-18 18:21:30.043 | Tue Jun 18 15:21:30 UTC 2024 Found redis master (172.20.181.236) |
| | 2024-06-18 18:21:30.046 | Could not connect to Redis at 127.0.0.1:6379: Connection refused |
| | 2024-06-18 18:21:30.049 | Tue Jun 18 15:21:30 UTC 2024 Start... |
| | 2024-06-18 18:21:30.057 | Initializing config.. |
| | 2024-06-18 18:21:30.057 | Copying default redis config.. |
| | 2024-06-18 18:21:30.057 | to '/data/conf/redis.conf' |
| | 2024-06-18 18:21:30.061 | Copying default sentinel config.. |
| | 2024-06-18 18:21:30.061 | to '/data/conf/sentinel.conf' |
| | 2024-06-18 18:21:30.063 | Identifying redis master (get-master-addr-by-name).. |
| | 2024-06-18 18:21:30.063 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |
| | 2024-06-18 18:21:30.083 | Tue Jun 18 15:21:30 UTC 2024 Found redis master (172.20.181.236) |
| | 2024-06-18 18:21:30.083 | Identify announce ip for this pod.. |
| | 2024-06-18 18:21:30.083 | using (hewi-redis-ha-announce-1) or (hewi-redis-ha-server-1) |
| | 2024-06-18 18:21:30.088 | identified announce (172.20.181.236) |
| | 2024-06-18 18:21:30.088 | Verifying redis master.. |
| | 2024-06-18 18:21:30.088 | ping (172.20.181.236:6379) |
| | 2024-06-18 18:21:30.091 | Could not connect to Redis at 172.20.181.236:6379: Connection refused |
| | 2024-06-18 18:21:34.102 | Could not connect to Redis at 172.20.181.236:6379: Connection refused |
| | 2024-06-18 18:21:39.125 | Could not connect to Redis at 172.20.181.236:6379: Connection refused |
| | 2024-06-18 18:21:45.137 | Tue Jun 18 15:21:45 UTC 2024 Can't ping redis master (172.20.181.236) |
| | 2024-06-18 18:21:45.137 | Attempting to force failover (sentinel failover).. |
| | 2024-06-18 18:21:45.137 | on sentinel (hewi-redis-ha:26379), sentinel grp (mymaster) |
| | 2024-06-18 18:21:45.144 | Tue Jun 18 15:21:45 UTC 2024 Failover returned with 'NOGOODSLAVE' |
| | 2024-06-18 18:21:45.144 | Setting defaults for this pod.. |
| | 2024-06-18 18:21:45.144 | Setting up defaults.. |
| | 2024-06-18 18:21:45.144 | using statefulset index (1) |
| | 2024-06-18 18:21:45.144 | Getting redis master ip.. |
| | 2024-06-18 18:21:45.144 | blindly assuming (hewi-redis-ha-announce-0) or (hewi-redis-ha-server-0) are master |
| | 2024-06-18 18:21:45.161 | identified redis (may be redis master) ip (172.20.75.109) |
| | 2024-06-18 18:21:45.161 | Setting default slave config for redis and sentinel.. |
| | 2024-06-18 18:21:45.161 | using master ip (172.20.75.109) |
| | 2024-06-18 18:21:45.161 | Updating redis config.. |
| | 2024-06-18 18:21:45.162 | we are slave of redis master (172.20.75.109:6379) |
| | 2024-06-18 18:21:45.162 | Updating sentinel config.. |
| | 2024-06-18 18:21:45.162 | evaluating sentinel id (${SENTINEL_ID_1}) |
| | 2024-06-18 18:21:45.162 | sentinel id (aa33680947f52ae19df761ea8f26a4285d4910c1), sentinel grp (mymaster), quorum (2) |
| | 2024-06-18 18:21:45.163 | redis master (172.20.75.109:6379) |
| | 2024-06-18 18:21:45.164 | announce (172.20.181.236:26379) |
| | 2024-06-18 18:21:45.165 | Tue Jun 18 15:21:45 UTC 2024 Ready...
split-brain-fix-0:
| | 2024-06-18 18:21:56.044 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |
| | 2024-06-18 18:21:56.052 | Tue Jun 18 15:21:56 UTC 2024 Found redis master (172.20.181.236) |
| | 2024-06-18 18:22:56.056 | Identifying redis master (get-master-addr-by-name).. |
| | 2024-06-18 18:22:56.056 | using sentinel (hewi-redis-ha), sentinel group name (mymaster) |
| | 2024-06-18 18:22:56.063 | Tue Jun 18 15:22:56 UTC 2024 Found redis master (172.20.181.236) |
| | 2024-06-18 18:23:56.067 | Identifying redis master (get-master-addr-by-name)..
To Reproduce I tried node/pod deletion and redis-cli replicaof, but was unable to reproduce this bug.
Expected behavior The split-brain-fix container should handle even this rare case.
Additional context The script's logic was broken by sentinel's inability to fail over. Maybe the script should have an additional check on the role of the potential default master. I would really appreciate any help with this. Please let me know if you need any additional logs/checks.
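As a sketch of the extra check suggested above (a hypothetical helper, not part of the chart; the `verify_default_master` name and the way the candidate IP is passed in are assumptions), the "blindly assume server-0 is master" fallback could verify that the node it is about to follow actually reports `role:master` in `INFO replication`:

```shell
#!/bin/sh
# Extract the reported role ("master" or "slave") from `INFO replication`
# output. Reading from stdin keeps the parsing testable without a live Redis;
# tr strips the trailing \r that redis-cli emits.
redis_role_from_info() {
    grep '^role:' | head -n 1 | cut -d: -f2 | tr -d '[:space:]'
}

# Hypothetical guard for the fallback path: only configure replication toward
# the default master if it actually claims the master role.
verify_default_master() {
    default_master="$1"   # e.g. 172.20.75.109, resolved earlier by the script
    role=$(redis-cli -h "$default_master" -p 6379 info replication 2>/dev/null \
             | redis_role_from_info)
    if [ "$role" = "master" ]; then
        echo "ok"
    else
        echo "refusing to replicate: $default_master reports role '$role'"
        return 1
    fi
}
```

In the scenario above this would have stopped redis-1 from becoming a replica of redis-0, since redis-0 was itself reporting `role:slave`.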
+1
I've had this too.
I've found that when we've added a descheduler to the stack (https://github.com/kubernetes-sigs/descheduler) to balance nodes automatically, this kind of issue will disable the redis service frequently.
Can the master allocation be done with kubernetes lease locks? https://kubernetes.io/docs/concepts/architecture/leases/
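For context, a Lease is a small coordination.k8s.io object that a single holder keeps renewing; whichever pod holds it is the leader. A sketch of what a chart-managed lease might look like (names are illustrative; this is not something the chart creates today):

```yaml
# Sketch of a Lease object a leader elector could claim (coordination.k8s.io/v1).
# holderIdentity would be the pod that currently considers itself master.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: redis-ha-master-lock        # illustrative name
  namespace: default
spec:
  holderIdentity: redis-ha-server-0 # pod currently holding leadership
  leaseDurationSeconds: 15          # holder must renew within this window
```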
@tschirmer I'm trying to work out why this would happen unless the podManagementPolicy of the STS is set to Parallel?
Is this happening in either of your cases? @tschirmer ??
Because in theory, on first rollout, the first pod should start up and become master, way before -1/-2 start.
@DandyDeveloper Hi, I'm having this problem when my network becomes a bit unstable (for example, pods not being able to reach each other for a second) and my redis pods can't see each other.
@tschirmer I'm trying to work out why this would happen unless the podManagementPolicy of the STS is set to Parallel? Is this happening in either of your cases? @tschirmer ??
Because in theory, on first rollout, the first pod should start up and become master, way before -1/-2 start.
Haven't set it to Parallel. I suspect it's something like: the pod, when evicted, isn't completing trigger-failover-if-master.sh. We are running it with sentinel, which might add some complexity here. I haven't debugged it yet.
So far we're getting a load of issues with the liveness probe not containing the SENTINELAUTH env from the secret, even though it's clearly defined in the spec; a restart of the pod fixes it. It's happening very frequently though, so I'm wondering if a grace period needs to be defined on startup and shutdown to prevent both of these things from happening.
I think having separate StatefulSets for the Redis servers and the sentinels would make this chart more stable and manageable: create two StatefulSets and give sentinel a monitor config that points at an external host.
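Splitting sentinel out would mean pointing its monitor directive at the Redis service instead of localhost. A minimal sketch of the sentinel config (the service name and quorum are assumptions; hostname monitoring also requires `sentinel resolve-hostnames yes`, available since Redis 6.2):

```
# sentinel.conf fragment (sketch): monitor a master reachable via a k8s Service
# rather than a co-located redis container. Host, timeouts, and quorum are
# illustrative values.
sentinel resolve-hostnames yes
sentinel monitor mymaster redis-ha-master.default.svc.cluster.local 6379 2
sentinel down-after-milliseconds mymaster 10000
sentinel failover-timeout mymaster 180000
```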
I like the idea of separate stateful sets. I've been thinking of doing that and making a PR.
I suspect this is from preStop hooks not firing or completing successfully. trigger-failover-if-master.sh occasionally doesn't run as expected. When we had the descheduler running it was ~2 min between turning each pod off and on, and we found that every now and again it would fail. The rate of failure is low, so it's unlikely to occur unless you're hammering it (we haven't had an issue with the cluster since we turned off the descheduler).
I wanted to make a PR too, but there are a lot of configs that this change would have to propagate through.
I found that there were a couple of things wrong with my setup:
- because the permissions for the readonly config were set to 420, no preStop hooks were being triggered.
- we had a problem with our PVC CSI driver that was attaching the same drive to any PVC with the same name in different namespaces.
The permissions were the killer, because nothing was failing over on shutdown.
I'm halfway through writing a leader elector in golang for this based on k8s leases. Got it claiming the lease already. I'm not sure it's totally necessary after we've solved these other issues, though.
Specifically, in the StatefulSet, the volume definitions changed from:
volumes:
  - configMap:
      defaultMode: 420  #### THIS ONE ensured that the preStop hooks didn't have the permissions to run. Changed it to 430
      name: redis-session-configmap
    name: config
  - hostPath:
      path: /sys
      type: ''
    name: host-sys
  - configMap:
      defaultMode: 493
      name: redis-session-health-configmap
    name: health
to:
volumes:
  - configMap:
      defaultMode: 430  #### THIS ONE: changed from 420 to 430 so the preStop hooks have permission to run
      name: redis-session-configmap
    name: config
  - hostPath:
      path: /sys
      type: ''
    name: host-sys
  - configMap:
      defaultMode: 493
      name: redis-session-health-configmap
    name: health
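Worth noting (my addition, not from the comment above): defaultMode is a decimal file mode unless written with a leading 0, so 420 is octal 0644 (not executable) and 493 is octal 0755 (executable). 430 is octal 0656, which only adds the group execute bit, so 0755 is the more conventional choice for scripts. YAML accepts octal notation directly, which makes the intent clearer:

```yaml
# Equivalent but readable: 0755 (decimal 493) marks the mounted scripts executable.
volumes:
  - name: config
    configMap:
      name: redis-session-configmap   # name reused from the snippet above
      defaultMode: 0755
```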
Also found that the preStop hook /readonly-config/..data/trigger-failover-if-master.sh requires SENTINELAUTH, but it's not defined in the env for the redis container:
#!/bin/bash
echo "[K8S PreStop Hook] Start Failover."

# Count occurrences of "role:master" in INFO output; is_master is 1 when this
# node is currently the master.
get_redis_role() {
  is_master=$(
    redis-cli \
      -a "${AUTH}" --no-auth-warning \
      -h localhost \
      -p 6379 \
      info | grep -c 'role:master' || true
  )
}

get_redis_role
echo "[K8S PreStop Hook] Got redis role."
if [[ "$is_master" -eq 1 ]]; then
  echo "[K8S PreStop Hook] This node is currently master, we trigger a failover."
  # Note: SENTINELAUTH must be present in this container's env for this to work.
  response=$(
    redis-cli \
      -a "${SENTINELAUTH}" --no-auth-warning \
      -h 127.0.0.1 \
      -p 26379 \
      SENTINEL failover mymaster
  )
  if [[ "$response" != "OK" ]]; then
    echo "[K8S PreStop Hook] Failover failed"
    echo "$response"
    exit 1
  fi
  # Wait up to 30s for this node to be demoted to replica.
  timeout=30
  while [[ "$is_master" -eq 1 && $timeout -gt 0 ]]; do
    sleep 1
    get_redis_role
    timeout=$((timeout - 1))
  done
  echo "[K8S PreStop Hook] Failover successful"
else
  echo "[K8S PreStop Hook] This node is currently replica, no failover needed."
fi
^ I'd modified the above so I could get some debug data, along with this in the stateful set:
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - '-c'
        - >-
          echo "running preStop" >> /proc/1/fd/1 &&
          /readonly-config/trigger-failover-if-master.sh | tee >>
          /proc/1/fd/1 && echo "finished preStop" >> /proc/1/fd/1
The >> /proc/1/fd/1 forces this output into the container log in k8s.
Found that running preStops would consistently fail.
running preStop
[K8S PreStop Hook] Start Failover.
[K8S PreStop Hook] Got redis role.
[K8S PreStop Hook] This node is currently master, we trigger a failover.
[K8S PreStop Hook] Failover failed
finished preStop
Found that the sentinel container had shut down before the command could be executed on localhost, so it kept getting a failed failover. Changed the sentinel preStop to add a 10 second delay to keep it alive while this happened, and it seems to work every time now.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - '-c'
        - >-
          sleep 10
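Instead of a fixed 10 s sleep, the same idea can be written as a bounded wait. This is only a sketch, not part of the chart: the marker-file handshake (the redis container's preStop hook touching a file after failover completes) is an assumption and would need a shared emptyDir volume between the two containers:

```shell
#!/bin/sh
# Keep sentinel alive until a marker file appears (written by the redis
# container's preStop hook after the failover), or until a timeout expires.
wait_for_marker() {
    marker="$1"    # path on a volume shared by both containers (assumption)
    timeout="$2"   # seconds to wait before giving up
    while [ ! -f "$marker" ] && [ "$timeout" -gt 0 ]; do
        sleep 1
        timeout=$((timeout - 1))
    done
    if [ -f "$marker" ]; then
        echo "marker found"
    else
        echo "timed out"
    fi
}
```

The sentinel preStop would then be e.g. `wait_for_marker /shared/failover-done 30`, exiting as soon as the failover is done rather than always holding the pod for a fixed delay.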
While this "might" work, it "may not" be consistent; I suggest taking a look at my solution here instead: https://github.com/DandyDeveloper/charts/issues/207#issuecomment-1827134022
hello all, what if we just recheck the split brain? It looks like the script picks the wrong master if you restart the master 1 sec before the split-brain check starts.
I added the following changes to my fix-split-brain.sh:
while true; do
  sleep 60
  identify_master
  if [ "$MASTER" = "$ANNOUNCE_IP" ]; then
    redis_role
    if [ "$ROLE" != "master" ]; then
      echo "waiting for redis to become master"
      sleep 15
      identify_master
      redis_role
      if [ "$ROLE" != "master" ]; then
        echo "Redis role is $ROLE, expected role is master, reinitializing"
        reinit
      else
        echo "Redis role is $ROLE, expected role is master. No need to reinitialize."
      fi
    fi
  elif [ "${MASTER}" ]; then
    identify_redis_master
    if [ "$REDIS_MASTER" != "$MASTER" ]; then
      echo "Redis master and local master are not the same. Waiting."
      sleep 15
      identify_master
      identify_redis_master
      if [ "${REDIS_MASTER}" != "${MASTER}" ]; then
        echo "Redis master is ${MASTER}, expected master is ${REDIS_MASTER}, reinitializing"
        reinit
      else
        echo "Redis master is ${MASTER}, expected master is ${REDIS_MASTER}. No need to reinitialize."
      fi
    fi
  fi
done
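The recheck-after-a-delay pattern above can be factored into a small reusable helper. This is only a sketch: `confirm_then_act` is an illustrative name, `check` is any command that succeeds when the state is consistent, and the delay and `action` (e.g. `reinit`) are parameters rather than anything taken from the chart:

```shell
#!/bin/sh
# Run a check; if it fails, wait and re-run it once before taking action.
# This avoids reacting to transient states, e.g. a master restarting just
# before the split-brain check fires.
confirm_then_act() {
    delay="$1"; check="$2"; action="$3"
    if ! $check; then
        echo "check failed, re-checking in ${delay}s"
        sleep "$delay"
        if ! $check; then
            echo "check failed twice, acting"
            $action
        else
            echo "state recovered, no action needed"
        fi
    fi
}
```

Usage in the loop above would then reduce to something like `confirm_then_act 15 role_is_master reinit`, keeping the double-check in one place instead of duplicating it per branch.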
Check out https://github.com/ParminCloud/Charts/tree/master/charts/redis-ha, where we have changed sentinel to be separate from the redis servers.
I'd welcome PRs for these fixes.
Even splitting Sentinel out, if we make it a conditional in the chart, we can support it.
@mhkarimi1383 @SCLogo
@mhkarimi1383 that is a good solution too, but we didn't want to run sentinel in separate pods; that's why we did this change. @DandyDeveloper I'm going to clean up the code and templates and will create a PR then. Thanks
Closing as the retry mechanism MAY resolve this entirely.