
redis-ha-4.27.0 - split brain

Open Pride1st1 opened this issue 1 year ago • 14 comments

Describe the bug I deployed the chart with default values. During operation we hit a condition where redis-0 and redis-2 were replicas of redis-1, while redis-1 was a replica of redis-0. The split-brain-fix container wasn't able to fix the problem.

172.20.75.109 - redis-0
172.20.181.236 - redis-1
172.20.198.17 - redis-2
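
For reference, a minimal sketch (not from the original report) of checking each pod's own view of the replication topology; the pod names and namespace are assumptions based on the hewi-redis-ha-server-N pods in the logs, and redis-cli would need -a if AUTH is enabled:

    for i in 0 1 2; do
      echo "--- hewi-redis-ha-server-$i ---"
      kubectl exec -n redis "hewi-redis-ha-server-$i" -c redis -- \
        redis-cli info replication | grep -E '^(role|master_host|master_link_status)'
    done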

redis-0:

1:S 18 Jun 2024 15:23:36.849 * Connecting to MASTER 172.20.181.236:6379
1:S 18 Jun 2024 15:23:36.849 * MASTER <-> REPLICA sync started
1:S 18 Jun 2024 15:23:36.850 # Error condition on socket for SYNC: Connection refused
1:S 18 Jun 2024 15:23:37.852 * Connecting to MASTER 172.20.181.236:6379
1:S 18 Jun 2024 15:23:37.852 * MASTER <-> REPLICA sync started

redis-1 (sentinel tries to restart it):

1:S 18 Jun 2024 15:26:55.109 * Ready to accept connections tcp
1:S 18 Jun 2024 15:26:55.109 * Connecting to MASTER 172.20.75.109:6379
1:S 18 Jun 2024 15:26:55.109 * MASTER <-> REPLICA sync started
1:S 18 Jun 2024 15:26:55.110 * Non blocking connect for SYNC fired the event.
1:S 18 Jun 2024 15:26:55.111 * Master replied to PING, replication can continue...
1:S 18 Jun 2024 15:26:55.112 * Trying a partial resynchronization (request 8605e4e1a74e2a74a8ad3742efb5784ad4b0ce41:1).
1:S 18 Jun 2024 15:26:55.113 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 18 Jun 2024 15:26:56.113 * Connecting to MASTER 172.20.75.109:6379
1:S 18 Jun 2024 15:26:56.114 * MASTER <-> REPLICA sync started

sentinel-1 (leader)

1:X 18 Jun 2024 15:26:55.883 * +reboot master mymaster 172.20.181.236 6379
1:X 18 Jun 2024 15:28:09.960 # +new-epoch 21
1:X 18 Jun 2024 15:28:09.960 # +try-failover master mymaster 172.20.181.236 6379
1:X 18 Jun 2024 15:28:09.963 * Sentinel new configuration saved on disk
1:X 18 Jun 2024 15:28:09.963 # +vote-for-leader aa33680947f52ae19df761ea8f26a4285d4910c1 21
1:X 18 Jun 2024 15:28:09.969 * d4ca60ac0fa2353d3c6a5684df1401f8faccf6ef voted for aa33680947f52ae19df761ea8f26a4285d4910c1 21
1:X 18 Jun 2024 15:28:09.969 * d21ee95d5d45a94a9deb59bd2b2797a4bddedf53 voted for aa33680947f52ae19df761ea8f26a4285d4910c1 21
1:X 18 Jun 2024 15:28:10.039 # +elected-leader master mymaster 172.20.181.236 6379
1:X 18 Jun 2024 15:28:10.039 # +failover-state-select-slave master mymaster 172.20.181.236 6379
1:X 18 Jun 2024 15:28:10.115 # -failover-abort-no-good-slave master mymaster 172.20.181.236 6379
1:X 18 Jun 2024 15:28:10.187 * Next failover delay: I will not start a failover before Tue Jun 18 15:34:10 2024
1:X 18 Jun 2024 15:32:53.936 * +reboot master mymaster 172.20.181.236 6379

split-brain-fix-1

2024-06-18 18:20:30.025  Could not connect to Redis at 127.0.0.1:6379: Connection refused
2024-06-18 18:20:30.025  Could not connect to Redis at 127.0.0.1:6379: Connection refused
2024-06-18 18:21:30.027  Identifying redis master (get-master-addr-by-name)..
2024-06-18 18:21:30.027  using sentinel (hewi-redis-ha), sentinel group name (mymaster)
2024-06-18 18:21:30.043  Tue Jun 18 15:21:30 UTC 2024 Found redis master (172.20.181.236)
2024-06-18 18:21:30.046  Could not connect to Redis at 127.0.0.1:6379: Connection refused
2024-06-18 18:21:30.049  Tue Jun 18 15:21:30 UTC 2024 Start...
2024-06-18 18:21:30.057  Initializing config..
2024-06-18 18:21:30.057  Copying default redis config..
2024-06-18 18:21:30.057  to '/data/conf/redis.conf'
2024-06-18 18:21:30.061  Copying default sentinel config..
2024-06-18 18:21:30.061  to '/data/conf/sentinel.conf'
2024-06-18 18:21:30.063  Identifying redis master (get-master-addr-by-name)..
2024-06-18 18:21:30.063  using sentinel (hewi-redis-ha), sentinel group name (mymaster)
2024-06-18 18:21:30.083  Tue Jun 18 15:21:30 UTC 2024 Found redis master (172.20.181.236)
2024-06-18 18:21:30.083  Identify announce ip for this pod..
2024-06-18 18:21:30.083  using (hewi-redis-ha-announce-1) or (hewi-redis-ha-server-1)
2024-06-18 18:21:30.088  identified announce (172.20.181.236)
2024-06-18 18:21:30.088  Verifying redis master..
2024-06-18 18:21:30.088  ping (172.20.181.236:6379)
2024-06-18 18:21:30.091  Could not connect to Redis at 172.20.181.236:6379: Connection refused
2024-06-18 18:21:34.102  Could not connect to Redis at 172.20.181.236:6379: Connection refused
2024-06-18 18:21:39.125  Could not connect to Redis at 172.20.181.236:6379: Connection refused
2024-06-18 18:21:45.137  Tue Jun 18 15:21:45 UTC 2024 Can't ping redis master (172.20.181.236)
2024-06-18 18:21:45.137  Attempting to force failover (sentinel failover)..
2024-06-18 18:21:45.137  on sentinel (hewi-redis-ha:26379), sentinel grp (mymaster)
2024-06-18 18:21:45.144  Tue Jun 18 15:21:45 UTC 2024 Failover returned with 'NOGOODSLAVE'
2024-06-18 18:21:45.144  Setting defaults for this pod..
2024-06-18 18:21:45.144  Setting up defaults..
2024-06-18 18:21:45.144  using statefulset index (1)
2024-06-18 18:21:45.144  Getting redis master ip..
2024-06-18 18:21:45.144  blindly assuming (hewi-redis-ha-announce-0) or (hewi-redis-ha-server-0) are master
2024-06-18 18:21:45.161  identified redis (may be redis master) ip (172.20.75.109)
2024-06-18 18:21:45.161  Setting default slave config for redis and sentinel..
2024-06-18 18:21:45.161  using master ip (172.20.75.109)
2024-06-18 18:21:45.161  Updating redis config..
2024-06-18 18:21:45.162  we are slave of redis master (172.20.75.109:6379)
2024-06-18 18:21:45.162  Updating sentinel config..
2024-06-18 18:21:45.162  evaluating sentinel id (${SENTINEL_ID_1})
2024-06-18 18:21:45.162  sentinel id (aa33680947f52ae19df761ea8f26a4285d4910c1), sentinel grp (mymaster), quorum (2)
2024-06-18 18:21:45.163  redis master (172.20.75.109:6379)
2024-06-18 18:21:45.164  announce (172.20.181.236:26379)
2024-06-18 18:21:45.165  Tue Jun 18 15:21:45 UTC 2024 Ready...

split-brain-fix-0

2024-06-18 18:21:56.044  using sentinel (hewi-redis-ha), sentinel group name (mymaster)
2024-06-18 18:21:56.052  Tue Jun 18 15:21:56 UTC 2024 Found redis master (172.20.181.236)
2024-06-18 18:22:56.056  Identifying redis master (get-master-addr-by-name)..
2024-06-18 18:22:56.056  using sentinel (hewi-redis-ha), sentinel group name (mymaster)
2024-06-18 18:22:56.063  Tue Jun 18 15:22:56 UTC 2024 Found redis master (172.20.181.236)
2024-06-18 18:23:56.067  Identifying redis master (get-master-addr-by-name)..

To Reproduce I tried node/pod deletion and redis-cli replicaof, but was unable to reproduce this bug.

Expected behavior The split-brain-fix container should fix even this rare case.

Additional context The script's logic was broken by Sentinel's inability to fail over. Maybe the script should have an additional check of the role of the potential default master. I would very much appreciate any help with this. Please let me know if you need any additional logs/checks.
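
A rough sketch of the extra check suggested above (not the chart's actual code): before blindly assuming hewi-redis-ha-announce-0 is master, ask it for its role first. Host name, port, and auth handling here are assumptions:

    # Hypothetical guard for the "blindly assuming ... are master" step.
    DEFAULT_MASTER_HOST="hewi-redis-ha-announce-0"
    role=$(redis-cli -h "$DEFAULT_MASTER_HOST" -p 6379 info replication \
             | awk -F: '/^role:/ {print $2}' | tr -d '\r')
    if [ "$role" != "master" ]; then
      echo "default master candidate reports role '${role:-unknown}'; not reconfiguring as its replica"
      exit 1
    fi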

Pride1st1 avatar Jun 26 '24 11:06 Pride1st1

+1

mhkarimi1383 avatar Aug 18 '24 07:08 mhkarimi1383

I've had this too.

I've found that since we added a descheduler to the stack (https://github.com/kubernetes-sigs/descheduler) to balance nodes automatically, this kind of issue disables the Redis service frequently.

Can the master allocation be done with kubernetes lease locks? https://kubernetes.io/docs/concepts/architecture/leases/

tschirmer avatar Oct 24 '24 01:10 tschirmer

@tschirmer I'm trying to work out why this would happen unless the podManagementPolicy of the STS is set to Parallel?

Is this happening in either of your cases? @tschirmer ??

Because in theory, on first rollout, the first pod should start up and become master, way before -1/-2 start.
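
For what it's worth, a quick way to check the policy in question (the StatefulSet name and namespace are assumptions based on the names in this thread):

    kubectl get statefulset hewi-redis-ha-server -n redis \
      -o jsonpath='{.spec.podManagementPolicy}{"\n"}'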

DandyDeveloper avatar Oct 24 '24 16:10 DandyDeveloper

@DandyDeveloper Hi, I'm having this problem when my network becomes a bit unstable (for example, pods are unable to reach each other for a second) and my Redis pods can't see each other.

mhkarimi1383 avatar Oct 24 '24 16:10 mhkarimi1383

@tschirmer I'm trying to work out why this would happen unless the podManagementPolicy of the STS is set to Parallel?

Is this happening in either of your cases? @tschirmer ??

Because in theory, on first rollout, the first pod should start up and become master, way before -1/-2 start.

Haven't set it to Parallel. I suspect it's something like the pod not completing trigger-failover-if-master.sh when it's evicted. We are running it with Sentinel, which might add some complexity here. I haven't debugged it yet.

So far we're getting a load of issues with the liveness probe not containing the SENTINELAUTH env from the secret, even though it's clearly defined in the spec; a restart of the pod works around it. It's happening very frequently though, so I'm wondering if there needs to be a grace period defined on startup and shutdown to prevent both of these things from happening.

tschirmer avatar Oct 30 '24 11:10 tschirmer

I think having separate StatefulSets for the Redis servers and the Sentinels would make this chart more stable and manageable, by creating two StatefulSets and giving the sentinel monitor config an external host to monitor.

mhkarimi1383 avatar Nov 03 '24 09:11 mhkarimi1383

I like the idea of separate StatefulSets; I've been thinking of doing that and making a PR.

I suspect this is from preStop hooks not firing and completing successfully. trigger-failover-if-master.sh occasionally doesn't run as expected. When we had the descheduler running it was ~2 min between turning each pod off and on, and we found that every now and again that would fail. The rate of failure is low, so it's unlikely to occur unless you're hammering it (we haven't had an issue with the cluster since we turned off the descheduler).
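
One way to confirm whether preStop hooks actually failed around an eviction (the namespace is an assumption, and events are only retained for a short time):

    kubectl get events -n redis \
      --field-selector reason=FailedPreStopHook \
      --sort-by=.lastTimestamp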

tschirmer avatar Nov 07 '24 23:11 tschirmer

I wanted to make a PR too, but there are a lot of configs that would need to propagate this change.

mhkarimi1383 avatar Nov 08 '24 08:11 mhkarimi1383

I found that there were a couple of things wrong with my setup:

    1. Because the permissions for the read-only config were set to 420, no preStop hooks were being triggered.
    2. We had a problem with our PVC CSI driver, which was attaching the same drive to any PVC with the same name in different namespaces.

The permissions were the killer, because nothing was failing over on shutdown.

I'm halfway through writing a leader elector in Go for this based on Kubernetes Leases. It's already claiming the lease. I'm not sure it's totally necessary now that we've solved these other issues, though.

tschirmer avatar Nov 22 '24 04:11 tschirmer

Specifically, in the StatefulSet, the volume definitions changed from:

      volumes:
        - configMap:
            defaultMode: 420  # THIS ONE ensured that the preStop hooks didn't have the permissions to run; changed it to 430
            name: redis-session-configmap
          name: config
        - hostPath:
            path: /sys
            type: ''
          name: host-sys
        - configMap:
            defaultMode: 493
            name: redis-session-health-configmap
          name: health

to:

      volumes:
        - configMap:
            defaultMode: 430  # changed from 420 so the preStop hooks have permission to run
            name: redis-session-configmap
          name: config
        - hostPath:
            path: /sys
            type: ''
          name: host-sys
        - configMap:
            defaultMode: 493
            name: redis-session-health-configmap
          name: health
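
Side note, not from the comment above: defaultMode is written as a decimal value in the manifest, so translating it back to the usual octal file mode makes the difference easier to see:

    printf '420 -> %o\n' 420   # 644, rw-r--r-- : not executable
    printf '493 -> %o\n' 493   # 755, rwxr-xr-x : executable by everyone
    printf '430 -> %o\n' 430   # 656, rw-r-xrw- : executable by group only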

tschirmer avatar Nov 22 '24 04:11 tschirmer

Also found that the preStop hook /readonly-config/..data/trigger-failover-if-master.sh requires SENTINELAUTH, but it's not defined in the env for the redis container:

echo "[K8S PreStop Hook] Start Failover."
get_redis_role() {
  is_master=$(
    redis-cli \
      -a "${AUTH}" --no-auth-warning \
      -h localhost \
      -p 6379 \
      info | grep -c 'role:master' || true
  )
}
get_redis_role

echo "[K8S PreStop Hook] Got redis role."
if [[ "$is_master" -eq 1 ]]; then
  echo "[K8S PreStop Hook] This node is currently master, we trigger a failover."
  response=$(
    redis-cli \
      -a "${SENTINELAUTH}" --no-auth-warning \
      -h 127.0.0.1 \
      -p 26379 \
      SENTINEL failover mymaster
  )
  if [[ "$response" != "OK" ]] ; then
    echo "[K8S PreStop Hook] Failover failed"
    echo "$response"
    exit 1
  fi
  timeout=30
  while [[ "$is_master" -eq 1 && $timeout -gt 0 ]]; do
    sleep 1
    get_redis_role
    timeout=$((timeout - 1))
  done
  echo "[K8S PreStop Hook] Failover successful"
else
  echo "[K8S PreStop Hook] This node is currently replica, no failover needed."
fi
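
A quick way to confirm whether SENTINELAUTH actually reaches the redis container (pod, container, and namespace names are assumptions taken from this thread), without printing the secret itself:

    kubectl exec -n redis hewi-redis-ha-server-1 -c redis -- \
      sh -c 'env | grep -q "^SENTINELAUTH=" && echo "SENTINELAUTH is set" || echo "SENTINELAUTH is not set"'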

tschirmer avatar Nov 22 '24 04:11 tschirmer

^ I'd modified the above so I could get some debug data, along with this in the StatefulSet:

          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - >-
                    echo "running preStop" >> /proc/1/fd/1 &&
                    /readonly-config/trigger-failover-if-master.sh | tee >>
                    /proc/1/fd/1 &&  echo "finished preStop" >> /proc/1/fd/1

The >> /proc/1/fd/1 redirection forces this output into the container log in k8s.

tschirmer avatar Nov 22 '24 04:11 tschirmer

Found that running preStops would consistently fail.

running preStop
[K8S PreStop Hook] Start Failover.
[K8S PreStop Hook] Got redis role.
[K8S PreStop Hook] This node is currently master, we trigger a failover.
[K8S PreStop Hook] Failover failed

finished preStop

Found that the Sentinel container had shut down before the command could be executed on localhost, so it kept getting a failed failover. Changed the Sentinel preStop to add a 10-second delay to keep it alive while this happened, and it seems to work every time now.

          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - '-c'
                  - >-
                    sleep 10
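
A possible variation on the fixed sleep, purely a sketch and untested here: keep the Sentinel container alive until the co-located Redis stops reporting role:master, with an upper bound so shutdown can't hang. Auth flags are omitted and would need the chart's real env wiring:

    # Hypothetical Sentinel preStop body instead of "sleep 10".
    timeout=30
    while [ "$timeout" -gt 0 ] && \
          redis-cli -h 127.0.0.1 -p 6379 info replication 2>/dev/null | grep -q 'role:master'; do
      sleep 1
      timeout=$((timeout - 1))
    done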

tschirmer avatar Nov 22 '24 05:11 tschirmer


While this "might" work, it "may not" be consistent, suggest taking a look at my solution instead here: https://github.com/DandyDeveloper/charts/issues/207#issuecomment-1827134022

cm3brian avatar Nov 25 '24 00:11 cm3brian

Hello all, what if we simply recheck the split-brain? It looks like the script gets the wrong master if you restart the master one second before the split-brain check starts.

I added the following changes to my fix-split-brain.sh

    while true; do
        sleep 60
        identify_master
        if [ "$MASTER" = "$ANNOUNCE_IP" ]; then
            redis_role
            if [ "$ROLE" != "master" ]; then
                echo "waiting for redis to become master"
                sleep 15
                # Re-check before reinitializing, in case the promotion was just slow.
                identify_master
                redis_role
                if [ "$ROLE" != "master" ]; then
                    echo "Redis role is $ROLE, expected role is master, reinitializing"
                    reinit
                else
                    echo "Redis role is $ROLE, expected role is master. No need to reinitialize."
                fi
            fi
        elif [ "${MASTER}" ]; then
            identify_redis_master
            if [ "$REDIS_MASTER" != "$MASTER" ]; then
                echo "Redis master and local master are not the same. waiting."
                sleep 15
                # Re-check before reinitializing, in case sentinel just hadn't converged yet.
                identify_master
                identify_redis_master
                if [ "${REDIS_MASTER}" != "${MASTER}" ]; then
                    echo "Redis master is ${MASTER}, expected master is ${REDIS_MASTER}, reinitializing"
                    reinit
                else
                    echo "Redis master is ${MASTER}, expected master is ${REDIS_MASTER}. No need to reinitialize."
                fi
            fi
        fi
    done

SCLogo avatar Jul 29 '25 12:07 SCLogo

Check out https://github.com/ParminCloud/Charts/tree/master/charts/redis-ha; we have changed Sentinel to be separate from the Redis servers.

mhkarimi1383 avatar Jul 29 '25 15:07 mhkarimi1383

I'd welcome PRs for these fixes.

Even splitting Sentinel out, if we make it conditional in the chart, we can support it.

@mhkarimi1383 @SCLogo

DandyDeveloper avatar Jul 29 '25 15:07 DandyDeveloper

@mhkarimi1383 that is a good solution too, but we don't want to run Sentinel in separate pods; that's why we made this change. @DandyDeveloper I am going to clean up the code and template and will create a PR then. Thanks

SCLogo avatar Jul 29 '25 16:07 SCLogo

Closing as the retry mechanism MAY resolve this entirely.

DandyDeveloper avatar Sep 13 '25 00:09 DandyDeveloper