charts [stable/redis-ha][BUG] Pre-stop hook unsuccessful

Describe the bug Pre-stop hook fails with:

"PreStop hook failed" err="command '/bin/sh /readonly-config/trigger-failover-if-master.sh' exited with 1: Could not connect to Redis at localhost:26379: Connection refused\n"

Sometimes I have seen: error: command '/bin/sh /readonly-config/trigger-failover-if-master.sh' exited with 1: Warning: AUTH failed

To Reproduce Steps to reproduce the behavior:

Delete master pod

Expected behavior Failover triggered on master.

Additional context Have tried with chart 4.15.0 and 4.17.1 and redis 6.2.5 as well as redis 6.2.6. Seems it is the same behaviour. Also tried with and without exporter. Could it be due to that the sentinel container is terminated first? Auth and sentinel auth is configured.

Jul 12 '22 11:07 StefanKarlsson321

@StefanKarlsson321 This is with AUTH enabled presumably?

Jul 21 '22 07:07 DandyDeveloper

Yes with both redis and sentinel AUTH enabled. Hmm, now when I look at it I can see more detail on the AUTH failed message:

message="This node is currently master, we trigger a failover.\n\nAUTH failed: WRONGPASS invalid username-password pair or user is disabled.\n"

Jul 21 '22 07:07 StefanKarlsson321

I'll see if I can figure on why I sometimes get WRONGPASS invalid username-password pair or user is disabled. But the cluster is operating normally otherwise, with username and password, so it shouldn't occur from what I understand.

Oct 28 '22 08:10 StefanKarlsson321

I have this same issue and I determined that the problem is that while the redis container is the pod that has the preStop hook in it, this preStop hook references sentinel in its requests (namely that the master needs to be changed), sentinel has no such hook against it and as a result terminates instantly when SIGTERM is sent while redis keeps going trying to exec the hook (and it fails, as can be seen in the FailedPreStopHook event.

What is required is for sentinel to stay up longer than redis does (doesn't have to be by much), to allow it to complete the preStop hook.

I have patched my configs, in a resourceful way (i.e not very pretty, but functional), by adding this preStop hook to the sentinel container:

  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - until [ $(/health/redis_readiness.sh) != "response=PONG" ]; do sleep
        1; done

This change uses the already existing redis readiness check code to check if redis is still up, and does not close sentinel while this is the case.

This one was a big point of frustration for us as we have some 3rd party platforms with very poor failure handing in redis, so we would get "ghost" outages up to multiple times a week where there were random ~10 seconds outages, due to the sentinel config setting down-after-milliseconds being set to 10000 in the chart's config. Without sentinel the hook fails and master just disappears without explanation making sentinel wait the down-after-milliseconds delay before electing a new master organically.

Nov 27 '23 05:11 cm3brian