redis-operator
Improving Reliability of statefulset RollingUpdate with Container Lifecycle Hooks
Is your feature request related to a problem? Please describe.
In scenarios such as a Redis version upgrade, which change the desired state of a statefulset, the statefulset's updateStrategy causes Pods to undergo a RollingUpdate.
Assuming we have a 3-member replication setup, there is a risk of data loss if a pod goes down, even momentarily, before a replica has been secured, because the operator does not reconcile during the RollingUpdate.
Therefore, during the rollingUpdate process facilitated by the statefulset, it is crucial to ensure that at least one replica, synchronized with the leader, is secured.
While it might seem adequate to set the statefulset's terminationGracePeriodSeconds to a sufficiently long duration to delay the rollingUpdate, I believe using Container Lifecycle Hooks to functionally guarantee this would significantly enhance the project's reliability.
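For orientation, the two settings mentioned above live in different places of the Pod template: terminationGracePeriodSeconds sits on the Pod spec, while lifecycle.preStop sits on the container. The sketch below builds such a fragment as a Python dict and prints it as JSON. The container name, the script path `/scripts/pre-stop.sh`, and the 300-second value are placeholders chosen for illustration, not something the operator ships today.

```python
import json


def pod_spec_fragment(grace_seconds: int, hook_command: list[str]) -> dict:
    """Return a Pod spec fragment that combines a grace period with a preStop exec hook."""
    return {
        # Upper bound for the whole shutdown, including the preStop hook.
        "terminationGracePeriodSeconds": grace_seconds,
        "containers": [
            {
                "name": "redis",  # placeholder container name
                "lifecycle": {"preStop": {"exec": {"command": hook_command}}},
            }
        ],
    }


# Hypothetical script path; the real hook script would have to be mounted into the image.
fragment = pod_spec_fragment(300, ["/bin/sh", "-c", "/scripts/pre-stop.sh"])
print(json.dumps(fragment, indent=2))
```

Note that the hook runs inside the grace period, so the two settings complement each other: the hook decides when it is safe to stop, and the grace period caps how long it may wait.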
Describe the solution you'd like
Describe alternatives you've considered
I propose writing a script for the PreStop hook that checks whether a failover-capable replica is secured before the container is terminated:
- If the pod designated for deletion has a redis-role of slave, then it is safe to delete the pod.
- If it's a master, wait until a currently synced replica is secured.
  - If already secured, proceed.
  - If syncing is ongoing, remain in the loop until complete (masterSyncInProgress == 0).
127.0.0.1:6379> INFO REPLICATION
# Replication
role:slave
master_host: xxx.xxx.xxx.xxx
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
...
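As a rough illustration of how a hook could interpret this output, here is a minimal Python sketch that parses `INFO REPLICATION` text and evaluates the two views. The function names are hypothetical, and the sample addresses are made up. Checking `state=online` in the master's `slaveN` lines is my assumption about how to detect a synced replica from the master side; the issue itself only names `master_sync_in_progress`, which appears in a replica's output.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse INFO output into a dict, skipping blank lines and '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def replica_is_synced(info: dict[str, str]) -> bool:
    """Replica view: the link to the master is up and no sync is in progress."""
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress") == "0"
    )


def online_replicas(info: dict[str, str]) -> int:
    """Master view: count slaveN entries whose state field is 'online'."""
    count = 0
    for key, value in info.items():
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(f.split("=", 1) for f in value.split(",") if "=" in f)
            if fields.get("state") == "online":
                count += 1
    return count


# Illustrative samples; addresses and offsets are invented.
REPLICA_SAMPLE = """# Replication
role:slave
master_host:10.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
"""

MASTER_SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=1234,lag=0
slave1:ip=10.0.0.3,port=6379,state=wait_bgsave,offset=0,lag=0
"""

print(replica_is_synced(parse_info(REPLICA_SAMPLE)))  # True
print(online_replicas(parse_info(MASTER_SAMPLE)))     # 1
```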
I would like to hear what the maintainers think about this issue and the development of this feature.
If it's difficult for you to allocate time, I would like to add this feature myself and submit a Pull Request.
What version of redis-operator are you using?
redis-operator version:
Additional context
Here's the pseudo-code for the PreStop hook script.
```
### Pseudo-Code
while true:
    # Re-read the replication state on every iteration so the loop sees changes
    infoReplication := redis-cli INFO REPLICATION
    role := infoReplication[role]
    connectedSlaves := infoReplication[connected_slaves]
    masterSyncInProgress := infoReplication[master_sync_in_progress]
    masterLinkStatus := infoReplication[master_link_status]

    if role == "master":
        # Note: master_sync_in_progress is reported by replicas; on a master,
        # the slaveN lines (state=online) carry the per-replica sync state
        if connectedSlaves > 0 && masterSyncInProgress == 0:
            exit(0)
    else if role == "slave":
        if masterLinkStatus == "up":
            exit(0)
    sleep(1)
```
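One thing the pseudo-code leaves open is an upper bound on the wait. Because a PreStop hook runs inside the Pod's termination grace period, a bounded loop may be the safer shape. The following Python sketch shows one way to do that; `wait_until_safe` and its parameters are hypothetical names, and the safety check is injected so the loop can be exercised without a Redis server.

```python
import time
from typing import Callable


def wait_until_safe(
    is_safe: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_safe() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if is_safe():
            return True
        if clock() >= deadline:
            # Give up; the caller decides whether to exit non-zero or proceed anyway.
            return False
        sleep(interval)


# Demo with a fake check that becomes true on the third poll.
polls = iter([False, False, True])
result = wait_until_safe(lambda: next(polls), timeout=10, interval=0, sleep=lambda s: None)
print(result)  # True
```

In a real hook, `is_safe` would wrap a fresh `INFO REPLICATION` call on every poll, and `timeout` would be chosen somewhat below terminationGracePeriodSeconds.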
@wkd-woo Hi, any update regarding this enhancement?
@sapisuper No, the maintainers haven't given any feedback on this enhancement yet.