redis-sentinel
Subscribing to the +switch-master message is necessary
If there is a false alarm and the sentinels promote a slave to master even though the old master is still alive, the client stays connected to the old master, which has become a slave as a result of the switch.
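To make the request concrete, here is a rough sketch of what reacting to the event could look like. It is not the gem's actual code. It assumes the documented Sentinel payload format for `+switch-master` (`<master-name> <old-ip> <old-port> <new-ip> <new-port>`) and a redis-rb style client connected to a sentinel. The names `parse_switch_master` and `watch_switch_master` are made up for illustration.

```ruby
# Parse the payload of a "+switch-master" Pub/Sub message.
# Assumed format: "<master-name> <old-ip> <old-port> <new-ip> <new-port>"
def parse_switch_master(payload)
  name, old_host, old_port, new_host, new_port = payload.split(" ")
  raise ArgumentError, "unexpected payload: #{payload.inspect}" if new_port.nil?

  {
    name: name,
    old: { host: old_host, port: Integer(old_port) },
    new: { host: new_host, port: Integer(new_port) }
  }
end

# Hypothetical watcher. `sentinel` is assumed to be a redis-rb style client
# connected to a sentinel. subscribe blocks, so this would normally run in
# its own thread. The caller's block receives the new master's address.
def watch_switch_master(sentinel, master_name)
  sentinel.subscribe("+switch-master") do |on|
    on.message do |_channel, payload|
      event = parse_switch_master(payload)
      # Ignore switches for other masters monitored by the same sentinel.
      yield event[:new] if event[:name] == master_name
    end
  end
end
```

The block passed to `watch_switch_master` is where a client would drop its current connection and reconnect to the promoted instance.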
@flyerhzm Why was the code that subscribes to the +switch-master event removed, and can we bring it back?
@isaiah I forgot the reason, but you're right, I should add it back.
I just saw this with some testing. I added an action to my rack app, to test writability with a simple SET and GET. Then I spammed that action while a background process was sending "SENTINEL failover mymaster" to a sentinel every 20 seconds. Eventually, the SET operation fails.
It occurs to me that I could have the resilience I want without +switch-master event monitoring, if the redis-sentinel gem responded to failures as follows:
- Issue the INFO command.
- If it succeeds, and the output indicates that the instance is not master, rediscover current master and retry.
Rediscovery would probably enjoy a configurable delay.
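Roughly, the idea could be sketched like this. This is only an illustration of the proposal, not code from the gem. `MasterRetry`, `connect`, `rediscovery_delay` and `max_attempts` are hypothetical names, and the client is assumed to expose `info` returning a hash with a `"role"` key, as redis-rb does.

```ruby
# Hypothetical wrapper, not part of the redis-sentinel gem.
# `connect` is a caller-supplied callable that discovers the current master
# (for example by asking a sentinel) and returns a connected client.
class MasterRetry
  def initialize(connect:, rediscovery_delay: 0.5, max_attempts: 3)
    @connect = connect
    @rediscovery_delay = rediscovery_delay
    @max_attempts = max_attempts
    @client = connect.call
  end

  # Runs the block with the current client. On failure, issues INFO; if the
  # instance is no longer master, waits, rediscovers the master and retries.
  def call
    attempts = 0
    begin
      attempts += 1
      yield @client
    rescue StandardError
      # If INFO itself fails, that error propagates to the caller.
      raise if attempts >= @max_attempts || master?(@client)

      sleep @rediscovery_delay
      @client = @connect.call
      retry
    end
  end

  private

  def master?(client)
    client.info["role"] == "master"
  end
end
```

A caller would wrap each operation, for example `retrier.call { |redis| redis.set("k", "v") }`. Failures on an instance that still reports itself as master are re-raised untouched, so ordinary errors are not hidden by the retry.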
What do you think?
Hi, I just had this happen with a resque-only redis in a production environment. down-after-milliseconds seems to have been set a bit too low, as sentinel failed it over at random after 3 days of running with no problems. This is a pretty serious issue: after the switch, all I could do was restart every process using redis in the cluster (which is most of them) and try to rescue missed jobs.
Is there any plan to track sentinel promotions again?
@isaiah @sheldonh @reist sorry for the late commit. I have finally added "subscribe +switch-master" back, please let me know if it works for you.