hedis
The driver does not deal with changing redis IP addresses in cluster mode
Hedis loses its connection to a Redis cluster if the cluster nodes change their IP addresses during a restart.
How to reproduce the problem
- Start a Haskell app that talks to Redis in cluster mode
- Restart each Redis node in a way that changes its IP address. This happens when using the Bitnami redis-cluster Helm chart.
- See that the app cannot talk to Redis anymore
While the app cannot talk to Redis, we get error messages like this:
service: Network.Socket.connect: <socket: 51>: does not exist (No route to host)
Impact
This makes using hedis in Kubernetes very difficult.
Speculation
This could be a result of using socketToHandle
here. The docs for the function say:
Caveat Handle is not recommended for network programming in Haskell, e.g. merely performing hClose on a TCP socket won't cooperate with peer's gracefulClose, i.e. proper shutdown sequence with appropriate handshakes specified by the protocol.
It could also be some other problem, this is just speculation.
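To make the quoted caveat concrete, here is a minimal sketch (not hedis code, and assuming network >= 3.1.1 for gracefulClose) of the difference the docs are pointing at:

```haskell
import qualified Network.Socket as S
import System.IO (IOMode (ReadWriteMode), hClose)

-- Closing via the Handle obtained from socketToHandle: hClose releases the
-- descriptor but does not perform the shutdown handshake the peer may expect.
closeViaHandle :: S.Socket -> IO ()
closeViaHandle sock = do
  h <- S.socketToHandle sock ReadWriteMode
  hClose h

-- gracefulClose shuts down the sending side and waits (here up to 5000 ms)
-- for the peer to finish, i.e. the proper TCP shutdown sequence.
closeGracefully :: S.Socket -> IO ()
closeGracefully sock = S.gracefulClose sock 5000
```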
Additional Info
I create ConnectInfo like this:
Redis.defaultConnectInfo
  { Redis.connectHost = "<kubernetes-service-name>",
    Redis.connectPort = Redis.PortNumber 6379,
    Redis.connectTimeout = Just (secondsToNominalDiffTime 5),
    Redis.connectMaxConnections = 100
  }
And connect to redis like this:
Redis.connectCluster connInfo
This connection does not use TLS.
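For completeness, a self-contained version of this setup looks roughly like the sketch below (the host is the same placeholder as above; connectCluster, runRedis, and ping are the Database.Redis functions used):

```haskell
import qualified Database.Redis as Redis
import Data.Time.Clock (secondsToNominalDiffTime)

main :: IO ()
main = do
  let connInfo =
        Redis.defaultConnectInfo
          { Redis.connectHost = "<kubernetes-service-name>",
            Redis.connectPort = Redis.PortNumber 6379,
            Redis.connectTimeout = Just (secondsToNominalDiffTime 5),
            Redis.connectMaxConnections = 100
          }
  -- connectCluster resolves the service name once and builds its shard map
  -- from the node addresses the cluster reports.
  conn <- Redis.connectCluster connInfo
  -- Any command will do; PING is enough to see whether the cluster is reachable.
  reply <- Redis.runRedis conn Redis.ping
  print reply
```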
Please see https://github.com/informatikr/hedis/issues/184
@michaelglass, https://github.com/informatikr/hedis/issues/168 suggests you're using cluster support quite extensively. Is this something you could look into?
Need to run refreshShardMap, which would get the new cluster info (by issuing a cluster info call to Redis). But in your case, how will this query even run? You would need at least one node's static IP to run this query and get the info needed to connect to all the nodes with new IPs.
PS: the connections currently held in the pool might still be used, and only removed after exceptions are thrown.
Sorry about the late response. Maybe @stoeffel or @omnibs might have more context. I don't write much Haskell anymore 😔
For anyone else looking here, we have a fork at Juspay (https://github.com/juspay/hedis) with some of these fixes already done. We would like to upstream them at some point.
@aravindgopall Glad to see you forked. Is it the fix/connection branch people are supposed to use? Why is it 104 commits behind? Would you be interested in taking over hedis maintainership?
Right now it is the fix/connection branch; we are testing the changes in our setup. Once done, we will merge everything to the cluster branch, followed by master.
Why is it 104 commits behind? The cluster branch has all of the changes from before they were merged to upstream hedis (done by @alexjg); there is a PR waiting to bring it up to date with upstream. This all might take a week or two to settle.
@ysangkok Would be very much interested in this.
JFI: the fix/connection branch is up to date with upstream master and is being tested. Once done, we will merge to master and release.
JFI, the cluster branch has the latest code with the above fix and also some other fixes from the Juspay fork.
@ysangkok how do you want me to continue: shall I create a PR to upstream, or should we continue on the Juspay fork?
I would suggest creating a PR to this repo and we can get it merged.
Hi @aravindgopall - I mistakenly offered to maintain this library some months ago and it turns out that I simply don't have the time. Could you email me so we can discuss handing over maintainership? james at flipstone . com
This issue won't be solved by refreshing the shard map from IP addresses existing in the map: after Helm updated the Redis cluster, all Redis nodes will have new IP addresses assigned. That is, no IP address stored in the available shard map would connect to a then running Redis node. The only way to reconnect the client is by resolving a new valid IP address via the Redis cluster's DNS name and then obtaining an entirely new shard map.
For our use-case, we retain the DNS name outside Hedis. This is suboptimal, since the DNS name is passed to Hedis for establishing the initial connection and could be retained for failovers when all IPs have become invalid. For instance, due to Helm being a villain.
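Conceptually, such a wrapper boils down to something like the following sketch (withReconnect is an illustrative name, not hedis API; connectCluster, runRedis, and disconnect are the hedis calls it relies on):

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (SomeException, try)
import qualified Database.Redis as Redis

-- Illustrative wrapper, not hedis API: keep the original ConnectInfo (whose
-- connectHost is the cluster's DNS name) and rebuild the connection from
-- scratch when a command fails because every cached node IP has gone stale.
withReconnect
  :: Redis.ConnectInfo                     -- retained outside hedis
  -> Redis.Connection
  -> Redis.Redis (Either Redis.Reply a)
  -> IO (Redis.Connection, Either Redis.Reply a)
withReconnect connInfo conn action = do
  result <- try (Redis.runRedis conn action)
  case result of
    Right ok -> pure (conn, ok)
    Left (_ :: SomeException) -> do
      -- Dropping the old pool and reconnecting re-resolves the DNS name,
      -- which yields currently valid node IPs and a fresh shard map.
      Redis.disconnect conn
      conn' <- Redis.connectCluster connInfo
      ok <- Redis.runRedis conn' action
      pure (conn', ok)
```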
@stefanwire This is a valid scenario. Right now it resolves the issue where at least one of the Redis nodes keeps its IP address, which is generally helpful in the more frequent scenarios like failovers, scaling, etc.
Feel free to check and adapt the code linked in my previous comment to solve the case where none of the stored IPs connect anymore. Also, check my open PR which adds the CLUSTER INFO command in order to improve robustness for Redis in cluster mode. We would be happy to use Hedis without local wrappers once this is done and merged.
I did a quick check on Juspay's branch with the following test scenario:
- AWS Elasticache Redis cluster, 1 shard, 2 nodes
- Launch program, issue some redis commands
- Trigger "failover primary" on the EC UI
- Try to issue any additional redis commands
It doesn't seem to work:
- Without Juspay's branch: timeout executing redis commands, forever
- With Juspay's branch: NoNodeException, forever

This means the shardMap MVar introduced in Juspay's branch somehow becomes empty during failover, and there's no recovery mechanism for when that happens.
Hi @omnibs, can you give the Juspay master branch a try and share the results? In our internal tests it recovers without NoNodeException. Please let us know if you need any more info.
With the master branch it does recover from NoNodeException. Thank you for the pointer!
It seemed to take several minutes, though (I can't re-test because I ran out of test failovers for today).
I don't follow in the code what's making it recover, but I'm curious if there's something we can tweak to make it recover faster.
It's a retry mechanism based on MOVED/timeout (this can be configured to recover faster). When that occurs, it will get the new shardMap info by picking a random node (which, again, might be down too, and that can add to the time). After this it sends the command to the node that is now responsible.
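In rough Haskell, the flow described above amounts to something like this sketch (illustrative names only, not the fork's actual internals):

```haskell
-- Conceptual sketch of the refresh-and-retry loop described above; none of
-- these names come from the Juspay fork.
data Failure = Moved | Timeout

retryWithRefresh
  :: Int                                 -- remaining retries
  -> IO slotMap                          -- rebuild the shard map by asking a random known node
  -> (slotMap -> IO (Either Failure a))  -- send the command to the node that owns the slot
  -> slotMap
  -> IO a
retryWithRefresh retries refresh run slots = do
  result <- run slots
  case result of
    Right a -> pure a
    Left _movedOrTimeout
      | retries <= 0 -> ioError (userError "out of retries")
      | otherwise -> do
          -- On MOVED or a timeout, fetch a fresh shard map; the random node
          -- asked may itself be down, which is what can slow recovery down.
          slots' <- refresh
          retryWithRefresh (retries - 1) refresh run slots'
```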
It would be great to see some numbers and graphs to understand this (e.g. resharding time, throughput, how many errors, etc.).
Thank you for explaining!
So I re-ran my test in a simple manner:
- I left curl on a loop, poking the health-check endpoint of our app in an environment every 2s
  - The health-check endpoint runs a PING through hedis, nothing more.
- I triggered a failover
- I waited for things to normalize
It took a pretty long time:
- 15:12:24 - good
- 15:12:26 - Connection Lost (failover happened)
- 15:12:29 - NoNodeException
- ...
- 15:21:04 - good (recovery)
So it took 8min30s to recover from NoNodeException.
The failover itself is quick. I didn't time it, but restarting another instance of the app brought it back to life immediately, while the instance I was curling kept at NoNodeException for several minutes after that.
> It's a retry mechanism based on MOVED/timeout (this can be configured to recover faster); when that occurs, it will get the new shardMap info by picking a random node
Do my results make sense with this mechanism?
It sounds like after the failover my shardMap was suddenly empty, and kept empty for a long time, so it couldn't have tried to pick a random node. But if this is correct, I don't understand what makes it recover after the 8min30s.
I plan to debug the mechanism further to figure out what's up, but any pointers you might have on what's happening would be appreciated!
PS: just to clarify, there's no resharding time; the failover happens within a single shard.
Ok, my test was too simplistic, and for Redis Cluster's intended usage, perhaps unrealistic.
Redis outside Elasticache requires at least 3 primaries for cluster mode. Elasticache allows you to set it up with a single primary.
I tested a local redis cluster with 3 primaries, ran a failover with debug segfault and with manual failover, and I didn't even notice the failover. No errors.
I tested failing over an Elasticache cluster with 2 primaries, and all went well. We got a single NoNodeException, which turned into a timeout after 500ms, and fully recovered after 12s, which is very reasonable.
@aravindgopall one last question then, does Juspay plan to upstream its fixes to cluster mode? I noticed there were 2 PRs for upstreaming them which got closed (#192 and #191).