Provide a way to force the driver to reconnect

Open tishun opened this issue 1 year ago • 1 comments

Discussed in https://github.com/redis/lettuce/discussions/2870

Originally posted by e-ts June 3, 2024

Can I reconnect to the node used when I catch a RedisCommandTimeoutException for a command to a Redis Cluster?

We are having a problem where the old master does not respond for 10 seconds after a FAILOVER is issued to its replica. TCP packets with new requests still get acked during these 10 seconds. As the connection is clearly not dead, Lettuce keeps sending new commands to the old master. Eventually, it receives all the MOVED responses at once, but this is too late for us.

For our specific problem, it would be better if Lettuce reconnected to the node on command timeout as the bug only seems to affect a single TCP socket. A command on a new socket will get an immediate MOVED response, allowing Lettuce to continue on the master.

I guess it could be tricky to get this right as all the requests in flight will time out at different times and we probably do not want to reconnect for each timeout.
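One way to avoid reconnecting for every timeout is to put a small gate in front of whatever reconnect action is chosen, so that a burst of in-flight commands timing out together triggers it at most once per time window. The following is a minimal sketch under assumptions: `ReconnectGate`, `tryAcquire` and the injected clock are hypothetical names for illustration and are not part of Lettuce, and the reconnect action itself is deliberately left out.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper (not a Lettuce API): lets at most one caller through
// per time window, so many simultaneous timeouts cause a single reconnect.
class ReconnectGate {

    private final long windowNanos;
    private final LongSupplier clock;      // e.g. System::nanoTime
    private final AtomicLong nextAllowed;  // earliest time the gate opens again

    ReconnectGate(long windowNanos, LongSupplier clock) {
        this.windowNanos = windowNanos;
        this.clock = clock;
        // The gate starts open: the first timeout may trigger a reconnect.
        this.nextAllowed = new AtomicLong(clock.getAsLong());
    }

    // Returns true for exactly one caller per window; every other caller
    // (including threads that lose the race) gets false.
    boolean tryAcquire() {
        long now = clock.getAsLong();
        long next = nextAllowed.get();
        // Compare by subtraction so wrap-around of System.nanoTime() is handled.
        if (now - next < 0) {
            return false;
        }
        return nextAllowed.compareAndSet(next, now + windowNanos);
    }
}
```

In application code the gate would be consulted in the handler that catches RedisCommandTimeoutException, and the reconnect would only run when `tryAcquire()` returns true. The window length is a tuning choice; something on the order of the command timeout seems like a reasonable starting point, but that is an assumption rather than a tested recommendation.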

Of course, we are trying to get the underlying problem with Redis resolved too (see #2572), but a workaround like this would still be useful until that gets fixed.

I have checked the wiki, GitHub issues and GitHub Discussions and found #2082 which is similar but in that case, the TCP packets do not get acked, leading to another solution.

I tried setting an absurdly low periodic refresh of a few hundred milliseconds but that does not seem to help, which might be a bug but I have not looked into it yet.

tishun avatar Jul 09 '24 11:07 tishun

Suggested a solution in the discussion, waiting for user feedback.

tishun avatar Jul 17 '24 21:07 tishun

See https://github.com/redis/lettuce/discussions/2870#discussioncomment-10112534 for MRE

tishun avatar Jan 17 '25 14:01 tishun

Also in the discussion - a possible solution to the problem, should we decide to make it native to the driver

tishun avatar Apr 04 '25 16:04 tishun

After much deliberation I am going to close this issue for two main reasons:

  • the main reason we wanted such functionality in the first place was to solve a problem that is already solved in the discussion
  • there is no other compelling argument to have it implemented.

Please reopen in case you disagree and provide the use case you want to address.

tishun avatar Apr 08 '25 10:04 tishun