Topology refresh on consistent timeout
Bug Report
Current Behavior
When using Lettuce against a Redis cluster, a node can get stuck without crashing (e.g. when the process is stopped by attaching gdb). The node stops replying, which leads to command timeouts. The other nodes mark it as PFAIL/FAIL, but Lettuce has no idea about it. Neither topology refresh option, periodic or adaptive, includes a trigger for timeouts. The closest adaptive trigger is PERSISTENT_RECONNECTS, but in this case the connection watchdog sees nothing wrong: the TCP connection stays open and the kernel keeps buffering the data sent to the stuck Redis node.
I know timeouts can occur for many reasons, e.g. a low command timeout combined with a huge key-value, or simply an unreasonable command timeout, so I think a timeout-based trigger should be configurable rather than always on.
Expected behavior/code
A topology refresh upon timeouts
Environment
- Lettuce version(s): 6.0.5.RELEASE
- Redis version: 6.2.5
Possible Solution
An option to trigger a topology refresh upon timeouts: add a mechanism that counts the number of timeouts within a configurable period of time and triggers an adaptive topology refresh if the count exceeds a configurable threshold.
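A rough sketch of the counting mechanism, to make the idea concrete. This is not Lettuce code: `TimeoutRefreshTrigger`, `recordTimeout`, and the `threshold`/`window` parameters are hypothetical names, and it assumes the caller reports each command timeout together with a monotonic timestamp (such as `System.nanoTime()`).

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not part of Lettuce): sliding-window timeout counter. */
final class TimeoutRefreshTrigger {

    private final int threshold;          // refresh once MORE than this many timeouts are in the window
    private final long windowNanos;       // length of the sliding window
    private final Runnable refreshAction; // assumed callback that performs the topology refresh
    private final Deque<Long> timeouts = new ArrayDeque<>();

    TimeoutRefreshTrigger(int threshold, Duration window, Runnable refreshAction) {
        this.threshold = threshold;
        this.windowNanos = window.toNanos();
        this.refreshAction = refreshAction;
    }

    /**
     * Records one command timeout observed at nowNanos.
     * Returns true if this timeout caused a refresh to be triggered.
     */
    boolean recordTimeout(long nowNanos) {
        boolean fire;
        synchronized (this) {
            timeouts.addLast(nowNanos);
            // Drop timeouts that are older than the configured window.
            while (!timeouts.isEmpty() && nowNanos - timeouts.peekFirst() > windowNanos) {
                timeouts.removeFirst();
            }
            fire = timeouts.size() > threshold;
            if (fire) {
                // Start counting from scratch so one burst triggers a single refresh.
                timeouts.clear();
            }
        }
        if (fire) {
            // Run the callback outside the lock so a slow refresh does not block recording.
            refreshAction.run();
        }
        return fire;
    }
}
```

With `threshold = 2` and a 10 second window, the third timeout inside the window triggers the callback, and isolated timeouts spread further apart never do. In a real integration the callback would presumably ask the cluster client to refresh its partitions (for example via `RedisClusterClient.refreshPartitions()`); where exactly Lettuce would hook in the timeout notification is left open here.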