lettuce
Clients unable to recover from cluster failover when connected to multiple Redis clusters
Bug Report
Current Behavior
We have the following setup. The client code creates two RedisClusterClient objects to connect to two different Redis Clusters at the same time.
We are using 2 AWS ElastiCache Redis clusters in cluster mode, each with 6 shards and 1 replica node per shard.
We then initiate a failover of one shard in one of the two Redis Clusters. As soon as the failover starts, the client application begins getting RedisCommandTimeoutException, which is expected. However, it is unable to recover: the errors keep showing up in large numbers even after 15-20 minutes. The system recovers only after the client process is restarted.
We have tested the exact same scenario with the client application connecting to a single Redis Cluster. In this case, the client is able to recover from the RedisCommandTimeoutExceptions within 1 minute of initiating the failover.
We are using the following Redis commands in our setup: a. rpush b. lpop c. pexpire
I am providing the client code in the Input code section. Please note that while moving from a single Redis Cluster to multiple Redis clusters, no client code changes were made.
Input Code
// startup code
RedisURI.Builder builder = RedisURI.Builder.redis(redisDetails.getHost(), redisDetails.getPort());
logger.info("Enable SSL");
builder = builder.withSsl(Boolean.TRUE);
builder = builder.withPassword(properties.getRedisPassword());
RedisURI redisURI = builder.withTimeout(Duration.ofSeconds(5)).build();

redisClusterClient = RedisClusterClient.create(redisURI);
if (redisClusterClient == null) {
    logger.info("Could not create Redis connection.");
    throw new Exception("Could not create redis connection.");
} else {
    logger.info("Redis connection created successfully.");
}

ClusterTopologyRefreshOptions topologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
        .enablePeriodicRefresh(Duration.ofSeconds(properties.getRedisTopologyRefreshInterval()))
        .build();
// periodic refresh interval is set to 15 seconds
redisClusterClient.setOptions(ClusterClientOptions.builder()
        .topologyRefreshOptions(topologyRefreshOptions)
        .build());

GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
poolConfig.setMaxTotal(properties.getMaxRedisConnections());
poolConfig.setMaxIdle(properties.getMaxRedisConnections());
poolConfig.setMinIdle(properties.getMinRedisConnections());
GenericObjectPool<StatefulRedisClusterConnection<String, byte[]>> pool = ConnectionPoolSupport
        .createGenericObjectPool(() -> redisClusterClient.connect(new StreamRedisCodec()), poolConfig);

// per request code
try {
    StatefulRedisClusterConnection<String, byte[]> connection = pool.borrowObject();
    connection.sync().rpush(......)
} catch (Exception ex) {
    throw new IOException(ex);
}
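The startup code above enables only periodic topology refresh. As a point of comparison, here is a minimal configuration sketch that also turns on Lettuce's adaptive refresh triggers, so that a client can refresh its topology in response to redirects and reconnect failures rather than waiting for the next periodic run. This is not a confirmed fix for the behavior reported here; the class name TopologyRefreshConfig and the apply helper are made up for illustration, and it assumes the options are applied to each RedisClusterClient before connections are created.

```java
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

import java.time.Duration;

// Hypothetical helper, not part of the reporter's code.
final class TopologyRefreshConfig {

    /** Applies periodic plus adaptive topology refresh to a single client. */
    static void apply(RedisClusterClient client, Duration periodicInterval) {
        ClusterTopologyRefreshOptions refresh = ClusterTopologyRefreshOptions.builder()
                // same periodic refresh as in the report (15 seconds there)
                .enablePeriodicRefresh(periodicInterval)
                // additionally refresh on MOVED/ASK redirects, persistent reconnects, etc.
                .enableAllAdaptiveRefreshTriggers()
                .build();

        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refresh)
                .build());
    }
}
```

With two clusters, the helper would be called once per client, since each RedisClusterClient holds its own options.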
Expected behavior/code
Environment
- Lettuce version: 5.3.0.RELEASE
- Redis version: AWS elasticache Redis 5.0.6
Possible Solution
Additional context
Each RedisClusterClient has its own topology refresh and its own set of connections that it manages. You'd probably need to enable debug logs to trace topology updates, and check whether the topology state of each client reflects the most recent changes.
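One way to judge whether a client's view is stale is to compare it with what the cluster itself reports through the Redis CLUSTER NODES command. The following dependency-free sketch summarizes raw CLUSTER NODES output by listing the nodes that are flagged as master and not flagged as failing. The class and method names are illustrative only, and how the raw output is obtained (redis-cli, or a connection to each cluster) is left open.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper; not part of Lettuce.
final class ClusterNodesSummary {

    /**
     * Returns nodeId -> "address slots..." for every node whose flags contain
     * "master" and contain neither "fail" nor "fail?".
     */
    static Map<String, String> healthyMasters(String clusterNodesOutput) {
        Map<String, String> masters = new LinkedHashMap<>();
        for (String line : clusterNodesOutput.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Fields: id, address, flags, master-id, ping-sent, pong-recv,
            // config-epoch, link-state, then zero or more slot ranges.
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 8) {
                continue; // not a well-formed node line
            }
            List<String> flags = Arrays.asList(fields[2].split(","));
            boolean master = flags.contains("master");
            boolean failing = flags.contains("fail") || flags.contains("fail?");
            if (master && !failing) {
                String slots = String.join(" ", Arrays.copyOfRange(fields, 8, fields.length));
                masters.put(fields[0], (fields[1] + " " + slots).trim());
            }
        }
        return masters;
    }
}
```

Running this against the output of each of the two clusters shortly after the failover, and comparing the result with the nodes each client is still sending commands to, should show whether one client kept an outdated master for the failed-over shard.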