Intermittent RedisCommandTimeoutException in Lettuce Client
Hello,
We're experiencing random RedisCommandTimeoutException spikes in our production environment during Lettuce read operations (sync mget commands) against an AWS ElastiCache replication group, with a 400 ms command timeout (latency is generally around 5-10 ms), SSL, and IAM auth enabled.
All metrics look normal - no TPS spikes, the connection pool is healthy, thread utilization is stable, auth connections are refreshing properly (every 12 hours), and ElastiCache GetTypeCmdsLatency shows normal latency. Timeouts occur sporadically with no discernible pattern, and we are not observing this in our pre-prod environments. We don't have Lettuce debug logs enabled because they are too noisy.
Seeking guidance on potential causes and optimal Lettuce configuration to resolve these intermittent timeouts.
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 400 millisecond(s)
at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:63)
at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:233)
at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:79)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:84)
at jdk.proxy2/jdk.proxy2.$Proxy158.mget(Unknown Source)
at jdk.internal.reflect.GeneratedMethodAccessor134.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at io.lettuce.core.support.ConnectionWrapping$DelegateCloseToConnectionInvocationHandler.handleInvocation(ConnectionWrapping.java:200)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:84)
at jdk.proxy2/jdk.proxy2.$Proxy158.mget(Unknown Source)
// Connection Pool Config
GenericObjectPoolConfig<StatefulRedisConnection<String, String>> poolConfig =
new GenericObjectPoolConfig<>();
poolConfig.setMaxWait(Duration.ofMillis(500));
poolConfig.setBlockWhenExhausted(true);
poolConfig.setMaxTotal(128);
poolConfig.setMaxIdle(32);
poolConfig.setMinIdle(16);
poolConfig.setTestOnBorrow(true);
poolConfig.setTestWhileIdle(true);
poolConfig.setMinEvictableIdleDuration(Duration.ofSeconds(60));
poolConfig.setTimeBetweenEvictionRuns(Duration.ofSeconds(30));
// Redis URI Config
RedisURI redisUri = RedisURI.builder()
        .withHost(endpoint)
        .withPort(port)
        .withTimeout(Duration.ofMillis(500))
        .withSsl(true)
        .withVerifyPeer(false)
        .withAuthentication(iamAuthCredentialsProvider)
        .build();
// Client Options
ClientOptions clientOptions = ClientOptions.builder()
        .autoReconnect(true)
        .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
        .build();
// Client Resources
ClientResources clientResources = DefaultClientResources.builder()
        .ioThreadPoolSize(16)
        .computationThreadPoolSize(16)
        .build();
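For reference, this is roughly how the pieces above are wired together in our service (a simplified sketch; the pooled-connection setup uses Lettuce's ConnectionPoolSupport, and endpoint, port, iamAuthCredentialsProvider, and keys are placeholders):
// How the pieces above are assembled (simplified)
RedisClient client = RedisClient.create(clientResources, redisUri);
client.setOptions(clientOptions);

GenericObjectPool<StatefulRedisConnection<String, String>> pool =
        ConnectionPoolSupport.createGenericObjectPool(client::connect, poolConfig);

// Per request: borrow a connection, run the sync mget; close() returns it to the pool
try (StatefulRedisConnection<String, String> connection = pool.borrowObject()) {
    List<KeyValue<String, String>> values = connection.sync().mget(keys);
}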
Hey @priyavaddineni, we are going through your issue. In the meantime, could you please provide your Redis and Lettuce versions?
Thank you for looking into it!
Redis engine version: 7.0.7
Lettuce version: 6.4
Hey @priyavaddineni ,
unfortunately, timeout exceptions are notoriously hard to track down.
One thing I could not find in your analysis is how many resources (CPU/threads) are allocated to the driver and whether it uses them all (have you checked for CPU spikes on the client side)?
Sometimes, when the driver is overloaded and has no free resources to process incoming replies from the server, it slows down and commands that are waiting to be processed time out.
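One low-overhead way to check whether latency is accruing inside the client (without enabling debug logs) is to have Lettuce publish its built-in command latency metrics on the EventBus and log them periodically. A rough sketch, assuming the default command latency collector is enabled (HdrHistogram/LatencyUtils on the classpath) and a one-minute emit interval:
// Sketch: emit client-side command latency metrics every minute and print them
DefaultClientResources clientResources = DefaultClientResources.builder()
        .ioThreadPoolSize(16)
        .computationThreadPoolSize(16)
        .commandLatencyPublisherOptions(
                DefaultEventPublisherOptions.builder()
                        .eventEmitInterval(Duration.ofMinutes(1))
                        .build())
        .build();

// Subscribe to the event bus and log the periodically published latency snapshots
clientResources.eventBus().get()
        .filter(event -> event instanceof CommandLatencyEvent)
        .subscribe(event -> System.out.println(((CommandLatencyEvent) event).getLatencies()));
If the client-side percentiles reported here are much higher than the server-side GetTypeCmdsLatency, the delay is happening in the client (I/O or computation threads) rather than on ElastiCache.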
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 2 weeks this issue will be closed.