
Non-recovering Redis command timed out

Open mshahmoradi87 opened this issue 4 months ago • 0 comments

Bug Report

One container starts to throw "Redis command timed out" errors and does not recover.

Current Behavior

We have started using AWS Redis Serverless. Since the switch from cluster-mode-enabled Redis to Redis Serverless, one client out of the many running tasks will, out of the blue, start to see commands time out continuously. The side effects differ depending on the service, the size of the objects, and whether put or get commands are timing out: sometimes it results in https://github.com/redis/lettuce/issues/705, sometimes only in increased latency.

Note that in the services where this happens, there are usually more than 12 tasks running, and only one container has this issue.

I could not correlate any CPU spike with the event causing the timeouts.

org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
	at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
	at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
	at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:277)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1085)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:938)
	at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665)
	at org.springframework.data.redis.connection.lettuce.LettuceInvoker.just(LettuceInvoker.java:125)
	at org.springframework.data.redis.connection.lettuce.LettuceHashCommands.hSet(LettuceHashCommands.java:61)
	at org.springframework.data.redis.connection.DefaultedRedisConnection.hSet(DefaultedRedisConnection.java:1332)
	at org.springframework.data.redis.connection.DefaultStringRedisConnection.hSet(DefaultStringRedisConnection.java:631)
	at org.springframework.data.redis.core.DefaultHashOperations.lambda$put$14(DefaultHashOperations.java:254)
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224)
	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191)
	at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:97)
	at org.springframework.data.redis.core.DefaultHashOperations.put(DefaultHashOperations.java:253)



	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s)
	at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59)
	at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:246)
	at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:74)
	at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1083)
	... 22 common frames omitted

Expected behavior/code

The expected behaviour is that, with periodic-refresh configured, the client recovers and the timeouts do not persist.

Environment

AWS Redis Serverless 7.1
<lettuce.version>6.3.2.RELEASE</lettuce.version>
<netty.version>4.1.109.Final</netty.version>

Related configurations:
ssl: true
timeout: 5s
connect-timeout: 500ms
additional:
  dns-ttl: 5s
  periodic-refresh: 10s
  reconnect-delay-min: 100ms
  reconnect-delay-max: 5s
  read-from: replicaPreferred
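
To make the reconnect settings above concrete, here is a rough sketch of how a delay bounded by reconnect-delay-min and reconnect-delay-max could grow between attempts. This is an assumption about how such properties are typically applied (doubling from the minimum, capped at the maximum), not a statement of what Lettuce does internally. `ReconnectBackoff` and `delayFor` are hypothetical names used only for illustration.

```java
import java.time.Duration;

/**
 * Hypothetical illustration of a bounded exponential reconnect delay.
 * Not Lettuce's implementation; the doubling strategy is an assumption.
 */
final class ReconnectBackoff {

    private final long minMillis;
    private final long maxMillis;

    ReconnectBackoff(Duration min, Duration max) {
        if (min.toMillis() < 1 || max.compareTo(min) < 0) {
            throw new IllegalArgumentException("require 1ms <= min <= max");
        }
        this.minMillis = min.toMillis();
        this.maxMillis = max.toMillis();
    }

    /** Delay before the given 1-based attempt: min doubled (attempt - 1) times, capped at max. */
    Duration delayFor(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        long delay = minMillis;
        for (int i = 1; i < attempt && delay < maxMillis; i++) {
            // Cap before doubling so the multiplication cannot overflow.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return Duration.ofMillis(delay);
    }
}
```

Under this assumption, with 100ms and 5s the delay reaches the 5s cap on the seventh attempt, so the reconnect delay alone would not explain timeouts that continue indefinitely.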

Possible Solution

RedisCommandTimeoutException intermittently: this could be related in some cases, since we sometimes see timeouts that last about a minute and then recover. In this case, however, they persist.

Lettuce cannot recover from connection problems: setting TCP_USER_TIMEOUT might help. The default is 1 minute, so it may help in the case that recovers, but probably not in the cases where the timeouts continue and pile up.
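
As an application-level stopgap for the persistent case, something like the following could force a reconnect after several consecutive timeouts. This is only a sketch: `TimeoutWatchdog`, its threshold, and the reset action are hypothetical names, not Lettuce or Spring Data Redis APIs. With Spring Data Redis the reset action might call `LettuceConnectionFactory.resetConnection()`, but that wiring is left out here and has not been tested against this problem.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper (not part of Lettuce or Spring Data Redis): counts
 * consecutive command timeouts and runs a reset action at a threshold.
 */
final class TimeoutWatchdog {

    private final int threshold;
    private final Runnable resetAction;
    private final AtomicInteger consecutiveTimeouts = new AtomicInteger();

    TimeoutWatchdog(int threshold, Runnable resetAction) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.resetAction = resetAction;
    }

    /** Call after every successful Redis command; clears the streak. */
    void onSuccess() {
        consecutiveTimeouts.set(0);
    }

    /** Call when a command times out; returns true if the reset action ran. */
    boolean onTimeout() {
        int count = consecutiveTimeouts.incrementAndGet();
        // The compare-and-set lets only one thread trigger the reset per streak.
        if (count >= threshold && consecutiveTimeouts.compareAndSet(count, 0)) {
            resetAction.run();
            return true;
        }
        return false;
    }

    int currentStreak() {
        return consecutiveTimeouts.get();
    }
}
```

The caller would invoke `onTimeout()` from wherever `QueryTimeoutException` is caught and `onSuccess()` after normal commands, so that isolated timeouts which recover on their own never reach the threshold.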

mshahmoradi87 · Oct 11 '24 12:10