
LettuceConnectionProvider getConnection hang forever

Open heoYH opened this issue 3 years ago • 8 comments

We use spring-data-redis and lettuce to access the redis cluster.

I don't know the cause, but a connection never finished initializing. As a result, sharedConnection could not be initialized and the thread waited forever. After that, all requests blocked.

### Thread dump

"lettuce-epollEventLoop-5-6" tid=0x3f native=false suspended=false
   java.lang.Thread.State: WAITING
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:194)
	at java.util.concurrent.CompletableFuture$Signaller.block([email protected]/CompletableFuture.java:1796)
	at java.util.concurrent.ForkJoinPool.managedBlock([email protected]/ForkJoinPool.java:3128)
	at java.util.concurrent.CompletableFuture.waitingGet([email protected]/CompletableFuture.java:1823)
	at java.util.concurrent.CompletableFuture.get([email protected]/CompletableFuture.java:1998)
	at io.lettuce.core.cluster.RedisClusterClient.get(RedisClusterClient.java:937)
	at io.lettuce.core.cluster.RedisClusterClient.getPartitions(RedisClusterClient.java:329)
	at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:92)
	at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnectionAsync(ClusterConnectionProvider.java:40)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionProvider.getConnection(LettuceConnectionProvider.java:53)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.getConnection(LettuceConnectionFactory.java:1527)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1315)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1298)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
	at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
	at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$1725/0x0000000100cadc40.get(Unknown Source)
	at reactor.core.publisher.MonoSupplier.call(MonoSupplier.java:85)
"lettuce-epollEventLoop-5-5" tid=0x3e native=false suspended=false
   java.lang.Thread.State: BLOCKED
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1297)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getSharedReactiveConnection(LettuceConnectionFactory.java:1049)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveClusterConnection(LettuceConnectionFactory.java:481)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:457)
	at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getReactiveConnection(LettuceConnectionFactory.java:101)
	at org.springframework.data.redis.core.ReactiveRedisTemplate.lambda$doInConnection$0(ReactiveRedisTemplate.java:198)
	at org.springframework.data.redis.core.ReactiveRedisTemplate$$Lambda$1725/0x0000000100cadc40.get(Unknown Source)
	at reactor.core.publisher.MonoSupplier.call(MonoSupplier.java:85)
	at reactor.core.publisher.FluxUsingWhen.subscribe(FluxUsingWhen.java:80)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:236)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:203)
	at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:102)
	at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onComplete(FluxSwitchIfEmpty.java:84)
	at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onComplete(ScopePassingSpanSubscriber.java:102)
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:88)
	at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:88)
	at reactor.core.publisher.Operators.complete(Operators.java:136)

We are still looking for the reason the connection failed to connect. My guess is that getConnection should have a timeout, so that it does not hang when sharedConnection initialization fails for whatever reason. The relevant code: https://github.com/spring-projects/spring-data-redis/blob/d2cae7528b84647b3b9fad266ab9f8245b7aa3ed/src/main/java/org/springframework/data/redis/connection/lettuce/LettuceConnectionProvider.java#L53 and https://github.com/spring-projects/spring-data-redis/blob/d2cae7528b84647b3b9fad266ab9f8245b7aa3ed/src/main/java/org/springframework/data/redis/connection/lettuce/LettuceFutureUtils.java#L68
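
To make the timeout idea concrete, here is a minimal sketch using only the JDK. It is not the spring-data-redis implementation: the class name BoundedJoin, the helper join, and the choice of IllegalStateException are all hypothetical and chosen for illustration. It only shows how a wait on a CompletableFuture can be bounded so the caller fails instead of parking forever.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedJoin {

    // Hypothetical helper (not part of spring-data-redis): waits for the
    // future, but gives up once the timeout elapses instead of blocking forever.
    static <T> T join(CompletableFuture<T> future, Duration timeout) {
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The future did not complete in time; surface the failure to the caller.
            throw new IllegalStateException("Connection not initialized within " + timeout, e);
        } catch (ExecutionException e) {
            // The future completed exceptionally; unwrap the real cause.
            throw new IllegalStateException("Connection initialization failed", e.getCause());
        } catch (InterruptedException e) {
            // Restore the interrupt flag before propagating.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for connection", e);
        }
    }
}
```

With a helper like this, a caller waiting on a future that never completes gets an exception after the timeout, which the application can log or retry, rather than a thread stuck in WAITING.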

heoYH avatar Mar 29 '22 08:03 heoYH

This is a known issue and we plan to address it. Meanwhile, you can enable eager connection initialization to initialize the connection early on.

mp911de avatar Mar 29 '22 10:03 mp911de

@mp911de Could you please explain this issue if possible? (I wonder what is causing the problem.)

heoYH avatar Mar 29 '22 10:03 heoYH

Sure. SharedConnection synchronizes access to a single shared connection to ensure that we create only one connection. When one or more event loop threads are blocked there, each of them waits for completion. They cannot complete because they wait for each other, and you end up with a sort of deadlock.
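
The general pattern can be reproduced with plain JDK executors. The sketch below is a simplified illustration, not the Lettuce or spring-data-redis code: the class EventLoopStarvation and the method completesInTime are hypothetical, and a fixed thread pool stands in for the event loop. A task running on the pool blocks on a future that can only be completed by another task queued on the same pool. A 200 ms timeout is used so the demo terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class EventLoopStarvation {

    // Returns true if the blocking wait inside the pool finishes within 200 ms.
    static boolean completesInTime(int threads) {
        // Stand-in for the event loop: a pool with a fixed number of threads.
        ExecutorService loop = Executors.newFixedThreadPool(threads);
        try {
            CompletableFuture<String> connection = new CompletableFuture<>();
            Future<Boolean> caller = loop.submit(() -> {
                // The completion work is scheduled on the same pool...
                loop.execute(() -> connection.complete("connected"));
                try {
                    // ...while this task blocks a pool thread waiting for it.
                    connection.get(200, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                }
            });
            return caller.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  " + completesInTime(1));
        System.out.println("2 threads: " + completesInTime(2));
    }
}
```

With one thread the completing task never gets to run while the caller blocks, so the wait times out. With two threads a free thread completes the future. Without the timeout, the single-thread case would wait forever, which is the shape of the hang in the thread dump.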

mp911de avatar Mar 29 '22 11:03 mp911de

Any updates on this? I'm experiencing what I believe to be the same/similar problem and I'm unable to send any commands through the shared connection. I've tried enabling eager initialization to no avail.

TomHaughton avatar Mar 15 '23 14:03 TomHaughton

@mp911de Is there any configuration that avoids this BLOCKED state, or is restarting Redis the only option?

123jiehao avatar Nov 28 '23 12:11 123jiehao

The proposed workaround works.

To enable eager connection initialization, public void setEagerInitialization(boolean eagerInitialization) has to be called with true before afterPropertiesSet().

Example:

LettuceConnectionFactory connectionFactory = new LettuceConnectionFactory(redisConfig, clientConfig);
connectionFactory.setEagerInitialization(true);
connectionFactory.afterPropertiesSet();

remnov avatar Feb 22 '24 07:02 remnov