spring-data-redis
spring-data-redis copied to clipboard
Connection leak while using `execute(SessionCallback)` and `executeWithStickyConnection`
We use spring-data-redis with lettuce as the redis connection lib.
The version is: spring-data-redis-2.4.9.jar.
Recently we found a lot of BLOCKING in production, all threads of servlet thread pool are blocked waiting for connection from connection pool. Then we removed connection pool conf and it goes well.
After tracking and analysing the problem, we found there is a Lettuce Connection Leak in spring-data-redis while using execute(SessionCallback) and executeWithStickyConnection(SessionCallback) in same thread by random turn. For example:
//step1
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
@Override
public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
try (cursor) {
while (cursor.hasNext()) {
String next = cursor.next();
String value = next == null ? "" : next;
}
} catch (IOException e) {
}
return null;
}
});
//step2
redisTemplate.executePipelined(new SessionCallback<Object>() {
@Override
public <K, V> Object execute(RedisOperations<K, V> operations) throws DataAccessException {
redisTemplate.hasKey("testDefault" );
redisTemplate.hasKey("testSecond");
return null;
}
});
//step3
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
@Override
public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
try (cursor) {
while (cursor.hasNext()) {
String next = cursor.next();
String value = next == null ? "" : next;
}
} catch (IOException e) {
}
return null;
}
});
As you can see in the code above:
-
step1: Use redistemplate to execute a sscan. sscan is implemented by executeWithStickyConnection by the way. This would cause
TransactionSynchronizationManager
bind aLettuceConnection
(contains aasyncSharedConn
which is a share connection) to current thread and not released(unbind) because thereferenceCount
is not zero. Both execute and sscan causereferenceCount
+ 1 but only execute triggerreferenceCount
minus 1 at the end. As cursor is closed finally after try block, cursor close will causeLettuceConnection
close, this would setisClosed = true
-
step2: Use redistemplate to executePipeline. As in step1, current thread is bound to a
LettuceConnection
and this time would use thisLettuceConnection
to execute pipeline commands. As pipeline commands should use Dedicated connection, fetch a new Dedicated connection and put inLettuceConnection
(cantains aasyncSharedConn
and aasyncDedicatedConn
now). After executePipeline, theLettuceConnection
is not released becausereferenceCount
is still not zero. -
step3: Use redistemplate to execute a sscan again. As cursor is closed finally after try block, cursor close will cause connection close. But
LettuceConnection
'sisClosed
is already true this time, nothing will be done according to the source of below. AndasyncDedicatedConn
would never be returned back:
@Override
public void close() throws DataAccessException {
super.close();
if (isClosed) {
return;
}
isClosed = true;
if (asyncDedicatedConn != null) {
try {
if (customizedDatabaseIndex()) {
potentiallySelectDatabase(defaultDbIndex);
}
connectionProvider.release(asyncDedicatedConn);
} catch (RuntimeException ex) {
throw convertLettuceAccessException(ex);
}
}
if (subscription != null) {
if (subscription.isAlive()) {
subscription.doClose();
}
subscription = null;
}
this.dbIndex = defaultDbIndex;
}
Never using scan within execute(sessioncallback) would help, but is this a bug or drawback of design?
Thanks for reporting the issue. I can confirm that there's an issue that causes a leak when using nested doIn
callbacks. Specifically, executeWithStickyConnection
doesn't play well when being called from a SessionCallback
. For the time being, please move all SCAN
calls to outside of execute(…)
methods.
The problem with executeWithStickyConnection
is that we assume that executeWithStickyConnection
is being the top-level entry method and that the connection is closed externally. Additionally, there's no mechanism to unbind the sticky connection from executeWithStickyConnection
. We ideally would need way to decorate the connection that when close
is being called, the connection unbinds itself from the thread.
In which version was this bug introduced? I'm looking for the latest version without the issue.