Connection leak while using `execute(SessionCallback)` and `executeWithStickyConnection`
We use spring-data-redis with lettuce as the redis connection lib.
The version is: spring-data-redis-2.4.9.jar.
Recently we found a lot of BLOCKING in production, all threads of servlet thread pool are blocked waiting for connection from connection pool. Then we removed connection pool conf and it goes well.
After tracking and analysing the problem, we found there is a Lettuce Connection Leak in spring-data-redis while using execute(SessionCallback) and executeWithStickyConnection(SessionCallback) in same thread by random turn. For example:
//step1
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
@Override
public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
try (cursor) {
while (cursor.hasNext()) {
String next = cursor.next();
String value = next == null ? "" : next;
}
} catch (IOException e) {
}
return null;
}
});
//step2
redisTemplate.executePipelined(new SessionCallback<Object>() {
@Override
public <K, V> Object execute(RedisOperations<K, V> operations) throws DataAccessException {
redisTemplate.hasKey("testDefault" );
redisTemplate.hasKey("testSecond");
return null;
}
});
//step3
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
@Override
public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
try (cursor) {
while (cursor.hasNext()) {
String next = cursor.next();
String value = next == null ? "" : next;
}
} catch (IOException e) {
}
return null;
}
});
As you can see in the code above:
- step1: Use redistemplate to execute a sscan. sscan is implemented by executeWithStickyConnection by the way. This would cause
TransactionSynchronizationManagerbind aLettuceConnection(contains aasyncSharedConnwhich is a share connection) to current thread and not released(unbind) because thereferenceCountis not zero. Both execute and sscan causereferenceCount+ 1 but only execute triggerreferenceCountminus 1 at the end. As cursor is closed finally after try block, cursor close will causeLettuceConnectionclose, this would setisClosed = true - step2: Use redistemplate to executePipeline. As in step1, current thread is bound to a
LettuceConnectionand this time would use thisLettuceConnectionto execute pipeline commands. As pipeline commands should use Dedicated connection, fetch a new Dedicated connection and put inLettuceConnection(cantains aasyncSharedConnand aasyncDedicatedConnnow). After executePipeline, theLettuceConnectionis not released becausereferenceCountis still not zero. - step3: Use redistemplate to execute a sscan again. As cursor is closed finally after try block, cursor close will cause connection close. But
LettuceConnection'sisClosedis already true this time, nothing will be done according to the source of below. AndasyncDedicatedConnwould never be returned back:
@Override
public void close() throws DataAccessException {
super.close();
if (isClosed) {
return;
}
isClosed = true;
if (asyncDedicatedConn != null) {
try {
if (customizedDatabaseIndex()) {
potentiallySelectDatabase(defaultDbIndex);
}
connectionProvider.release(asyncDedicatedConn);
} catch (RuntimeException ex) {
throw convertLettuceAccessException(ex);
}
}
if (subscription != null) {
if (subscription.isAlive()) {
subscription.doClose();
}
subscription = null;
}
this.dbIndex = defaultDbIndex;
}
Never using scan within execute(sessioncallback) would help, but is this a bug or drawback of design?
Thanks for reporting the issue. I can confirm that there's an issue that causes a leak when using nested doIn callbacks. Specifically, executeWithStickyConnection doesn't play well when being called from a SessionCallback. For the time being, please move all SCAN calls to outside of execute(…) methods.
The problem with executeWithStickyConnection is that we assume that executeWithStickyConnection is being the top-level entry method and that the connection is closed externally. Additionally, there's no mechanism to unbind the sticky connection from executeWithStickyConnection. We ideally would need way to decorate the connection that when close is being called, the connection unbinds itself from the thread.
In which version was this bug introduced? I'm looking for the latest version without the issue.