spring-data-redis icon indicating copy to clipboard operation
spring-data-redis copied to clipboard

Connection leak while using `execute(SessionCallback)` and `executeWithStickyConnection`

Open HashJang opened this issue 3 years ago • 2 comments

We use spring-data-redis with lettuce as the redis connection lib.

The version is: spring-data-redis-2.4.9.jar.

Recently we found a lot of BLOCKING in production, all threads of servlet thread pool are blocked waiting for connection from connection pool. Then we removed connection pool conf and it goes well.

After tracking and analysing the problem, we found there is a Lettuce Connection Leak in spring-data-redis while using execute(SessionCallback) and executeWithStickyConnection(SessionCallback) in same thread by random turn. For example:

//step1
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
    @Override
    public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
        Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
                new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
        try (cursor) {
            while (cursor.hasNext()) {
                String next = cursor.next();
                String value = next == null ? "" : next;
            }
        } catch (IOException e) {
        }
        return null;
    }
});

//step2
redisTemplate.executePipelined(new SessionCallback<Object>() {
    @Override
    public <K, V> Object execute(RedisOperations<K, V> operations) throws DataAccessException {
        redisTemplate.hasKey("testDefault" );
        redisTemplate.hasKey("testSecond");
        return null;
    }
});

//step3
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
    @Override
    public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
        Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
                new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
        try (cursor) {
            while (cursor.hasNext()) {
                String next = cursor.next();
                String value = next == null ? "" : next;
            }
        } catch (IOException e) {
        }
        return null;
    }
});

As you can see in the code above:

  • step1: Use redistemplate to execute a sscan. sscan is implemented by executeWithStickyConnection by the way. This would cause TransactionSynchronizationManager bind a LettuceConnection (contains a asyncSharedConn which is a share connection) to current thread and not released(unbind) because the referenceCount is not zero. Both execute and sscan cause referenceCount + 1 but only execute trigger referenceCount minus 1 at the end. As cursor is closed finally after try block, cursor close will cause LettuceConnection close, this would set isClosed = true
  • step2: Use redistemplate to executePipeline. As in step1, current thread is bound to a LettuceConnection and this time would use this LettuceConnection to execute pipeline commands. As pipeline commands should use Dedicated connection, fetch a new Dedicated connection and put in LettuceConnection (cantains a asyncSharedConn and a asyncDedicatedConn now). After executePipeline, the LettuceConnection is not released because referenceCount is still not zero.
  • step3: Use redistemplate to execute a sscan again. As cursor is closed finally after try block, cursor close will cause connection close. But LettuceConnection's isClosed is already true this time, nothing will be done according to the source of below. And asyncDedicatedConn would never be returned back:

LettuceConnection.java

@Override
public void close() throws DataAccessException {

super.close();

if (isClosed) {
	return;
}

isClosed = true;

if (asyncDedicatedConn != null) {
	try {
		if (customizedDatabaseIndex()) {
			potentiallySelectDatabase(defaultDbIndex);
		}
		connectionProvider.release(asyncDedicatedConn);
	} catch (RuntimeException ex) {
		throw convertLettuceAccessException(ex);
	}
}

if (subscription != null) {
	if (subscription.isAlive()) {
		subscription.doClose();
	}
	subscription = null;
}

this.dbIndex = defaultDbIndex;
}

Never using scan within execute(sessioncallback) would help, but is this a bug or drawback of design?

HashJang avatar Aug 24 '21 13:08 HashJang

Thanks for reporting the issue. I can confirm that there's an issue that causes a leak when using nested doIn callbacks. Specifically, executeWithStickyConnection doesn't play well when being called from a SessionCallback. For the time being, please move all SCAN calls to outside of execute(…) methods.

The problem with executeWithStickyConnection is that we assume that executeWithStickyConnection is being the top-level entry method and that the connection is closed externally. Additionally, there's no mechanism to unbind the sticky connection from executeWithStickyConnection. We ideally would need way to decorate the connection that when close is being called, the connection unbinds itself from the thread.

mp911de avatar Aug 30 '21 07:08 mp911de

In which version was this bug introduced? I'm looking for the latest version without the issue.

streetcleaner avatar Nov 16 '21 06:11 streetcleaner