High Latencies for Redis Operations
Bug Report
We are seeing high latencies for Redis operations, specifically EXPIRE, while creating and fetching sessions.
Current Behavior
Our project is based on Reactor, Kotlin Coroutines, and Netty. We're using Spring Session Data Redis with the Lettuce client for Redis operations. Our APM tool, DataDog, shows that EXPIRE operations issued while creating/fetching a session have very high latencies, ranging into minutes. For the remaining operations (HGETALL, DELETE, EXISTS, and HMSET) the 99th percentile is within 100 ms.
Latencies on our Redis infrastructure itself look good, so we suspect the high latencies DataDog reports in spans are caused by thread waiting.
Only one NIO thread is created: lettuce-nioEventLoop-5-1.
Redis operation logs
[logLevel=DEBUG] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.RedisStateMachine -- Decode done, empty stack: true
[logLevel=INFO ] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1 com.apple.carrierservices.preauth.utils.LoggingCommandLatencyRecorder -- appServer=/AppServerIP, redisServer=redisServer, redisCommandType=EXPIRE, firstResponseLatency=277454292, completionLatency=277478875, TimeTaken=24583 nanoseconds
[logLevel=DEBUG] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Completing command LatencyMeteredCommand [type=EXPIRE, output=BooleanOutput [output=true, error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Received: 5 bytes, 6 commands in the stack
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Stack contains: 6 commands
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.RedisStateMachine -- Decode done, empty stack: true
[logLevel=INFO ] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 com.apple.carrierservices.preauth.utils.LoggingCommandLatencyRecorder -- appServer=/AppServerIP, redisServer=redisServer, redisCommandType=HMSET, firstResponseLatency=278740375, completionLatency=278768500, TimeTaken=28125 nanoseconds
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Completing command LatencyMeteredCommand [type=HMSET, output=StatusOutput [output=OK, error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Received: 4 bytes, 5 commands in the stack
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Stack contains: 5 commands
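The LoggingCommandLatencyRecorder entries above come from a custom recorder. A minimal sketch of what such a recorder could look like follows, assuming it implements Lettuce's CommandLatencyRecorder SPI; the output format and wiring are guesses from the log lines. Note that TimeTaken is completionLatency minus firstResponseLatency (e.g. 277478875 - 277454292 = 24583 ns), i.e. Lettuce itself is measuring these commands in microseconds:

```kotlin
import io.lettuce.core.metrics.CommandLatencyRecorder
import io.lettuce.core.protocol.ProtocolKeyword
import java.net.SocketAddress

// Sketch of a per-command latency recorder resembling the one in the logs.
// Assumed to be registered via ClientResources.builder().commandLatencyRecorder(...).
class LoggingCommandLatencyRecorder : CommandLatencyRecorder {
    override fun recordCommandLatency(
        local: SocketAddress,
        remote: SocketAddress,
        commandType: ProtocolKeyword,
        firstResponseLatency: Long,
        completionLatency: Long
    ) {
        // TimeTaken = completion latency minus first-response latency, in nanoseconds
        println(
            "redisServer=$remote, redisCommandType=$commandType, " +
                "firstResponseLatency=$firstResponseLatency, " +
                "completionLatency=$completionLatency, " +
                "TimeTaken=${completionLatency - firstResponseLatency} nanoseconds"
        )
    }
}
```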
Session Flow
Any session fetch involves one of the Redis operation sequences below (a hedged sketch of one of these paths follows the list):
HGETALL -> New Session -> DELETE -> save -> HMSET -> EXPIRE
HGETALL -> Existing Session -> Expired -> DELETE -> save -> HMSET -> EXPIRE
HGETALL -> Existing Session -> Non Expired -> EXISTS -> save -> HMSET -> EXPIRE
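For illustration, here is a minimal sketch of the non-expired existing-session path (HGETALL, then HMSET and EXPIRE on save) expressed directly against the Lettuce reactive API. The key name, URI, and TTL are placeholders; Spring Session performs the equivalent internally:

```kotlin
import io.lettuce.core.RedisClient

fun main() {
    val client = RedisClient.create("redis://localhost:6379") // placeholder URI
    val redis = client.connect().reactive()
    val key = "spring:session:sessions:example" // hypothetical session key

    redis.hgetall(key)                                   // HGETALL
        .collectMap({ it.key }, { it.value })            // Flux<KeyValue> -> Mono<Map>
        .filter { it.isNotEmpty() }                      // only proceed if the session exists
        .flatMap { fields -> redis.hmset(key, fields) }  // HMSET (save)
        .flatMap { redis.expire(key, 1800L) }            // EXPIRE (refresh TTL)
        .block()

    client.shutdown()
}
```

All of these commands flow over the same connection, which is why they all complete on the single lettuce-nioEventLoop-5-1 thread seen in the logs.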
Expected behavior/code
Redis operation latencies should be within an acceptable range, i.e. a few milliseconds.
Environment
- Redis Standalone, Version: 6.2.6
- Lettuce version(s): 6.1.10.RELEASE
- spring-session-data-redis:jar: 2.7.0
- reactor-core:jar: 3.4.27
- kotlinx-coroutines-reactor:jar: 1.6.4
- reactor-netty-http:jar: 1.0.28
- 1 AWS instance with 3 cores and 10 GB of memory
Spring Redis config in use
spring.redis.lettuce.pool.enabled=true
spring.redis.lettuce.pool.max-active=16
spring.redis.lettuce.pool.min-idle=8
lettuce.computation.threads.count=32
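For reference, a sketch of how a custom setting like lettuce.computation.threads.count presumably maps onto Lettuce's ClientResources; the property-to-builder wiring is an assumption:

```kotlin
import io.lettuce.core.resource.ClientResources

// Computation threads handle work off the I/O event loop (e.g. signal
// emission when publishOnScheduler is enabled); 32 mirrors the property above.
val resources: ClientResources = ClientResources.builder()
    .computationThreadPoolSize(32)
    .build()
```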
The issue is intermittent and has been observed under very low load as well. The application usually recovers on its own after some time and is then able to serve high traffic.
When using the reactive API with a single connection, the I/O event loop is used as the single thread for all subsequent operations (e.g. Jackson deserialization). You could try enabling ClientOptions.publishOnScheduler to check whether spreading the load across multiple worker threads helps.
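Enabling it on a plain RedisClient looks roughly like this (in Spring Boot the options would instead be supplied through the LettuceClientConfiguration builder; the URI is a placeholder):

```kotlin
import io.lettuce.core.ClientOptions
import io.lettuce.core.RedisClient

fun main() {
    val client = RedisClient.create("redis://localhost:6379") // placeholder URI

    // Emit reactive signals on the computation thread pool instead of the
    // I/O event loop, so downstream work leaves lettuce-nioEventLoop-5-1.
    client.setOptions(
        ClientOptions.builder()
            .publishOnScheduler(true)
            .build()
    )
}
```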
We already have ClientOptions.publishOnScheduler enabled.
We're using Snappy for compression/decompression, which could count as one such heavy workload (see the sketch after the dependency line below).
org.xerial.snappy:snappy-java:jar:1.1.9.1
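One mitigation sketch, assuming the decompression currently runs inline in the reactive pipeline: wrap the Snappy call so it is subscribed on boundedElastic rather than on the event loop. The helper name is hypothetical:

```kotlin
import org.xerial.snappy.Snappy
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers

// Hypothetical helper: keep CPU-bound Snappy work off lettuce-nioEventLoop-5-1
// by deferring it to Reactor's boundedElastic scheduler.
fun uncompressOffEventLoop(compressed: ByteArray): Mono<ByteArray> =
    Mono.fromCallable { Snappy.uncompress(compressed) }
        .subscribeOn(Schedulers.boundedElastic())
```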
One thing we noticed in the connection pooling docs is that we have an Apache commons-pool2 dependency, which might not be relevant for the reactive APIs. We are therefore re-running our workload with it removed, using Lettuce's built-in pooling implementation instead.
Please run your workloads through a profiler to identify what is causing the slowdowns. Alternatively, use multiple connections through AsyncPool, which provides non-blocking connection pooling.
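A minimal sketch of the AsyncPool suggestion, using Lettuce's built-in non-blocking pool; the pool sizes mirror the config above (an assumed mapping), and the URI, key, and TTL are placeholders:

```kotlin
import io.lettuce.core.RedisClient
import io.lettuce.core.RedisURI
import io.lettuce.core.api.StatefulRedisConnection
import io.lettuce.core.codec.StringCodec
import io.lettuce.core.support.AsyncConnectionPoolSupport
import io.lettuce.core.support.BoundedAsyncPool
import io.lettuce.core.support.BoundedPoolConfig

fun main() {
    val client = RedisClient.create()
    val uri = RedisURI.create("redis://localhost:6379") // placeholder URI

    // Non-blocking connection pool built into Lettuce (no commons-pool2 needed);
    // maxTotal/minIdle mirror the spring.redis.lettuce.pool settings above.
    val pool: BoundedAsyncPool<StatefulRedisConnection<String, String>> =
        AsyncConnectionPoolSupport.createBoundedObjectPool(
            { client.connectAsync(StringCodec.UTF8, uri) },
            BoundedPoolConfig.builder().maxTotal(16).minIdle(8).build()
        )

    // Acquire, use, and release a connection without blocking a thread.
    pool.acquire().thenCompose { conn: StatefulRedisConnection<String, String> ->
        conn.async().expire("spring:session:sessions:example", 1800L)
            .whenComplete { _, _ -> pool.release(conn) }
    }.toCompletableFuture().join()

    pool.closeAsync().toCompletableFuture().join()
    client.shutdown()
}
```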
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 30 days this issue will be closed.