High Latencies for Redis Operations
Bug Report
We are seeing high latencies for Redis operations, specifically EXPIRE, while creating and fetching sessions.
Current Behavior
Our project is based on Reactor, Kotlin Coroutines, and Netty. We're using Spring Session Data Redis with the Lettuce client for Redis operations. Our APM tool, DataDog, shows that EXPIRE operations issued while creating/fetching a session have very high latencies, ranging into minutes. For the remaining operations (HGETALL, DELETE, EXISTS, and HMSET) the 99th percentile is within 100 ms.
Latencies on our Redis infrastructure itself look good, so we suspect the high latencies DataDog reports in spans are caused by thread waiting.
Only one NIO thread is created: lettuce-nioEventLoop-5-1.
Redis operation logs
[logLevel=DEBUG] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.RedisStateMachine -- Decode done, empty stack: true
[logLevel=INFO ] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1 com.apple.carrierservices.preauth.utils.LoggingCommandLatencyRecorder -- appServer=/AppServerIP, redisServer=redisServer, redisCommandType=EXPIRE, firstResponseLatency=277454292, completionLatency=277478875, TimeTaken=24583 nanoseconds
[logLevel=DEBUG] -- 2023-04-06 17:54:06,868 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Completing command LatencyMeteredCommand [type=EXPIRE, output=BooleanOutput [output=true, error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Received: 5 bytes, 6 commands in the stack
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Stack contains: 6 commands
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.RedisStateMachine -- Decode done, empty stack: true
[logLevel=INFO ] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 com.apple.carrierservices.preauth.utils.LoggingCommandLatencyRecorder -- appServer=/AppServerIP, redisServer=redisServer, redisCommandType=HMSET, firstResponseLatency=278740375, completionLatency=278768500, TimeTaken=28125 nanoseconds
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Completing command LatencyMeteredCommand [type=HMSET, output=StatusOutput [output=OK, error='null'], commandType=io.lettuce.core.RedisPublisher$SubscriptionCommand]
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Received: 4 bytes, 5 commands in the stack
[logLevel=DEBUG] -- 2023-04-06 17:54:06,873 +0530 -- lettuce-nioEventLoop-5-1 io.lettuce.core.protocol.CommandHandler -- [channel=0x15b747b9, /AppServerIP -> redisServer, epid=0x1, chid=0x1] Stack contains: 5 commands
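The LoggingCommandLatencyRecorder entries above come from a custom recorder. A minimal sketch of what such a recorder could look like follows, assuming it implements Lettuce's CommandLatencyRecorder SPI; the output format and wiring are guesses from the log lines. Note that TimeTaken is completionLatency minus firstResponseLatency (e.g. 277478875 - 277454292 = 24583 ns), i.e. Lettuce itself is measuring these commands in microseconds:

```kotlin
import io.lettuce.core.metrics.CommandLatencyRecorder
import io.lettuce.core.protocol.ProtocolKeyword
import java.net.SocketAddress

// Sketch of a per-command latency recorder resembling the one in the logs.
// Assumed to be registered via ClientResources.builder().commandLatencyRecorder(...).
class LoggingCommandLatencyRecorder : CommandLatencyRecorder {
    override fun recordCommandLatency(
        local: SocketAddress,
        remote: SocketAddress,
        commandType: ProtocolKeyword,
        firstResponseLatency: Long,
        completionLatency: Long
    ) {
        // TimeTaken = completion latency minus first-response latency, in nanoseconds
        println(
            "redisServer=$remote, redisCommandType=$commandType, " +
                "firstResponseLatency=$firstResponseLatency, " +
                "completionLatency=$completionLatency, " +
                "TimeTaken=${completionLatency - firstResponseLatency} nanoseconds"
        )
    }
}
```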
Session Flow
Any session fetch involves one of the Redis operation sequences below (a hedged sketch of one of these paths follows the list):
HGETALL -> New Session -> DELETE -> save -> HMSET -> EXPIRE
HGETALL -> Existing Session -> Expired -> DELETE -> save -> HMSET -> EXPIRE
HGETALL -> Existing Session -> Non Expired -> EXISTS -> save -> HMSET -> EXPIRE
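For illustration, here is a minimal sketch of the non-expired existing-session path (HGETALL, then HMSET and EXPIRE on save) expressed directly against the Lettuce reactive API. The key name, URI, and TTL are placeholders; Spring Session performs the equivalent internally:

```kotlin
import io.lettuce.core.RedisClient

fun main() {
    val client = RedisClient.create("redis://localhost:6379") // placeholder URI
    val redis = client.connect().reactive()
    val key = "spring:session:sessions:example" // hypothetical session key

    redis.hgetall(key)                                   // HGETALL
        .collectMap({ it.key }, { it.value })            // Flux<KeyValue> -> Mono<Map>
        .filter { it.isNotEmpty() }                      // only proceed if the session exists
        .flatMap { fields -> redis.hmset(key, fields) }  // HMSET (save)
        .flatMap { redis.expire(key, 1800L) }            // EXPIRE (refresh TTL)
        .block()

    client.shutdown()
}
```

All of these commands flow over the same connection, which is why they all complete on the single lettuce-nioEventLoop-5-1 thread seen in the logs.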
Expected behavior/code
Redis operation latencies should be within an acceptable range, i.e. a few milliseconds.
Environment
- Redis Standalone, Version: 6.2.6
- Lettuce version(s): 6.1.10.RELEASE
- spring-session-data-redis:jar: 2.7.0
- reactor-core:jar: 3.4.27
- kotlinx-coroutines-reactor:jar: 1.6.4
- reactor-netty-http:jar: 1.0.28
- 1 AWS instance with 3 cores and 10 GB of memory
Spring Redis config in use
spring.redis.lettuce.pool.enabled=true
spring.redis.lettuce.pool.max-active=16
spring.redis.lettuce.pool.min-idle=8
lettuce.computation.threads.count=32
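For reference, a sketch of how a custom setting like lettuce.computation.threads.count presumably maps onto Lettuce's ClientResources; the property-to-builder wiring is an assumption:

```kotlin
import io.lettuce.core.resource.ClientResources

// Computation threads handle work off the I/O event loop (e.g. signal
// emission when publishOnScheduler is enabled); 32 mirrors the property above.
val resources: ClientResources = ClientResources.builder()
    .computationThreadPoolSize(32)
    .build()
```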
The issue is intermittent and has been observed under very low load as well. The application usually recovers on its own after some time and is then able to serve high traffic.
When using the reactive API with a single connection, the I/O event loop is used as the single thread for all subsequent operations (e.g. Jackson deserialization). You could try enabling ClientOptions.publishOnScheduler to check whether spreading the load across multiple worker threads helps.
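Enabling it on a plain RedisClient looks roughly like this (in Spring Boot the options would instead be supplied through the LettuceClientConfiguration builder; the URI is a placeholder):

```kotlin
import io.lettuce.core.ClientOptions
import io.lettuce.core.RedisClient

fun main() {
    val client = RedisClient.create("redis://localhost:6379") // placeholder URI

    // Emit reactive signals on the computation thread pool instead of the
    // I/O event loop, so downstream work leaves lettuce-nioEventLoop-5-1.
    client.setOptions(
        ClientOptions.builder()
            .publishOnScheduler(true)
            .build()
    )
}
```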
We already have ClientOptions.publishOnScheduler enabled.
We're using Snappy for compression/decompression, which could count as one such heavy workload (see the sketch after the dependency line below).
org.xerial.snappy:snappy-java:jar:1.1.9.1
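One mitigation sketch, assuming the decompression currently runs inline in the reactive pipeline: wrap the Snappy call so it is subscribed on boundedElastic rather than on the event loop. The helper name is hypothetical:

```kotlin
import org.xerial.snappy.Snappy
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers

// Hypothetical helper: keep CPU-bound Snappy work off lettuce-nioEventLoop-5-1
// by deferring it to Reactor's boundedElastic scheduler.
fun uncompressOffEventLoop(compressed: ByteArray): Mono<ByteArray> =
    Mono.fromCallable { Snappy.uncompress(compressed) }
        .subscribeOn(Schedulers.boundedElastic())
```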
One thing we noticed in the connection pooling docs is that we have an Apache commons-pool2 dependency, which might not be relevant for the reactive APIs. We are therefore re-running our workload with it removed, using Lettuce's built-in pooling implementation instead.
Please run your workloads through a profiler to identify what is causing the slowdowns. Alternatively, use multiple connections through AsyncPool, which provides non-blocking connection pooling.
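A minimal sketch of the AsyncPool suggestion, using Lettuce's built-in non-blocking pool; the pool sizes mirror the config above (an assumed mapping), and the URI, key, and TTL are placeholders:

```kotlin
import io.lettuce.core.RedisClient
import io.lettuce.core.RedisURI
import io.lettuce.core.api.StatefulRedisConnection
import io.lettuce.core.codec.StringCodec
import io.lettuce.core.support.AsyncConnectionPoolSupport
import io.lettuce.core.support.BoundedAsyncPool
import io.lettuce.core.support.BoundedPoolConfig

fun main() {
    val client = RedisClient.create()
    val uri = RedisURI.create("redis://localhost:6379") // placeholder URI

    // Non-blocking connection pool built into Lettuce (no commons-pool2 needed);
    // maxTotal/minIdle mirror the spring.redis.lettuce.pool settings above.
    val pool: BoundedAsyncPool<StatefulRedisConnection<String, String>> =
        AsyncConnectionPoolSupport.createBoundedObjectPool(
            { client.connectAsync(StringCodec.UTF8, uri) },
            BoundedPoolConfig.builder().maxTotal(16).minIdle(8).build()
        )

    // Acquire, use, and release a connection without blocking a thread.
    pool.acquire().thenCompose { conn: StatefulRedisConnection<String, String> ->
        conn.async().expire("spring:session:sessions:example", 1800L)
            .whenComplete { _, _ -> pool.release(conn) }
    }.toCompletableFuture().join()

    pool.closeAsync().toCompletableFuture().join()
    client.shutdown()
}
```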
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 30 days this issue will be closed.