Idle connection terminations for Sentinel behind load balancer

Open dataviruset opened this issue 3 years ago • 2 comments

Expected behavior

When Sentinel is behind a proxy or load balancer, Jedis should periodically send a no-op/keepalive command over the Sentinel connection to keep it alive. This would be similar to what setTestWhileIdle(true) does for the Redis connection pool, but applied to the Sentinel connection as well.
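
To illustrate the requested behavior, here is a minimal sketch of a scheduler that fires a no-op well inside the idle timeout. It uses only the JDK. `IdleKeepAlive` and its `noOp` callback are hypothetical names and are not part of Jedis; the callback stands in for whatever cheap command a client could send on the Sentinel connection. Halving the idle timeout to get the interval is an assumption chosen for safety margin, not a documented requirement.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not a Jedis class): runs a no-op on a fixed schedule. */
final class IdleKeepAlive implements AutoCloseable {
    private final ScheduledExecutorService scheduler;
    private final long intervalMillis;

    /**
     * @param idleTimeoutMillis idle timeout enforced by the proxy or load balancer
     * @param noOp              cheap action that produces traffic on the connection
     */
    IdleKeepAlive(long idleTimeoutMillis, Runnable noOp) {
        if (idleTimeoutMillis < 2) {
            throw new IllegalArgumentException("idle timeout must be at least 2 ms");
        }
        // Assumption: firing at half the idle timeout leaves enough margin.
        this.intervalMillis = idleTimeoutMillis / 2;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-keepalive");
            t.setDaemon(true); // do not keep the JVM alive just for keepalives
            return t;
        });
        this.scheduler.scheduleAtFixedRate(() -> {
            try {
                noOp.run();
            } catch (RuntimeException e) {
                // Swallowed on purpose: an uncaught exception would cancel
                // all further runs of a fixed-rate task.
            }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    long intervalMillis() {
        return intervalMillis;
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With the 300-second timeout described in this issue, `new IdleKeepAlive(300_000, action)` would run `action` every 150 seconds until `close()` is called.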

Actual behavior

The load balancer terminates the connection ungracefully after its configured idle timeout (300 seconds in our case), causing error messages and a reconnection in Jedis.

redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.

Steps to reproduce:

We deployed Redis + Redis Sentinel inside Kubernetes and exposed the Sentinel using an AWS CLB (Classic Load Balancer) and set its idle connection timeout to 300 seconds. We only configured one Sentinel address for Jedis to use and let AWS + Kubernetes handle the balancing to multiple nodes and sentinels.

Connecting directly to the Sentinel pods inside Kubernetes doesn't terminate the connection after 300 seconds, but it does when connecting via the AWS Load Balancer. The maximum idle connection timeout for classic load balancers in AWS seems to be 4000 seconds.

https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html

Redis / Jedis Configuration

Jedis version:

3.1.0

Redis version:

6.0.10

Java version:

1.8.0_172

dataviruset, Jan 29 '21 12:01

@dataviruset Do you think this is a Jedis-specific issue? Did you try tweaking the commons-pool2 object pool config params?

sazzad16, Feb 16 '21 08:02

The issue is inside JedisSentinelPool: it creates a PubSub connection in MasterListener, and that connection is what periodically fails. I have a similar issue where there is a firewall between the application and Redis, and I get the stack trace below roughly every 300 seconds.

ERROR redis.clients.jedis.JedisSentinelPool - Lost connection to Sentinel at .... Sleeping 5000ms and retrying.
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Connection reset
        at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:205)
        at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43)
        at redis.clients.jedis.Protocol.process(Protocol.java:165)
        at redis.clients.jedis.Protocol.read(Protocol.java:230)
        at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:352)
        at redis.clients.jedis.Connection.getUnflushedObjectMultiBulkReply(Connection.java:314)
        at redis.clients.jedis.JedisPubSub.process(JedisPubSub.java:131)
        at redis.clients.jedis.JedisPubSub.proceed(JedisPubSub.java:125)
        at redis.clients.jedis.Jedis.subscribe(Jedis.java:3267)
        at redis.clients.jedis.JedisSentinelPool$MasterListener.run(JedisSentinelPool.java:402)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:210)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.net.SocketInputStream.read(SocketInputStream.java:127)
        at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:199)

(btw, the "5000ms" part of the error message is hardcoded)

Jedis 3.8, Redis 6.2.3, OpenJDK 1.8.0_312

piotrp, Feb 22 '22 12:02

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot], Jan 08 '24 00:01