JedisCluster behavior during cluster failure

uromahn opened this issue 8 years ago • 10 comments

This is more a question and discussion than an issue!

Observed behavior

I am running my JedisCluster test client against my Redis Cluster. The connection uses the default settings (maxAttempts = 5, 2000ms timeouts). JedisCluster is passed the three known masters for the initial connection. JedisCluster is able to correctly determine the complete cluster topology (six nodes: 3 masters and 3 slaves).
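
For reference, the client is constructed roughly like this (a minimal sketch; host names and ports are placeholders, and the constructor shown is the Jedis 2.8.x variant that takes the timeouts and maxAttempts explicitly):

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class ClusterClientFactory {

    public static JedisCluster create() {
        // Seed with the three known masters; JedisCluster discovers the
        // rest of the topology (all six nodes) from the cluster itself.
        Set<HostAndPort> seeds = new HashSet<>();
        seeds.add(new HostAndPort("redis-node-1", 7000)); // placeholder hosts/ports
        seeds.add(new HostAndPort("redis-node-2", 7000));
        seeds.add(new HostAndPort("redis-node-3", 7000));

        int connectionTimeout = 2000; // ms, the default mentioned above
        int soTimeout = 2000;         // ms
        int maxAttempts = 5;          // the default discussed in this issue

        return new JedisCluster(seeds, connectionTimeout, soTimeout,
                maxAttempts, new GenericObjectPoolConfig());
    }
}
```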

For my test, I am generating 1,000,000 random KV-pairs. The keys are the numbers between 1 and 1,000,000 converted to Strings, and the values are random strings with a length of 100 characters.

I then write those 1,000,000 KV-pairs to my Redis cluster. In the second round, I read back those 1,000,000 KV-pairs and repeat the read cycle a second time.
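
Schematically, the write and read phases look something like this (a simplified sketch of the benchmark, not the actual gist linked further down; class, method, and constant names are made up for illustration):

```java
import java.util.concurrent.ThreadLocalRandom;

import redis.clients.jedis.JedisCluster;

public class KvBenchmark {

    private static final int NUM_KEYS = 1_000_000;
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    static String randomValue(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(ThreadLocalRandom.current().nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    static void run(JedisCluster cluster) {
        // Write phase: keys "1".."1000000", values are 100-char random strings.
        for (int i = 1; i <= NUM_KEYS; i++) {
            cluster.set(String.valueOf(i), randomValue(100));
        }
        // Two read passes over the same key space.
        for (int round = 0; round < 2; round++) {
            for (int i = 1; i <= NUM_KEYS; i++) {
                cluster.get(String.valueOf(i));
            }
        }
    }
}
```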

When I fail my cluster during the write or read operation by "killing" one of the Docker containers with a master node, I see that the Redis cluster goes into the failed state. After about 700ms the corresponding slave gets elected as the new master, and the cluster state changes back to "OK" after about 900ms.

During the failed state, my jedisCluster.set() or jedisCluster.get() operation throws a JedisException (a JedisConnectionException to be exact), which I catch so as not to fail my application. I then go into a retry loop, retrying the last set() or get() operation until it succeeds (or until I reach the maximum number of retries, which causes my test client to abort). With the default "maxAttempts" setting, the set() or get() operations resume successfully after about 2,000 - 3,000 ms.
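
The retry wrapper around each operation looks roughly like this (a sketch only; the retry count and sleep interval are illustrative, not the exact values from my client):

```java
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.exceptions.JedisConnectionException;

public class RetryingOps {

    static void setWithRetry(JedisCluster cluster, String key, String value,
                             int maxRetries, long sleepMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                cluster.set(key, value);
                return; // succeeded
            } catch (JedisConnectionException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up and let the test client abort
                }
                try {
                    Thread.sleep(sleepMillis); // back off before retrying
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```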

Here is my question: why does Jedis take at least twice as long to detect that the cluster state is OK again? Shouldn't this happen immediately after the approx. 900ms, once the cluster state changes back to "OK"?

If I restart the failed node immediately after the failure (while my client is still reporting failures), the set() / get() operations resume immediately. Is this because the Redis Cluster did not elect the slave of the failed node as the new master, and the re-started node resumes as master?

What I would like to observe

Ideally, the fail-over and the resumption of operations should happen transparently to the client. In other words, my application using Jedis should never see a JedisException until a configured timeout is reached. Ideally, JedisCluster would silently retry the current operation (set(), get(), etc.) until it either succeeds or fails "permanently" (i.e. a configured timeout is exceeded).
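
Today this can only be approximated by wrapping every call, e.g. with a deadline-based retry like the sketch below (illustrative only, not an existing JedisCluster feature); ideally JedisCluster would do something equivalent internally:

```java
import redis.clients.jedis.JedisCluster;
import redis.clients.jedis.exceptions.JedisException;

public class TransparentRetry {

    /** Retries a get() until it succeeds or a wall-clock deadline passes. */
    static String getWithDeadline(JedisCluster cluster, String key, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                return cluster.get(key);
            } catch (JedisException e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw e; // "permanent" failure: configured timeout exceeded
                }
                // Otherwise swallow the exception and retry until the deadline.
            }
        }
    }
}
```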

But maybe that is what it can already do, and I simply have not configured my JedisCluster correctly.

Steps to reproduce:

  1. start your redis cluster
  2. start the Jedis test client
  3. kill one of the Docker containers running a Redis master

Redis / Jedis Configuration

Redis Cluster with 6 nodes: 3 masters and 3 slaves. The Redis Cluster is running inside Docker containers in Docker Swarm. The Jedis client program is also running in a Docker container inside the same Swarm cluster. The Jedis client has been developed as a simple Spring Boot application.

Jedis version:

2.8.2

Redis version:

3.2.3

Java version:

Java 1.8.0_101 (Oracle)

uromahn avatar Sep 22 '16 13:09 uromahn

@uromahn Just out of curiosity, what do you see in your app between the moment redis-cluster reports OK and the moment your app becomes stable again? Basically, between the ~900ms and the ~2,000-3,000ms, does Jedis still throw exceptions?

marcosnils avatar Sep 27 '16 00:09 marcosnils

In the time after the cluster "recovered", Jedis continues to throw an exception every time a set or get is attempted. I tried different things to recover faster, but nothing worked except waiting for Jedis to resume working.

uromahn avatar Sep 27 '16 05:09 uromahn

@uromahn Can you retry the test and configure JedisCluster with maxAttempts set to 1 to see if that speeds up Jedis recovery? I'm just guessing here from what I remember when I coded it. I'll try to replicate the scenario some time this week to provide better feedback.
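
For clarity, that just means changing the maxAttempts constructor argument from the earlier sketch to 1, e.g.:

```java
// Same seed set and timeouts as in the earlier sketch, but only one attempt per command:
JedisCluster cluster = new JedisCluster(seeds, 2000, 2000, 1,
        new GenericObjectPoolConfig());
```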

marcosnils avatar Sep 27 '16 05:09 marcosnils

@marcosnils I tried that already, and in fact it made the situation worse. In this case Jedis did not recover before my own coded "maxWait" was reached.

uromahn avatar Sep 27 '16 06:09 uromahn

@uromahn perfect. Can you please share your main benchmark program so I can reproduce?

marcosnils avatar Sep 27 '16 14:09 marcosnils

@marcosnils Here is the gist with my test program: https://gist.github.com/uromahn/512d6a2279284421044ffe6ed63d40ba You will have to create a new Maven project with the correct folder structure and place those files accordingly. If that is too cumbersome, I can also package my project and post it somewhere as a tar.gz file. P.S. Sorry for the "hacky" code - it was just a "quick and dirty" test. :)

uromahn avatar Sep 28 '16 08:09 uromahn

@uromahn For failover-time configuration you can go with Sentinel, where you can configure the failover timeout, but Sentinel only works in replication mode, not in cluster mode.

You can also configure cluster-node-timeout in your redis.conf file to avoid a long failover time when your master node is down.

But in some cases a higher timeout is helpful, for example so that a network spike of a few seconds does not trigger a failover of your Redis cluster.
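
For example, in each node's redis.conf (the 5000 ms here is only an illustrative value; the Redis default is 15000 ms):

```
# redis.conf on every cluster node: how long a node may be unreachable
# before it is considered failed and a slave promotion can start
cluster-node-timeout 5000
```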

vishalsharma13 avatar Oct 08 '16 10:10 vishalsharma13

@uromahn I've been busy the past weeks. I'll try to provide some insights within this week. Thx for your patience.

marcosnils avatar Oct 11 '16 23:10 marcosnils

Transparent fail-over and resumption: #2358

walles avatar Feb 01 '21 13:02 walles

This issue is marked stale. It will be closed in 30 days if it is not updated.

github-actions[bot] avatar Feb 15 '24 00:02 github-actions[bot]