
"redis.clients.jedis.exceptions.JedisConnectionException: Failed to create socket." exception while connecting redis cluster

Open harsha-vardhana opened this issue 2 years ago • 2 comments

Expected behavior

Response time should not change, and no error should occur, regardless of which master node is used to instantiate JedisCluster in the Jedis client.

Actual behavior

When connecting to a Redis cluster using Jedis, passing different master nodes' IP addresses while instantiating JedisCluster results in different behavior. Please see the cluster structure and the Java program below.

Steps to reproduce:

  1. Create a Redis cluster of 6 nodes spread across 2 different hosts with the configuration below. All are independent processes (not Docker). No password or "bind" is set in redis.conf.

  2. Run the Java program below under the following scenarios:

  3. Run the Java program (without timeout and max attempts) from outside the hosts where the Redis nodes are running.

    Connect to host "10.250.94.231:7003" (the parameter passed when instantiating JedisCluster). Only 2 keys are added and read ("/bootstrap/user/abc/10" and "/bootstrap/user/abc/11"). The next value is not added, and the call fails with this exception:

    redis.clients.jedis.exceptions.JedisClusterOperationException: Cluster retry deadline exceeded.

    When I debug through the Jedis code, it actually fails with the exception below:

    redis.clients.jedis.exceptions.JedisConnectionException: Failed to create socket.

    After adding the timeout and max attempts as shown in the program below, the rest of the values were added, but only after a huge delay on the third value. That value is stored on a different node from the first 2 values, but why such a delay?

  4. Run the Java program connecting to another master node, "10.250.94.231:7003". Without the timeout and max attempts, no key values are added at all.

  5. Run the program with another master on a different host, "10.248.88.85:7003", there is no delay and all the values are set in a jiffy!

  6. Run the program with all nodes (both masters and slaves) passed when instantiating JedisCluster. The issue does not occur in this scenario either.

  7. Run the program on the same host as 3 of the nodes (10.250.94.231). No issues occur here either.
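The scenarios above differ mainly in where the client runs and which address it is given, so a first check is whether each address accepts a plain TCP connection from the client machine. The following is a minimal sketch of such a check, assuming only the JDK; `SeedReachability` and `canConnect` are hypothetical names and are not part of Jedis. It tests TCP reachability only and says nothing about whether the node belongs to the cluster.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Jedis: checks plain TCP reachability only.
final class SeedReachability {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Addresses taken from the report; run this from the client machine.
        String[] seeds = {
            "10.250.94.231:7002", "10.250.94.231:7003",
            "10.248.88.85:7003", "10.196.22.224:7003"
        };
        for (String seed : seeds) {
            int colon = seed.lastIndexOf(':');
            String host = seed.substring(0, colon);
            int port = Integer.parseInt(seed.substring(colon + 1));
            System.out.println(seed + " reachable: " + canConnect(host, port, 2000));
        }
    }
}
```

An address that is unreachable from the client but reachable from the Redis hosts themselves would be consistent with scenario 7 working while scenario 3 fails, although this check alone does not prove that.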

Redis / Jedis Configuration

Redis cluster configuration:

10.250.94.231:7002 master (0-5460)
10.196.22.224:7003 master (10923-16383)
10.248.88.85:7002 master (5461-10922)

10.248.88.85:7004 slave to 10.250.94.231:7002
10.248.88.85:7003 slave to 10.196.22.224:7003
10.250.94.231:7004 slave to 10.248.88.85:7002
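The slot ranges above decide which master serves each key, which is why consecutive keys in the test loop can land on different nodes. Redis Cluster assigns a key to slot CRC16(key) mod 16384, hashing only the `{hash tag}` part when one is present. The sketch below computes that slot with the JDK alone and maps it onto the ranges listed above; `KeySlots`, `slotFor` and `masterFor` are hypothetical names for illustration and are not the Jedis implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; Jedis performs its own slot calculation.
final class KeySlots {

    /** CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Slot = CRC16(key) mod 16384; only a non-empty {hash tag} is hashed if present. */
    static int slotFor(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                key = key.substring(open + 1, close);
            }
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    /** Maps a slot to its master, using the ranges listed in the report. */
    static String masterFor(int slot) {
        if (slot <= 5460) return "10.250.94.231:7002";
        if (slot <= 10922) return "10.248.88.85:7002";
        return "10.196.22.224:7003";
    }

    public static void main(String[] args) {
        // Same keys as the test program in the report.
        for (int i = 0; i < 10; i++) {
            String key = "/bootstrap/user/abc/1" + i;
            int slot = slotFor(key);
            System.out.println(key + " -> slot " + slot + " -> " + masterFor(slot));
        }
    }
}
```

Running it shows which of the three masters each test key is routed to, so a failure on one key can be matched to the node that owns its slot.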

Java program

    // Requires: java.util.HashSet, java.util.Set,
    // redis.clients.jedis.HostAndPort, redis.clients.jedis.JedisCluster
    JedisCluster jedis = null;

    try
    {
        // Seed node(s) used to discover the rest of the cluster
        Set<HostAndPort> nodes = new HashSet<HostAndPort>();
        nodes.add(new HostAndPort("10.250.94.231", 7003));
        /*
         * nodes.add(new HostAndPort("10.250.94.231", 7002));
         * nodes.add(new HostAndPort("10.250.94.231", 7004));
         * nodes.add(new HostAndPort("10.248.88.85", 7002));
         * nodes.add(new HostAndPort("10.248.88.85", 7003));
         * nodes.add(new HostAndPort("10.248.88.85", 7004));
         */
        // Timeout of 20000 ms and 200 max attempts
        jedis = new JedisCluster(nodes, 20000, 200);

        // Write and read back 10 keys
        for (int i = 0; i < 10; ++i) {
            String sKey = "/bootstrap/user/abc/1" + i;
            jedis.set(sKey, "myvale");
            System.out.println("Set Key = " + sKey);
            String sVal = jedis.get(sKey);
            System.out.println("Get Key = " + sKey + " val =" + sVal);
        }
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
    finally
    {
        // jedis is still null if the constructor threw
        if (jedis != null)
        {
            jedis.close();
        }
    }

Jedis version:

4.1.1

Redis version:

6.2.3

Java version:

java version "1.8.0_91"

Please let us know what could be the problem here and how to overcome it.

harsha-vardhana • Apr 05 '22 20:04

@harsha-vardhana The following is your CLUSTER NODES response:

localhost:7003> CLUSTER NODES
8f84c6c3c809d1b3053fd536c212c89eec823d61 10.250.94.231:7002@17002 master - 0 1649178978000 1 connected 0-5460
63685c72b051dc781395fd3ca7aa7128d6b37e40 10.248.88.85:7004@17004 slave 8f84c6c3c809d1b3053fd536c212c89eec823d61 0 1649178978595 1 connected
b8fa4d6ad3ad1af40669a2da6427fe636519fce4 10.248.88.85:7003@17003 slave d1aa3c15420a444e3713d35eb4fe6470c6324d5d 0 1649178978000 2 connected
918b8f832107733027e84170110ecc06ac9cb795 10.250.94.231:7004@17004 slave 1f71c6568d5dbc2df43bdcdbb8dea51dae6149ee 0 1649178977190 4 connected
d1aa3c15420a444e3713d35eb4fe6470c6324d5d 10.196.22.224:7003@17003 myself,master - 0 1649178978000 2 connected 10923-16383
1f71c6568d5dbc2df43bdcdbb8dea51dae6149ee 10.248.88.85:7002@17002 master - 0 1649178978194 4 connected 5461-10922

Which you later edited to:

10.250.94.231:7002 master (0-5460)
10.196.22.224:7003 master (10923-16383)
10.248.88.85:7002 master (5461-10922)

10.248.88.85:7004 slave to 10.250.94.231:7002
10.248.88.85:7003 slave to 10.196.22.224:7003
10.250.94.231:7004 slave to 10.248.88.85:7002

To address your concerns:

Connect to host "10.250.94.231:7003" ...

10.250.94.231:7003 is not part of your described cluster. Use a node from the cluster.

Note: 10.250.94.231:7002 and 10.250.94.231:7004 are both part of the cluster, but ...:7003 is not.
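The membership check above can also be done mechanically. The following is a minimal sketch that takes the text of a CLUSTER NODES reply, extracts each node's `ip:port`, and tests whether a given seed address is listed. `ClusterMembership` and `nodeAddresses` are hypothetical names, and the reply is pasted in as a string from the output quoted earlier rather than fetched from a server.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: works on the text of a CLUSTER NODES reply, however it was obtained.
final class ClusterMembership {

    /** Extracts the "ip:port" of every node listed in a CLUSTER NODES reply. */
    static Set<String> nodeAddresses(String clusterNodesReply) {
        Set<String> addresses = new LinkedHashSet<>();
        for (String line : clusterNodesReply.split("\\R")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) continue;      // skip blank or malformed lines
            String address = fields[1];           // e.g. 10.250.94.231:7002@17002
            int at = address.indexOf('@');        // drop the cluster bus port
            addresses.add(at >= 0 ? address.substring(0, at) : address);
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Reply copied from the CLUSTER NODES output quoted in this thread.
        String reply = String.join("\n",
            "8f84c6c3c809d1b3053fd536c212c89eec823d61 10.250.94.231:7002@17002 master - 0 1649178978000 1 connected 0-5460",
            "63685c72b051dc781395fd3ca7aa7128d6b37e40 10.248.88.85:7004@17004 slave 8f84c6c3c809d1b3053fd536c212c89eec823d61 0 1649178978595 1 connected",
            "b8fa4d6ad3ad1af40669a2da6427fe636519fce4 10.248.88.85:7003@17003 slave d1aa3c15420a444e3713d35eb4fe6470c6324d5d 0 1649178978000 2 connected",
            "918b8f832107733027e84170110ecc06ac9cb795 10.250.94.231:7004@17004 slave 1f71c6568d5dbc2df43bdcdbb8dea51dae6149ee 0 1649178977190 4 connected",
            "d1aa3c15420a444e3713d35eb4fe6470c6324d5d 10.196.22.224:7003@17003 myself,master - 0 1649178978000 2 connected 10923-16383",
            "1f71c6568d5dbc2df43bdcdbb8dea51dae6149ee 10.248.88.85:7002@17002 master - 0 1649178978194 4 connected 5461-10922");

        Set<String> known = nodeAddresses(reply);
        System.out.println("10.250.94.231:7003 in cluster: " + known.contains("10.250.94.231:7003")); // false
        System.out.println("10.248.88.85:7003 in cluster: " + known.contains("10.248.88.85:7003"));   // true
    }
}
```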

... connecting to another master node, "10.250.94.231:7003" ...

This is the same node as before. So the word "another" is not correct.

Also, that's not part of the described cluster.

... a different host, "10.248.88.85:7003", there is no delay and all the values are set in a jiffy!

Well, 10.248.88.85:7003 is part of the concerned cluster.


PS: Could 10.250.94.231:7003 be part of another cluster?

sazzad16 • Apr 06 '22 10:04

@sazzad16 That is another issue we are seeing, which I should have mentioned. There seems to be a discrepancy in the IP addresses displayed when we run CLUSTER NODES. Say I connect to the cluster on host 10.250.94.231 (which is the host's actual IP address) using port 7002 (./src/redis-cli -p 7002 -h localhost -c): that node then appears as 10.196.22.224:7002, while the rest of the nodes on that host have 10.250.94.231 IP addresses. If I instead connect with ./src/redis-cli -p 7003 -h localhost -c and run CLUSTER NODES, that node appears as 10.196.22.224:7003 and the rest of the nodes have 10.250.94.231 IP addresses. But if you run CLUSTER NODES from another host (10.248.88.85), all nodes have either 10.250.94.231 or 10.248.88.85 IP addresses. I don't think this discrepancy is what is causing the problem here.
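One way to make this discrepancy visible is to read the address a node reports for itself (the entry flagged `myself` in its CLUSTER NODES reply) and compare it with the address the other nodes report for it. The following is a minimal sketch assuming only the JDK; `SelfAddress` and `selfAddress` are hypothetical names, and the sample lines are copied from the reply quoted earlier in this thread.

```java
import java.util.Arrays;

// Hypothetical helper: finds the address a node reports for itself in CLUSTER NODES.
final class SelfAddress {

    /** Returns the "ip:port" of the entry flagged "myself", or null if there is none. */
    static String selfAddress(String clusterNodesReply) {
        for (String line : clusterNodesReply.split("\\R")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 3) continue;              // skip blank or malformed lines
            // The third field is a comma-separated flag list, e.g. "myself,master".
            if (Arrays.asList(fields[2].split(",")).contains("myself")) {
                String address = fields[1];
                int at = address.indexOf('@');            // drop the cluster bus port
                return at >= 0 ? address.substring(0, at) : address;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String reply = String.join("\n",
            "8f84c6c3c809d1b3053fd536c212c89eec823d61 10.250.94.231:7002@17002 master - 0 1649178978000 1 connected 0-5460",
            "d1aa3c15420a444e3713d35eb4fe6470c6324d5d 10.196.22.224:7003@17003 myself,master - 0 1649178978000 2 connected 10923-16383");
        System.out.println("Node reports itself as: " + selfAddress(reply)); // 10.196.22.224:7003
    }
}
```

If the self-reported address differs from the one the other nodes list for the same node ID, a client seeded through that node may be handed an address it cannot reach. Whether that applies here is an assumption that would need to be checked against the actual network setup.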

harsha-vardhana • Apr 06 '22 11:04