Error: Socket closed unexpectedly & ECONNREFUSED
Hi! I'm working with a Redis Cluster with four masters and four slaves distributed across two hosts, with the following configuration:
Host 1: Master on port 6000 (slots 0 to 4095); Master on port 6001 (slots 8192 to 12287); Slave on port 6002 (of the master on host 2, port 6001); Slave on port 6003 (of the master on host 2, port 6000)
Host 2: Master on port 6000 (slots 4096 to 8191); Master on port 6001 (slots 12288 to 16383); Slave on port 6002 (of the master on host 1, port 6001); Slave on port 6003 (of the master on host 1, port 6000)
Everything is set up so that if a master goes down, the corresponding slave takes its place. In Redis itself this works fine, but in node-redis, once a master goes down the client keeps trying to reconnect to the closed socket and ignores the new master, repeatedly printing the same error to the console: connect ECONNREFUSED XXX.XXX.XXX.XXX:6000
I have the following configuration for my cluster in node-redis:
try {
  (async () => {
    // Build the list of root nodes from the configured cluster members
    const path = [];
    for (const i in self.config.get('redis').cluster) {
      const item = self.config.get('redis').cluster[i];
      path.push({
        url: 'redis://' + item.host + ':' + item.port
      });
    }
    self.redis = Redis.createCluster({
      rootNodes: path,
      defaults: {
        socket: {
          // Back off between reconnection attempts, capped at 2 seconds
          reconnectStrategy: function (times) {
            return Math.min(100 + times * 2, 2000);
          }
        }
      },
      maxCommandRedirections: 16
    });
    self.redis.on('error', function (error) {
      self.logger.error('Error: ' + error);
    });
    await self.redis.connect();
  })();
} catch (ex) {
  self.logger.error("Couldn't connect to Redis Cluster ", ex);
  throw ex;
}
Is there anything I need to change in my configuration to stop the error from appearing and connect to the new master? Thanks in advance for the help.
Environment:
- Node.js Version: v16.14.0
- Redis Server Version: 6.2.6
- Node Redis Version: 4.1.0
- Platform: Alpine Linux v3.15
I don't think it is a problem with master and slave. This has been a long-running problem where no one knows why it happens; they just say that Redis will reconnect automatically, but I find it very annoying.
I'm having the same issues, and what's worse, there is no good way to handle the exception; the reconnect logic just plain doesn't work. Until it's fixed, there is no way to use node-redis in any production capacity.
It took me several hours to get here, only to find out it isn't fixable. I'm trying to maintain connections to multiple independent Redis instances, and one connection failure blocks all processing and goes into the following loop:
127.0.0.1:6379 : redis client reconnecting
127.0.0.1:6379 : redis client error Error: connect ECONNREFUSED 127.0.0.1:6379
Does the config below even work?
reconnectStrategy: retries => false
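For reference, my reading of the node-redis v4 docs is that reconnectStrategy can return a number (delay in milliseconds before the next attempt), false (stop reconnecting), or an Error (stop reconnecting and surface that error), so returning false should disable reconnection entirely. A minimal sketch on a single client, just to illustrate the return types (the retry limit and messages are only examples):

const { createClient } = require('redis');

const client = createClient({
  socket: {
    reconnectStrategy: (retries) => {
      if (retries > 10) {
        // Returning an Error stops reconnecting and reports this error to callers
        return new Error('Retry attempts exhausted');
      }
      // Returning a number waits that many milliseconds before the next attempt;
      // returning false instead would stop reconnecting after the first failure
      return Math.min(retries * 50, 2000);
    },
  },
});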
The reconnection handling of createCluster has an issue, but there is a simple workaround for it.
const redis = require('redis');
const client = redis.createCluster({
rootNodes: [
{
url: 'redis://127.0.0.1:7001',
// Fix overridden default socket options
socket: {},
},
],
defaults: {
socket: {
connectTimeout: 10000,
reconnectStrategy: (/* retries, cause */) => {
return 5000;
},
},
},
});
await client.connect();
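One thing to note alongside this workaround: the cluster client is an EventEmitter, so without an 'error' listener an emitted error will crash the process with an unhandled 'error' event. A one-line guard (the log message is just illustrative):

client.on('error', (err) => console.error('Redis Cluster error:', err.message));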
I cannot reproduce this failure.
node-redis 4.6.6, Node.js 16.4.2, Redis 7.0.11
My code:
'use strict';
const redis = require('redis4');
const log = require('log4js').getLogger(); log.level = 'debug';
const client = redis.createCluster({
rootNodes: [
{
url: 'redis://127.0.0.1:7010',
// Fix overridden default socket options
// socket: {},
},
],
defaults: {
socket: {
connectTimeout: 10000,
reconnectStrategy: (/* retries, cause */) => {
return 5000;
},
},
},
});
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
async function main() {
client.on('error', e => log.error('Redis client error: %s', e.message));
await client.connect();
while (true) {
try {
const foo = await client.type('foo');
log.info("type of foo: %o", foo);
} catch (e) {
log.error("error: %s", e.message);
}
await sleep(5000);
}
}
main();
Output:
[2023-06-08T13:34:24.711] [INFO] default - type of foo: 'hash'
[2023-06-08T13:34:29.716] [INFO] default - type of foo: 'hash'
[2023-06-08T13:34:33.998] [ERROR] default - Redis client error: Socket closed unexpectedly
[2023-06-08T13:34:33.999] [ERROR] default - Redis client error: Socket closed unexpectedly
[2023-06-08T13:34:34.717] [INFO] default - type of foo: 'hash'
[2023-06-08T13:34:39.723] [INFO] default - type of foo: 'hash'
[2023-06-08T13:34:44.725] [ERROR] default - error: CLUSTERDOWN The cluster is down
[2023-06-08T13:34:49.730] [ERROR] default - error: CLUSTERDOWN The cluster is down
[2023-06-08T13:34:54.737] [ERROR] default - error: CLUSTERDOWN The cluster is down
[2023-06-08T13:34:59.741] [ERROR] default - error: CLUSTERDOWN The cluster is down
[2023-06-08T13:35:04.745] [ERROR] default - error: CLUSTERDOWN The cluster is down
[2023-06-08T13:35:09.747] [INFO] default - type of foo: 'hash'
It's a bit weird that there are two successful Redis calls after the socket error (that's when I killed one of the master nodes), but you can see that it reconnected just fine once I brought that node back up.
I can only reproduce this inside k8s (update: I was also able to reproduce it in Docker).
I'm running a Redis cluster with the Bitnami Helm chart:
helm install test oci://registry-1.docker.io/bitnamicharts/redis-cluster
I run the following (plus the code from above) in a pod:
const client = redis.createCluster({
rootNodes: [
{
url: "redis://test-redis-cluster-0.test-redis-cluster-headless:6379",
},
{
url: "redis://test-redis-cluster-1.test-redis-cluster-headless:6379",
},
{
url: "redis://test-redis-cluster-2.test-redis-cluster-headless:6379",
},
{
url: "redis://test-redis-cluster-3.test-redis-cluster-headless:6379",
},
{
url: "redis://test-redis-cluster-4.test-redis-cluster-headless:6379",
},
{
url: "redis://test-redis-cluster-5.test-redis-cluster-headless:6379",
},
],
defaults: {
password: "XXXXXXXX",
socket: {
connectTimeout: 10000,
reconnectStrategy: (/* retries, cause */) => {
return 5000;
},
},
},
});
After killing a master Redis pod that the client was connected to:
Output:
[2024-02-06T10:32:51.889] [INFO] default - type of foo: 'none'
[2024-02-06T10:32:56.897] [INFO] default - type of foo: 'none'
[2024-02-06T10:32:58.610] [ERROR] default - Redis client error: Socket closed unexpectedly
[2024-02-06T10:32:58.610] [ERROR] default - Redis client error: Socket closed unexpectedly
[2024-02-06T10:33:01.903] [ERROR] default - error: The client is closed
[2024-02-06T10:33:06.907] [ERROR] default - error: The client is closed
[2024-02-06T10:33:11.913] [ERROR] default - error: The client is closed
[2024-02-06T10:33:16.920] [ERROR] default - error: The client is closed
[2024-02-06T10:33:21.922] [ERROR] default - error: The client is closed
[2024-02-06T10:33:26.925] [ERROR] default - error: The client is closed
[2024-02-06T10:33:31.926] [ERROR] default - error: The client is closed
...
This loops forever until I restart the pod (the only way I've found to fix it).
When I kill a Redis pod, a new pod spins up with a new IP address. I'm assuming node-redis doesn't handle the case where a node's IP address changes and the old one becomes unreachable?
This makes it impossible to run node-redis in production on k8s.
I've drafted a fix for my use case: https://github.com/redis/node-redis/pull/2701. I wrote it with my limited abilities, so I'd appreciate it if someone could rewrite it. 🙏
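Until something like that lands, a rough in-process workaround (less drastic than restarting the whole pod) might be to tear the cluster client down and build a new one once it reports being closed. A sketch, assuming clusterOptions is the same options object passed to createCluster above and that the cluster client exposes disconnect() like the single-node client does (recreateCluster is a hypothetical helper name):

const redis = require('redis');

async function recreateCluster(oldClient, clusterOptions) {
  try {
    // Assumption: disconnect() drops the underlying node connections without waiting for pending replies
    await oldClient.disconnect();
  } catch (e) {
    // The old client may already be unusable; ignore and move on
  }
  const fresh = redis.createCluster(clusterOptions);
  fresh.on('error', (err) => console.error('Redis Cluster error:', err.message));
  await fresh.connect();
  return fresh;
}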