node-redis
node-redis copied to clipboard
It will never reconnect when connection has closed
Hi, in the past few years, I find a critical bug in my production env. My production env use node-redis : 2.7.1. When the network disconnect , the node-redis will run the retry_strategy that I specified,and then the node-redis connection closed.But node-redis will not reconnect to the redis server again when the network is ok. So my production env always throw an error: The connection is already closed in the error log, always. Then I have to restart my node server to fix this. In my production env , the retry_strategy is:
retry_strategy: function (options) {
// console.log('retry_strategy', options);
if (options.error && options.error.code === 'ECONNREFUSED') {
// End reconnecting on a specific error and flush all commands with
// a individual error
return new Error('The server refused the connection');
}
if (options.total_retry_time > 1000 * 60 * 60) {
// End reconnecting after a specific timeout and flush all commands
// with a individual error
return new Error('Retry time exhausted');
}
if (options.attempt > 10) {
// End reconnecting with built in error
return new Error('Attempt time exhausted');
}
// reconnect after
return Math.min(options.attempt * 100, 3000);
}
I doubt a very long time, why this would happended. So I dig into the source codes , the node-redis version is 2.8.0. It has this bug too. Then I find the reason:
function handle_offline_command (self, command_obj) {
var command = command_obj.command;
var err, msg;
if (self.closing || !self.enable_offline_queue) {
command = command.toUpperCase();
if (!self.closing) {
if (self.stream.writable) {
msg = 'The connection is not yet established and the offline queue is deactivated.';
} else {
msg = 'Stream not writeable.';
}
} else {
msg = 'The connection is already closed.';
}
err = new errorClasses.AbortError({
message: command + " can't be processed. " + msg,
code: 'NR_CLOSED',
command: command
});
if (command_obj.args.length) {
err.args = command_obj.args;
}
utils.reply_in_order(self, command_obj.callback, err);
} else {
debug('Queueing ' + command + ' for next server connection.');
self.offline_queue.push(command_obj);
}
self.should_buffer = true;
}
When the connection is down, then the ready is false and stream is not writable. And handle_offline_command will handle this .At this time the closing is always true, so it will throw an AbortError. So I donot know why retry connection to de redis server in this scope.
I fix this code like this, and it works:
function handle_offline_command (self, command_obj) {
var command = command_obj.command;
var err, msg;
if (self.closing || !self.enable_offline_queue) {
if(self.stream == null || self.stream.destroyed){
retry_connection(self, null);
}else{
self.offline_queue.push(command_obj);
}
// command = command.toUpperCase();
// if (!self.closing) {
// if (self.stream.writable) {
// msg = 'The connection is not yet established and the offline queue is deactivated.';
// } else {
// msg = 'Stream not writeable.';
// }
// } else {
// msg = 'The connection is already closed.';
// }
// err = new errorClasses.AbortError({
// message: command + " can't be processed. " + msg,
// code: 'NR_CLOSED',
// command: command
// });
// if (command_obj.args.length) {
// err.args = command_obj.args;
// }
// utils.reply_in_order(self, command_obj.callback, err);
} else {
debug('Queueing ' + command + ' for next server connection.');
self.offline_queue.push(command_obj);
}
self.should_buffer = true;
}
I donot know it will has any other logic code relatively, although I fix it and it work well . So I hope you give me a reliable solution, thks.The code is a little bit messy, so I very want to rewrite your node-redis using asyn await. But I have no time in this year.
I've run into exactly this today with the same standard retry_strategy as shown above. In trying to understand what was going on, it struck me that this code doesn't seem to make a lot of sense:
if (options.error && options.error.code === 'ECONNREFUSED') {
// End reconnecting on a specific error and flush all commands with an individual error
return new Error('The server refused the connection');
}
Isn't that pretty much exactly the circumstance when you need the retries to continue (e.g. temporary server disconnect)?
Commenting the above lines out at least seems to let my retry strategy keep retrying. Are there other negative impacts of making this change?
I'm trying to understand this also... As per the docs:
If you return a number from this function, the retry will happen exactly after that time in milliseconds. If you return a non-number, no further retry will happen and all offline commands are flushed with errors.
Which means that a retry_strategy like this:
if (options.error && options.error.code === 'ECONNREFUSED') {
return new Error('The server refused the connection');
}
will make the client never retry again. It must return a number value.
Now, what even gets worse is that in the example
var client = redis.createClient({
retry_strategy: function (options) {
if (options.error && options.error.code === 'ECONNREFUSED') {
return new Error('The server refused the connection');
}
if (options.total_retry_time > 1000 * 60 * 60) {
return new Error('Retry time exhausted');
}
if (options.attempt > 10) {
return undefined;
}
return Math.min(options.attempt * 100, 3000);
}
});
the 2nd and 3rd if statements will never be executed in case of failure of type ECONNREFUSED and re-connection won't stop after 1000 * 60 * 60ms or after 10 attempts!
After changing the order:
retry_strategy = (options) => {
if (options.total_retry_time > 1000 * 60 * 60) {
return new Error('Retry time exhausted');
}
if (options.attempt > 5) {
return new Error('Max attempts reached');;
}
if (options.error && options.error.code === 'ECONNREFUSED') {
return 500;
}
return Math.min(options.attempt * 100, 3000);
}
It should work as intended. I guess.
Returning a new Error() won't emit any event log... Re-connection will stop without logging any error message.
@eladbs nice investigation there, would you be willing to send over a PR to correct the example in the docs? Thank you
I'm interested in what will happen to this moving forward in v3+, as connect_timeout
is deprecated.
To get the library to keep retrying forever, I needed to combine a simple retry strategy with the connect_timeout
option, e.g
const retryStrategy = (options) => {
// retry forever, with a max retry delay of 5s
return Math.min(options.attempt * 100, 5000);
};
const client = redis.createClient(connectionString, {
enable_offline_queue: false,
retry_strategy: retryStrategy,
connect_timeout: Number.MAX_SAFE_INTEGER, // (the default is 1 hour)
});
Otherwise, this still happens after one hour, regardless of the retry stategy.
https://github.com/NodeRedis/node-redis/blob/da31ade348f9b98ebc5a3164d59a93576a865906/index.js#L579
Hi
As mentioned by @deancouch, if you look in the code, connect_timeout (it is not deprecated as far as I can see) is set to 1hr by default:
this.connect_timeout = +options.connect_timeout || 3600000; // 60 * 60 * 1000 ms
Therefore, regardless of what workaround you use in your code tweak or retry strategy above, the library will throw an error and end the connection after this timeout. Unless you specific an even bigger number (like @deancouch's example above) to prevent that from happening, you could handle the condition.
For more thorough error handling, you can catch, recreate and throw on redis errors, as per snippet below.
public async checkRecreateRedisOnError(err: any) {
const isRedisError = (err instanceof RedisError) ? true : false;
const isAbortError = (err.constructor.name === "AbortError") ? true : false;
const isAggregateError = (err.constructor.name === "AggregateError") ? true : false;
const errorCode = err.code;
if ((isRedisError || isAbortError || isAggregateError) && errorCode) {
await this.checkRecreateRedis();
// we rethrow the error after reconnecting, so upper level can perform a retry
throw err;
}
}
checkRecreateRedis()
is shown in the code snippet below, which can also be called when not in an error condition, eg you have a watchdog loop which checks the health of the redis connection:
public async checkRecreateRedis() {
if (this.redisClient.connected) {
return;
}
while (!this.redisClient.connected) {
if (this.redisClient.closing) {
try {
this.redisClient = await new RedisClient(this.options);
} catch (err) {
console.log(`Failed to recreate redis client: [${err.message}]`);
}
} else {
await this.sleep(1000);
}
}
}
Please note that the above methods have been simplified to enhance readability. It also assumes you want to block on error in your code, so if your app cannot be blocked (eg you only want to try a redis read/write and log on error), please do not be use them. In this non-blocking case, you may just want to have a watchdog loop which calls checkRecreateRedis()
to ensure the redis client is recreated after it has ended.
if (options.attempt > 10) {
// End reconnecting with built in error
return new Error('Attempt time exhausted');
}
You can change the
return new Error('Attempt time exhausted');
to
return 1000;
It means retry_delay, then the client will retry without stopping