node-amqp-connection-manager icon indicating copy to clipboard operation
node-amqp-connection-manager copied to clipboard

connection manager stop trying to reconnect after connection failed

Open elizatlawy opened this issue 2 years ago • 4 comments

We had an issue that we are currently unable to reproduce with a device that was under a low network, The device had 10 disconnection events that were followed up by a successful reconnection, but then on the 11 disconnect, even the device received connectFailed event and then did not try to reconnect anymore, does a connectFailed not lead to a reconnection attempt? is this is a know issue?

Version: 4.1.6 amqp version: 0.10.1

code of reconnect mechanism:

initRabbitMQConnection({ handleMessageFunction }) {
		try {
			this._handleFunction = handleMessageFunction;
			const connection = amqp.connect(RABBITMQ_CONNECTION_URI, {
				heartbeatIntervalInSeconds: 5,
				reconnectTimeInSeconds: 1
			});
			connection.on('connect', () => {
				logger.info(`Successfully connected`);
			});
			connection.on('connectFailed', ({ err }) => {
				logger.error(`Connection failed due to: ${err}`);
			});
			connection.on('disconnect', ({ err }) => {
				logger.error(`Connection disconnected due to: ${err}`);
			});
			connection.on('error', err => {
				logger.error(`Connection encountered an error: ${err}`);
			});
			connection.on('blocked', ({ reason }) => {
				logger.error(`Connection blocked due to: ${reason}`);
			});
			connection.on('unblocked', () => {
				logger.info(`Connection unblocked`);
			});
			this._channelWrapper = connection.createChannel({
				// The setup function will rerun on each reconnection
				setup: async channel => {
					// channel = the internal amqplib channel to be used for asserting/consuming.
					try {
						await channel.assertQueue(QUEUE.name, QUEUE.queueOptions);
						logger.info(`Successfully asserted queue: ${QUEUE.name}`);
						await this.bindQueueToTopics(channel);
						await this.attacheHandleFunctionToTopics(channel);
						logger.info(`Successfully finished channel setup`);
					} catch (err) {
						logger.error(`Error on Channel setup: ${err}`);
					}
				}
			});
			this._channelWrapper.on('error', (err, { name }) => {
				logger.error(`Channel: ${name} encounter an error: ${err}`);
			});
			this._channelWrapper.on('close', () => {
				logger.error(`Channel closed`);
			});
		} catch (err) {
			logger.error(`Error connecting to RabbitMQ: ${err}`);
		}
	}

Logs:

timestamp,message
1668697912766,"{""[2022-11-17 17:11:52] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668697913417,"{""[2022-11-17 17:11:53] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668697917795,"{""[2022-11-17 17:11:57] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668697917987,"{""[2022-11-17 17:11:57] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668697917988,"{""[2022-11-17 17:11:57] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668698932888,"{""[2022-11-17 17:28:52] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Unexpected close""}"
1668698934997,"{""[2022-11-17 17:28:54] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668698935562,"{""[2022-11-17 17:28:55] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668698940237,"{""[2022-11-17 17:29:00] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668698940434,"{""[2022-11-17 17:29:00] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668698940435,"{""[2022-11-17 17:29:00] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668720704114,"{""[2022-11-17 23:31:44] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Unexpected close""}"
1668720707560,"{""[2022-11-17 23:31:47] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668720708831,"{""[2022-11-17 23:31:48] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668720724196,"{""[2022-11-17 23:32:04] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668720725747,"{""[2022-11-17 23:32:05] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668720725748,"{""[2022-11-17 23:32:05] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668724317773,"{""[2022-11-18 00:31:57] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668724330295,"{""[2022-11-18 00:32:10] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668724330883,"{""[2022-11-18 00:32:10] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668724335460,"{""[2022-11-18 00:32:15] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668724335667,"{""[2022-11-18 00:32:15] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668724335668,"{""[2022-11-18 00:32:15] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668745906393,"{""[2022-11-18 06:31:46] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668745909950,"{""[2022-11-18 06:31:49] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668745911166,"{""[2022-11-18 06:31:51] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668745918988,"{""[2022-11-18 06:31:58] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668745919188,"{""[2022-11-18 06:31:59] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668745919189,"{""[2022-11-18 06:31:59] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668762415727,"{""[2022-11-18 11:06:55] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668762452091,"{""[2022-11-18 11:07:32] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: Socket closed abruptly during opening handshake""}"
1668762504212,"{""[2022-11-18 11:08:24] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668762504794,"{""[2022-11-18 11:08:24] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668762509228,"{""[2022-11-18 11:08:29] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668762509428,"{""[2022-11-18 11:08:29] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668762509429,"{""[2022-11-18 11:08:29] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668762534214,"{""[2022-11-18 11:08:54] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668762560243,"{""[2022-11-18 11:09:20] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668762586266,"{""[2022-11-18 11:09:46] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668762607296,"{""[2022-11-18 11:10:07] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668762628316,"{""[2022-11-18 11:10:28] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668762640590,"{""[2022-11-18 11:10:40] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668762641187,"{""[2022-11-18 11:10:41] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668762645947,"{""[2022-11-18 11:10:45] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668762646147,"{""[2022-11-18 11:10:46] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668762646148,"{""[2022-11-18 11:10:46] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668762675591,"{""[2022-11-18 11:11:15] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668763258182,"{""[2022-11-18 11:20:58] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: read ECONNRESET""}"
1668763279204,"{""[2022-11-18 11:21:19] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668763300226,"{""[2022-11-18 11:21:40] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668763307545,"{""[2022-11-18 11:21:47] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668763308142,"{""[2022-11-18 11:21:48] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668763313052,"{""[2022-11-18 11:21:53] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668763313270,"{""[2022-11-18 11:21:53] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668763313270,"{""[2022-11-18 11:21:53] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668763337549,"{""[2022-11-18 11:22:17] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668763358567,"{""[2022-11-18 11:22:38] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668763413571,"{""[2022-11-18 11:23:33] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"
1668763430416,"{""[2022-11-18 11:23:50] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668763445419,"{""[2022-11-18 11:24:05] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668763445420,"{""[2022-11-18 11:24:05] [ERROR] [RabbitMQ]"":""Channel: undefined encounter an error: Error: Channel ended, no reply will be forthcoming""}"
1668763535071,"{""[2022-11-18 11:25:35] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: read ECONNRESET""}"
1668763564869,"{""[2022-11-18 11:26:04] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668763576954,"{""[2022-11-18 11:26:16] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: read ECONNRESET""}"
1668763576955,"{""[2022-11-18 11:26:16] [ERROR] [RabbitMQ]"":""Channel: undefined encounter an error: Error: Channel ended, no reply will be forthcoming""}"
1668763589067,"{""[2022-11-18 11:26:29] [INFO] [RabbitMQ]"":""Successfully connected""}"
1668763589665,"{""[2022-11-18 11:26:29] [INFO] [RabbitMQ]"":""Successfully asserted queue: Device-117""}"
1668763594464,"{""[2022-11-18 11:26:34] [INFO] [RabbitMQ]"":""Finished bound queue: Device-117 to all cloud topics""}"
1668763594664,"{""[2022-11-18 11:26:34] [INFO] [RabbitMQ]"":""Successfully attached handle function to all cloud topics""}"
1668763594665,"{""[2022-11-18 11:26:34] [INFO] [RabbitMQ]"":""Successfully finished channel setup""}"
1668763624070,"{""[2022-11-18 11:27:04] [ERROR] [RabbitMQ]"":""Connection disconnected due to: Error: Heartbeat timeout""}"
1668763645092,"{""[2022-11-18 11:27:25] [ERROR] [RabbitMQ]"":""Connection failed due to: Error: getaddrinfo EAI_AGAIN BROKER_URL""}"

elizatlawy avatar Nov 23 '22 07:11 elizatlawy

Any news about the issue?

Omarosh avatar Mar 15 '23 14:03 Omarosh

does a connectFailed not lead to a reconnection attempt?

It should keep reconnecting until you ask it to stop. If you suspect this is because of some kind of ordering-of-connects-and-disconnects problem, I'd suggesting trying to add a unit test to this project that recreates it. If you can figure out a way to recreate this, I'd be interested in that too.

jwalton avatar Apr 04 '23 15:04 jwalton

does a connectFailed not lead to a reconnection attempt?

It should keep reconnecting until you ask it to stop. If you suspect this is because of some kind of ordering-of-connects-and-disconnects problem, I'd suggesting trying to add a unit test to this project that recreates it. If you can figure out a way to recreate this, I'd be interested in that too.

I will try to reproduce it with a unit test and update here if I am able to.

elizatlawy avatar Apr 09 '23 07:04 elizatlawy

For a while, we've been dealing with the same problem: getting stuck when trying to reconnect on an unreliable network. It seems that by setting a timeout value in the options like this: { connectionOptions: { timeout: 10000 } }, we've managed to resolve the issue. This solution was also mentioned in #14.

From what I understand now, the default node.js net socket doesn't have a finite timeout, and the amqp.connect promise can hang in certain situations on an unstable network. It may look like the library is not trying to reconnect.

In my opinion, it's best practice to always configure this timeout. I believe it would be beneficial to emphasize this point in the library documentation.

Any way, thanks for a great library! @jwalton

p0vidl0 avatar Apr 25 '24 04:04 p0vidl0