node-amqp10 Error can

I'm consistently getting errors when accept()'ing messages. Receiving and processing the messages works fine, but by the time the system calls accept() the connection seems to have gone away, and errors like the one quoted below occur. (Sometimes a different property is named, such as sendFrame.) There will be a burst of these (5-20), then the connection seems to recover and things get back to normal. Once this starts happening, though, the situation degrades until errors are far more prevalent than successes. Stopping and restarting the service clears it up for a while (20 min?) and then the cycle starts again.

We are using 3.4.0. After poking around in the issues, it may be relevant to mention that we are using multiple clients (e.g., new AmqpClient(AmqpPolicy.ServiceBusQueue, policy);). If you think it's relevant, we can switch to one, as they all use the same policy, but we do have 5 queues to process.

TypeError: Cannot read property 'write' of null
    at Object.frames.writeFrame (c:\git\ospo-ghcrawler\node_modules\amqp10\lib\frames.js:56:9)
    at Connection.sendFrame (c:\git\ospo-ghcrawler\node_modules\amqp10\lib\connection.js:328:10)
    at ReceiverLink._sendDisposition (c:\git\ospo-ghcrawler\node_modules\amqp10\lib\receiver_link.js:152:27)
    at ReceiverLink.settle (c:\git\ospo-ghcrawler\node_modules\amqp10\lib\receiver_link.js:124:8)
    at ReceiverLink.accept (c:\git\ospo-ghcrawler\node_modules\amqp10\lib\receiver_link.js:73:8)
    at Amqp10Queue.done (c:\git\ospo-ghcrawler\lib\amqp10Queue.js:102:21)
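
The trace shows the TypeError being thrown synchronously from inside accept(), so a try/catch around that call should at least keep one failed settle from taking down a worker. A minimal sketch, assuming only that the link exposes accept(message) as in the trace; safeAccept and onError are hypothetical names, not part of amqp10:

```javascript
// Sketch only: `link` stands for anything with an accept(message) method,
// such as the ReceiverLink in the trace. `onError` is a hypothetical callback.
function safeAccept(link, message, onError) {
  try {
    link.accept(message);
    return true;
  } catch (err) {
    // e.g. "Cannot read property 'write' of null" while the connection is down
    onError(err, message);
    return false;
  }
}
```

This only contains the failure. A message whose accept() threw stays unsettled, so the broker may well deliver it again later.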

Jan 15 '17 08:01 jeffmcaffer

AMQP 1.0 itself is meant to be multiplexed over a single client, but that doesn't prohibit you from using multiple clients. Also, there appears to be anecdotal evidence that we get some performance degradation with ServiceBus and the use of a single client (see this issue). My general recommendation is that you should use a single client.
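
To illustrate the single-client setup, here is a rough sketch of one client with one receiver link per queue. It assumes the client behaves like an amqp10 Client, i.e. connect(uri) and createReceiver(address) both return promises and receivers emit 'message' and 'errorReceived'. attachReceivers, the queue names and the onMessage handler are hypothetical:

```javascript
// Sketch only: `client` is expected to look like an amqp10 Client.
// The queue names and the onMessage handler are hypothetical.
function attachReceivers(client, uri, queueNames, onMessage) {
  // Open one connection, then attach one receiver link per queue on it.
  return client.connect(uri).then(() =>
    Promise.all(
      queueNames.map((name) =>
        client.createReceiver(name).then((receiver) => {
          receiver.on('message', (message) => onMessage(name, receiver, message));
          // Surface link-level errors instead of letting them go unnoticed.
          receiver.on('errorReceived', (err) =>
            console.error(`receiver ${name} error:`, err));
          return receiver;
        })
      )
    )
  );
}
```

In your case the client argument would be a single new AmqpClient(AmqpPolicy.ServiceBusQueue, policy) shared by all 5 queues, instead of one client per queue.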

What you are describing sounds like the remote side of the connection is periodically closing the link, and the default reconnect/reattach policy is kicking in. Without more code to look at, this will be difficult for me to solve. Could you provide more of a snippet here? Also, similar to my suggestion in your other issue, if you could run this code in debug mode and verify that the remote end is indeed acting funny that would be useful information as well.

Jan 15 '17 20:01 mbroadst

@mbroadst Thanks for the quick reply! Feel free to check out the code of our queue receiver. We create 5 of those in ospoCrawler.js. Could just as well switch to one AmqpClient though I'm not sure if that will affect the server disconnecting.

I agree that it appears the reconnect is kicking in and things do seem to work, but it appears to happen more frequently once it starts, and then things fall apart.

The other thing to mention here is that we have many "concurrent" workers. There will be anywhere up to 40 active "loops" where each is calling pop() on the queue, processing, potentially adding more to the queue and then going back to the beginning.

It will be a bit hard for you to run in isolation. I'll see about running it in debug mode though we generally process thousands of messages before the problem happens...

Jan 16 '17 00:01 jeffmcaffer

@jeffmcaffer looks like the repo is still private

Jan 16 '17 01:01 mbroadst

D'oh. Public now.

Jan 16 '17 03:01 jeffmcaffer

@jeffmcaffer ~~there looks to be some weirdness with your policy code. Initially policy is defined to be a function here and then subsequently changed to an object using Policy.merge here. I'm not actually sure what that would do, but it's almost guaranteed to not be what you intend~~

Oops, I didn't see that was RenewOnSettle, which does indeed return a policy object (grr, that's confusing).

Jan 16 '17 13:01 mbroadst