
Slow performance on single node with 4 receivers

Open ajukraine opened this issue 9 years ago • 19 comments

I have an Azure Event Hub with 4 partitions (and 4 throughput units), with 2500 events in each partition.

I executed a script in a single Node.js instance that creates 4 receivers. Total execution time: 3m2.943s. After that I ran 4 instances simultaneously, each with 1 receiver. The average execution time was 56s.

I also have a C# .NET application that creates 4 receivers (each in a separate thread). Its total execution time is 58s.

Question: why can't a single Node.js instance match the execution time of the multi-threaded C# .NET application? Is it possible to implement non-blocking reads from multiple partitions in Node.js?

ajukraine avatar Dec 21 '16 15:12 ajukraine

@ajukraine can you please provide a minimal working example of this so we can test locally? Or perhaps just provide snippets from your existing case?

The performance of the module in my experience is pretty great, and it's definitely not blocking, so I suspect there might be some specific settings differing between the test cases/implementations.

mbroadst avatar Dec 21 '16 16:12 mbroadst

https://gist.github.com/ajukraine/50393bfc255bb73c8d4a8f4bfcfde5f8

ajukraine avatar Dec 21 '16 16:12 ajukraine

Each event is a 12 KB JSON string.

ajukraine avatar Dec 21 '16 16:12 ajukraine

@ajukraine is it possible at all to get a sample of what the payload looks like? Just to make sure it's not related to a particular decoding issue.

mbroadst avatar Dec 21 '16 16:12 mbroadst

Also, I noticed that in your example you are using the azure-event-hubs npm package. Could you possibly modify one of our EventHub examples to attempt this process with just the amqp10 driver?

mbroadst avatar Dec 21 '16 16:12 mbroadst

@mbroadst here's the example of the payload https://gist.github.com/ajukraine/6c5dff3df2659c585c3eece3d1d4330c

Yes, I've tried using amqp10 directly - the performance was about the same (that's why I tried the Azure one, but under the hood it uses amqp10 with a modified policy).

I will run the test again with the amqp10 driver and share the results.

ajukraine avatar Dec 21 '16 16:12 ajukraine

Here's the script with amqp10: https://gist.github.com/ajukraine/c9c1078a83d8f77ed8f8cb1cf23fbd65

Total execution time is 2m58.447s, which is almost the same as with Azure's wrapper library.

After that I ran 4 instances simultaneously, each with 1 receiver. The average execution time was 52s.

ajukraine avatar Dec 21 '16 16:12 ajukraine

Is it possible that the client's I/O operations are blocking each other? I will try creating 4 separate clients then.

ajukraine avatar Dec 21 '16 17:12 ajukraine

It seems to work if I create a separate client for each partition (and a single receiver per client). Total execution time is 0m57.465s, which is a bit higher than for 4 separate processes, but certainly a much simpler solution.
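A minimal sketch of this client-per-partition workaround (the address helper, hub name, and URI handling are illustrative, not taken from the gists above):

```javascript
'use strict';

// Event Hubs exposes each partition as its own AMQP node under this path
// (consumer group '$default' assumed).
function partitionAddress(eventHubName, consumerGroup, partitionId) {
  return eventHubName + '/ConsumerGroups/' + consumerGroup +
         '/Partitions/' + partitionId;
}

// One amqp10 client per partition: each client opens its own connection,
// so the receivers no longer share a single session/connection.
function receiveWithClientPerPartition(uri, eventHubName, partitionIds, onMessage) {
  var amqp = require('amqp10'); // required lazily so the sketch loads without the module
  return Promise.all(partitionIds.map(function (pid) {
    var client = new amqp.Client(amqp.Policy.EventHub);
    return client.connect(uri)
      .then(function () {
        return client.createReceiver(partitionAddress(eventHubName, '$default', pid));
      })
      .then(function (receiver) {
        receiver.on('message', function (msg) { onMessage(pid, msg); });
      });
  }));
}
```

The point is simply that every partition gets its own TCP connection instead of four links multiplexed over one.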

ajukraine avatar Dec 21 '16 17:12 ajukraine

So is it a bug or expected behavior?

ajukraine avatar Dec 21 '16 17:12 ajukraine

Another thing is that with 4 clients I get messages in random order:

received(2) received(1) received(0) received(3) received(2) received(1) received(3) received(2) received(0) received(1) received(3) received(2) received(1) received(0) received(3) received(2) received(1) received(0) received(2) received(1) received(0) received(2) received(1) received(0) received(3) received(2) received(1)

And with single client I get them in batches (exactly 100 messages per partition):

100 times received(0) ... 100 times received(3) ... 100 times received(1) ... 100 times received(3) ...

By the way, partition '2' was not prioritized in the same way as the others. Sometimes I do get a sequence of random partitions, but mostly these are batches from the same partition.

ajukraine avatar Dec 21 '16 17:12 ajukraine

I have created a virtual machine in the same Azure data center where the EventHub is hosted.

C# .NET solution - 8.037s
Node.js amqp10 (4 clients) - 10.406s
Node.js amqp10 (1 client) - 10.617s

So it seems the problem is also affected by the network (latency, perhaps). But it looks strange that the number of clients makes a difference on a slower network. Is it OK to scale clients instead of receivers? Is there any difference in terms of the AMQP 1.0 protocol?

ajukraine avatar Dec 21 '16 18:12 ajukraine

@ajukraine it's definitely not the expected behavior that separate clients yield faster results than multiplexing a single client. In fact, using a single client is the intended approach, so I'll spend some time today trying to figure out what's going on here - your feedback is very much appreciated. Just one last clarifying question regarding the payload example you sent: are you actually sending the object as an object, or are you sending that JSON object as a string/buffer?

mbroadst avatar Dec 22 '16 13:12 mbroadst

I've sent it as a byte array using the C# .NET API. So my guess is it's sent in binary format.

ajukraine avatar Dec 22 '16 13:12 ajukraine

@ajukraine you should be able to check the type of message.body in your received handler.
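For example, a quick check in the handler could look like this (describeBody is a hypothetical helper, not part of amqp10):

```javascript
'use strict';

// message.body from amqp10 may arrive as a Buffer, a string, or an
// already-decoded object, depending on how the sender encoded the payload.
function describeBody(body) {
  if (Buffer.isBuffer(body)) return 'buffer';
  return typeof body; // 'string', or 'object' for a decoded AMQP map/value
}

// In the receive handler, roughly:
// receiver.on('message', function (message) {
//   console.log(describeBody(message.body));
// });
```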

mbroadst avatar Dec 22 '16 13:12 mbroadst

@ajukraine I'm able to confirm this locally, researching the cause of the bug at the moment.

mbroadst avatar Dec 22 '16 15:12 mbroadst

Hm, I suspect it might be tricky to find the cause.

ajukraine avatar Dec 22 '16 19:12 ajukraine

@ajukraine so I've put together a similar test using our CI's EventHub setup, running this code: https://gist.github.com/mbroadst/6add091dea7ebea97236b0fc0a5b98da

the results for both tests are as follows:

mbroadst@retinoid:node-amqp10 (master=)$ node 284-eh.js single
single client
benchmark: 12042.129ms
mbroadst@retinoid:node-amqp10 (master=)$ node 284-eh.js single
single client
benchmark: 12753.704ms
mbroadst@retinoid:node-amqp10 (master=)$ node 284-eh.js multi
multiple clients
benchmark: 6805.755ms
mbroadst@retinoid:node-amqp10 (master=)$ node 284-eh.js multi
multiple clients
benchmark: 7043.181ms

Then I set up a similar test against a local qpidd, where I created 4 queues (test-queue0-3), each loaded with 2500 messages, and ran a test using the attach option distributionMode: 'copy', which should not actually dequeue the messages - similar to an EventHub. Here is that code: https://gist.github.com/mbroadst/b1c155950ff34af9cd58a54fb2360858
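The distributionMode override described above would look roughly like this (the option nesting is assumed from amqp10's link policy shape, not copied from the gist):

```javascript
'use strict';

// 'copy' asks the broker to browse the queue instead of consuming from it,
// which approximates EventHub semantics on a plain qpidd queue. The option
// nesting below follows amqp10's link policy shape; treat it as a sketch.
var receiverOptions = {
  attach: { source: { distributionMode: 'copy' } }
};

// At runtime, for each of the four queues, roughly:
// client.createReceiver('test-queue0', receiverOptions);
```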

And the results:

mbroadst@retinoid:node-amqp10 (master=)$ node 284-qpidd.js single
single client
benchmark: 832.666ms
mbroadst@retinoid:node-amqp10 (master=)$ node 284-qpidd.js single
single client
benchmark: 872.058ms
mbroadst@retinoid:node-amqp10 (master=)$ node 284-qpidd.js multi
multiple clients
benchmark: 866.356ms
mbroadst@retinoid:node-amqp10 (master=)$ node 284-qpidd.js multi
multiple clients
benchmark: 865.006ms

As you can see, with the qpidd tests I'm getting basically the same performance whether I use multiple clients or not. This would seem to indicate two possibilities to me:

  • Service Bus is set up remotely to favor the approach of creating multiple clients (perhaps by rate limiting individual connections)
  • there are some local client policy settings preventing the same speed with a single client. If that's the case, it would most likely be related to session flow control - perhaps we are running out of credits too quickly, or the incoming/outgoing windows for the link are not optimal

I did some tests tuning the policy "knobs", increasing the frame window values to rather large numbers, and it didn't seem to change the speed very much... so I'm thinking the first option is most likely what's happening.
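For reference, the kind of overrides involved in that tuning look roughly like this (field names follow amqp10's default policy structure; the values are arbitrary illustrations):

```javascript
'use strict';

// Rough shape of the policy overrides: larger session windows and a bigger
// per-refresh credit quantum. Field names follow amqp10's default policy
// structure; the specific values here are arbitrary.
var policyOverrides = {
  session: {
    options: { incomingWindow: 100000, outgoingWindow: 100000 }
  },
  receiverLink: { creditQuantum: 1000 }
};

// At runtime, merged over the EventHub defaults, roughly:
// var amqp = require('amqp10');
// var client = new amqp.Client(amqp.Policy.merge(policyOverrides, amqp.Policy.EventHub));
```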

Can you confirm that when you run a similar code snippet using AMQP.Net Lite you get the same performance for both a single client and multiple clients? If that's not the case, I think we can bump this up the chain to Microsoft. Otherwise, I'll keep digging.

mbroadst avatar Dec 22 '16 23:12 mbroadst

I've changed the C# .NET implementation to use 1 client per partition. The results remain the same: total execution time is about 7.8 seconds.

And here are the timings per partition. It's always like this - the first created receiver finishes faster, and the rest of them are throttled (in line with EventHub pricing/limitations).

Receiver 0, total 2728 ms
Receiver 2, total 5027 ms
Receiver 1, total 6285 ms
Receiver 3, total 7426 ms

ajukraine avatar Dec 23 '16 08:12 ajukraine