zeromq.node

Performance comparison with Python

Open jmparraguez-haulmer opened this issue 11 years ago • 21 comments

I developed a simple test with Node.js and Python (pyzmq); the image and a repository with the code are attached, in case there might be a bug in the implementation.

To my surprise, Node.js performance is very poor compared to Python.

Any suggestions as to why this is happening?

(attached benchmark image: two-queues-zmq)

https://github.com/jmparra/two-queues/tree/zeroqm-nodejs

note: one client = one core.

jmparraguez-haulmer avatar Nov 19 '14 21:11 jmparraguez-haulmer

During this test, how much CPU do these 2 processes consume? I wonder if Node is mostly idle or not. If not, it would be great to see some flame graphs.

ronkorving avatar Nov 20 '14 01:11 ronkorving

I wonder if we ported the flush method to pure C++, how much we would gain. Inside that function we jump between native and JS land a lot. Those jumps are costly. And this function is where we really spend all of our time.

ronkorving avatar Nov 20 '14 01:11 ronkorving
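The cost ronkorving describes — repeated jumps between JS and native land inside flush — can be illustrated with a small sketch. Everything here is hypothetical: `nativeSendOne` and `nativeSendBatch` are stand-in stubs, not the zeromq.node addon's real API; the counter merely stands in for one JS↔C++ boundary crossing per call.

```javascript
// Illustration (hypothetical stubs, not the addon API): each call into a
// "native" function stands in for one JS <-> C++ boundary crossing.

let crossings = 0;

// Hypothetical per-message native call: one boundary crossing per message.
function nativeSendOne(msg) {
  crossings += 1;
  return msg.length;
}

// Hypothetical batched native call: one crossing for the whole queue.
function nativeSendBatch(msgs) {
  crossings += 1;
  return msgs.reduce((n, m) => n + m.length, 0);
}

const queue = ['a', 'bb', 'ccc'];

// Flushing one message at a time costs one crossing per message...
crossings = 0;
queue.forEach(nativeSendOne);
console.log(crossings); // 3

// ...while handing the whole queue to native land costs exactly one.
crossings = 0;
nativeSendBatch(queue);
console.log(crossings); // 1
```

Porting flush to C++ would follow the second shape: one crossing per flush, with the loop itself running entirely in native code.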

I enjoy looking at throughput. It is a nice average, though it varies substantially with the size of our message envelopes.

reqshark avatar Nov 20 '14 02:11 reqshark

I ran the test again; CPU usage is 20–70% in both cases. I took the opportunity to draw a flame graph of the stack (with Node.js v0.11.14): http://jsfiddle.net/6k0p2yry/

@ronkorving you're absolutely right, the flush method spends a lot of time there...

jmparraguez-haulmer avatar Nov 20 '14 05:11 jmparraguez-haulmer

It would be a great experiment to port _flush to C++. If anyone's up for it..? :)

ronkorving avatar Nov 20 '14 07:11 ronkorving

@jmparra, I think we can also explain why this is happening if we look more directly at Python.

First of all, Python is a much slower language than Node.js. But, similar to Lua, Python can define and use C/C++ library ABIs in a native way, providing direct function access without an additional binary component. I think the official term for this is a Foreign Function Interface (FFI).

Python does this when you import ctypes, which I believe, given a small enough codebase, may completely bypass latencies experienced across other areas of the running interpreter. I've found this to be true, here for example: https://github.com/reqshark/sublimezmq/blob/master/zmq.py

reqshark avatar Nov 20 '14 17:11 reqshark

Hence my suggestion for porting flush. It would do away with most of the bad parts of JS/native bindings.

ronkorving avatar Nov 21 '14 00:11 ronkorving

+1, there's a lot of context switching there between JS and the binding

reqshark avatar Nov 21 '14 14:11 reqshark

Also, if we rewrote it around poll/select instead, it could really improve perf there: http://linux.die.net/man/2/select

reqshark avatar Nov 21 '14 14:11 reqshark

libuv (node.js internals) can accomplish this with the uv_poll_t handle http://docs.libuv.org/en/latest/poll.html

but we would have to include the uv.h header, which shouldn't be an issue since we're running node anyway...

reqshark avatar Nov 21 '14 15:11 reqshark

+1 for libuv. I think it is the better choice because it focuses on asynchronous I/O, but I don't know how much improvement it provides (i.e. whether it offers better performance than select).

jmparraguez-haulmer avatar Nov 21 '14 18:11 jmparraguez-haulmer

On my computer I get: Python: 50k requests/sec; Node: 10k requests/sec.

I think the test should be fixed a little. Here you use setImmediate to push messages, but I suggest a simple while(1) loop.

With this change, on my computer I get Node.js: 20k requests/sec.

However, Python is still faster, so we have to deal with that.

@ronkorving is right. We should move _flush to C++. If we move the flush loop and the emitter to C++, the perf benefits could be very interesting. Can we use a solution like this one ?

prdn avatar Nov 30 '14 11:11 prdn
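The reason prdn's while(1) change helps can be shown with a minimal sketch (this is an illustration of the scheduling difference, not the two-queues benchmark itself): a setImmediate-driven send loop defers each iteration to a later event-loop turn, while a plain synchronous loop pushes its whole batch before yielding at all.

```javascript
// Sketch: setImmediate-driven sends vs. a synchronous batch.
// The `order` array records the interleaving of the two styles.

const order = [];

// Synchronous batch: all three "sends" happen before the loop yields.
for (let i = 0; i < 3; i++) {
  order.push(`sync-${i}`);
}

// setImmediate-driven "loop": each subsequent send waits for a later
// event-loop turn, so throughput is capped at one send per turn.
function sendDeferred(i) {
  if (i >= 3) {
    console.log(order.join(','));
    // prints: sync-0,sync-1,sync-2,deferred-0,deferred-1,deferred-2
    return;
  }
  order.push(`deferred-${i}`);
  setImmediate(() => sendDeferred(i + 1));
}
sendDeferred(0);
```

Every `sync-*` entry lands before the deferred chain finishes, even though both loops were started back to back — the per-turn yield is exactly the overhead prdn removed from the benchmark's hot path.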

Do we even need a native event emitter? There are only 2 callbacks that need calling: one for passing along errors, and one for passing along received messages. Those could be fixed, and they in turn would emit, right? (I'm no expert at all in native Node land, however, so maybe I'm talking nonsense?)

ronkorving avatar Dec 01 '14 01:12 ronkorving

I see that the _flush method emits an event for received messages. So if we move the _flush function to C++, we would also need a C++ event emitter. Am I missing some point?

I also think we should split _flush into two methods: _send and _recv. The _send method should only call send. I'm performing a couple of tests, and there should be an interesting performance increase. In any case, moving that code to C++ should be enough.

prdn avatar Dec 01 '14 08:12 prdn

Thanks for the ongoing work. I'm really looking forward to the results :)

ronkorving avatar Dec 01 '14 09:12 ronkorving

Thanks all for working on this issue. What is the current state of a solution for this issue?

sudhanshudahiya avatar Nov 18 '15 21:11 sudhanshudahiya

+1 for the update on this issue

chiihuang avatar Dec 04 '15 15:12 chiihuang

To be honest, no work has been done on moving _flush to native land. However, I still strongly believe it's the right thing to do.

ronkorving avatar Dec 04 '15 16:12 ronkorving

I wonder if it's worth rethinking flush a bit and just keeping the recv and send completely separate.

I feel like the complexity got multiplied by coupling these operations through flush.

reqshark avatar Dec 04 '15 16:12 reqshark

oh wait @prdn said that like a year ago

> I think also that we should split _flush in two methods: _send and _recv. The send method should call only send.

reqshark avatar Dec 04 '15 18:12 reqshark

@reqshark @ronkorving please give a try to this PR https://github.com/JustinTulloss/zeromq.node/pull/492

prdn avatar Feb 05 '16 11:02 prdn