zeromq.node
Only receiving first 1000 messages on "sub" socket when using zeromq 3.2.X
I have a project that includes a stress test. The test tries to send 2500 messages through the system and times how long it takes.
The test completes with Node v0.8/v0.10, zmq module 2.3.0, and zeromq 2.2.x. With Node v0.8/v0.10, zmq module 2.3.0, and zeromq 3.2.x the test never completes.
I have debugged the issue down to the last step, which is a "sub" socket with 2500 subscription filters. Only the first 1000 messages (exactly 1000, every time) are received and the rest are dropped.
I have tuned my system (Ubuntu 12.04) to allow 40,000 files per process, as per the tuning guide: http://www.zeromq.org/docs:tuning-zeromq
Any ideas?
Hello. Could you share the code you're benchmarking with?
The code I am benchmarking with is my entire application. It can be found here: https://github.com/rehanift/engine.js
There is a description of how to run the tests at the bottom of the README.
I can share my debugging steps if you like, but it involves manually tweaking the build to add some console.log statements to gather all the information. If you are interested I can write up the steps tonight.
Ok, I see. I've installed the package and run the tests. I added a console.log inside the waitsFor callback:

```js
waitsFor(function(){
  console.log(done);
  return done == tasks_per_client * num_clients;
}, 100000);
```

and I also see that it stops processing after 1000 tasks. It would be great if you could cut this down into something simpler, like I did in #195, because I completely don't understand what's going on in engine.js :)
My first debugging experiment was to create a "sub" socket with 2500 subscription filters and then publish 2500 messages, one for each filter. That test passed, but unfortunately I threw away my code. I'll recreate it and post it here.
Are there other debugging experiments you think might be useful?
Right now we can't introspect a "sub" socket's subscribed filters. I think we need #161 to get more data.
Check your HighWaterMark. In ZMQ 3 the default value is 1000. That's why the rest is dropped. You may set it to 0 (no limits) if you like. Check it at http://api.zeromq.org/3-2:zmq-setsockopt
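For reference, a minimal sketch of what that looks like with this module, assuming the binding exports the ZMQ_SNDHWM / ZMQ_RCVHWM constants (option names may vary between module versions):

```js
var zmq = require('zmq');

var pub = zmq.socket('pub');
var sub = zmq.socket('sub');

// 0 means "no limit"; set the options before bind/connect so they
// apply to the pipes created for each connection.
pub.setsockopt(zmq.ZMQ_SNDHWM, 0);
sub.setsockopt(zmq.ZMQ_RCVHWM, 0);

pub.bindSync('tcp://127.0.0.1:5555');
sub.connect('tcp://127.0.0.1:5555');
```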
Ok, after about a year of off-and-on testing I finally narrowed down the issue :-)
It appears that the "sub" socket stops receiving messages due to the length/content of the filters that are set.
In https://gist.github.com/rehanift/edf6beff3a149ffe4e0a#file-failure-js I am creating 2500 filters using the first segment of randomly generated UUIDs. I send 2500 messages (one to each filter) and only 1000 messages are actually received (as originally reported).
In https://gist.github.com/rehanift/edf6beff3a149ffe4e0a#file-success-js I am creating 2500 filters using the string pattern "a[0-2499]". I send 2500 messages (one to each filter again) and all 2500 messages are received.
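For anyone who doesn't want to open the gists, here's a condensed sketch of the same experiment (not the exact gist code; the endpoint and timing are illustrative). Flipping USE_RANDOM_FILTERS switches between the failing case (8 random hex characters per filter, mimicking a UUID's first segment) and the passing case ("a" + index):

```js
var zmq = require('zmq');
var crypto = require('crypto');

var USE_RANDOM_FILTERS = true; // true ~ failure.js, false ~ success.js

// Build 2500 subscription filters.
var filters = [];
for (var i = 0; i < 2500; i++) {
  filters.push(USE_RANDOM_FILTERS
    ? crypto.randomBytes(4).toString('hex') // 8 random hex chars
    : 'a' + i);                             // "a0" .. "a2499"
}

var pub = zmq.socket('pub');
var sub = zmq.socket('sub');
pub.bindSync('tcp://127.0.0.1:5555');
sub.connect('tcp://127.0.0.1:5555');
filters.forEach(function (f) { sub.subscribe(f); });

// Count what actually arrives on the SUB side.
var received = 0;
sub.on('message', function () {
  received++;
  if (received % 500 === 0 || received === 2500) console.log('received', received);
});

// Give the subscriptions a moment to propagate, then publish one
// message per filter.
setTimeout(function () {
  filters.forEach(function (f) { pub.send(f + ' hello'); });
}, 1000);
```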
I reproduced this with node v0.10.28, zmq 2.7.0, and zeromq 3.2.4.
After some basic searching I haven't been able to find any information about the restrictions on length or content of zeromq subscription filters. Is it possible that this is being introduced in the zmq package?
After a little more testing it seems like the issue is with the uniqueness of the filters rather than the length of the filters.
I updated the success test case to use 2500 subscription filters that each have a minimum length of 26 characters, and it still passes. The failure test case has 2500 subscription filters that are each only 8 random characters long.
My next step is to try to reproduce this with another ZeroMQ language binding.
Hi @rehanift, one year later, but I just hit this problem yesterday using the zmq4 package for Golang. I don't really understand why, but zmq simply gets stuck when it tries to send more than 1000 messages.
As a simple solution, when I hit that threshold (numberOfSentMessages % 1000 == 0) I close, open, and connect a brand-new socket. Even though 1000 messages is waaaay more than we ever see in our production environment, we identified the problem when doing stress tests. Might as well be safe rather than sorry and let it process the whole damn thing without a limit :)
Given our environment, suddenly having too many messages might be a completely different problem from just silently not sending/breaking/getting stuck at a strange 1000-message limit.
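For anyone following along in Node, a rough sketch of that workaround translated to this module's API (the original was Go with the zmq4 package; the endpoint is a placeholder, and the publisher connects rather than binds so it can be recreated freely):

```js
var zmq = require('zmq');

var endpoint = 'tcp://127.0.0.1:5555';
var sent = 0;
var pub = zmq.socket('pub');
pub.connect(endpoint);

function publish(msg) {
  pub.send(msg);
  sent++;
  if (sent % 1000 === 0) {
    // At the default high-water mark: swap in a brand-new socket.
    pub.close();
    pub = zmq.socket('pub');
    pub.connect(endpoint);
  }
}
```

As noted later in the thread, raising ZMQ_SNDHWM / ZMQ_RCVHWM is the more direct fix; this is just the stopgap described above.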
Are you seeing this same exact issue, where you have many subscribers each with unique subscription filters? Or are you seeing a variant of it?
-Rehan
If I understood correctly, I am seeing the exact same thing. In fact, I found a better (and correct) solution that might be applicable to your case.
I had to set the ZMQ_SNDHWM and ZMQ_RCVHWM socket options, which both default to 1000! (See the details here: http://api.zeromq.org/4-0:zmq-setsockopt#toc3.)
I increased this value to 5000, but I still kept the numberOfSentMessages % 5000 == 0 reset (% 5000 this time) just to guard against edge cases, which are quite unlikely to happen here, by the way.
Let me know if setting those two options helps you as well. I've worked with queues for a few years and understand them and their use quite well, but I never had the chance to actually manage/develop them, so many things like those crazy options are new to me.
Cheers!