jeromq
Assertion error / possible race condition in Pub/Sub
Hi,
I am using jeromq version 0.3.5 and running my tests with Java 8 (1.8.0_65) on Windows 7.
When stress testing my app, I noticed that I get an assertion error on a regular basis.
java.lang.AssertionError
at zmq.Signaler.recv(Signaler.java:173)
at zmq.Mailbox.recv(Mailbox.java:101)
at zmq.SocketBase.processCommands(SocketBase.java:864)
at zmq.SocketBase.send(SocketBase.java:627)
at org.zeromq.ZMQ$Socket.send(ZMQ.java:1302)
at org.zeromq.ZMQ$Socket.send(ZMQ.java:1291)
I tried to narrow it down and managed to write code that reproduces it (see below).
ZMQ.Context context = ZMQ.context(1);
String address = "tcp://localhost:30000";
byte[] msg = "abc".getBytes();

// run publisher
Thread pubThread = new Thread(() -> {
    ZMQ.Socket publisher = context.socket(ZMQ.PUB);
    publisher.bind(address);
    while (!Thread.currentThread().isInterrupted()) {
        publisher.send(msg);
    }
});
pubThread.setUncaughtExceptionHandler((t, e) -> e.printStackTrace());
pubThread.start();

// run subscriber
Thread subThread = new Thread(() -> {
    ZMQ.Socket subscriber = context.socket(ZMQ.SUB);
    subscriber.connect(address);
    subscriber.subscribe("".getBytes());
    while (!Thread.currentThread().isInterrupted()) {
        subscriber.recv();
    }
});
subThread.setUncaughtExceptionHandler((t, e) -> e.printStackTrace());
subThread.start();

// let it run for a while
Thread.sleep(5000);
- I create a PUB socket which spins on send().
- I create a SUB socket and let it spin on recv().
In this scenario it fails straight away. When I introduce a small delay on either send or receive (by adding Thread.sleep(1)), it works fine.
EDIT: code formatting - @daveyarwood
Still happening on jeromq 0.3.6
Confirm same problem.
A commit fixed this particular issue on Windows; it should be available from 0.4.0.
@elgrocho @ylexus @W1zzard Are you still experiencing this issue on the latest version of JeroMQ?
The latest versions, 0.4.4 and 0.4.3, still have this issue on Windows 10. As a workaround, I need to add a 1-second sleep after binding the PUB side.
Action needed: translate the repro code from the original post into a test case, determine the cause and fix it.
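As a starting point for that test case, here is a minimal sketch of a harness that runs spinning loops on their own threads and collects whatever escapes them, instead of only printing the stack trace as the original repro does. StressHarness and its run method are hypothetical names, not part of JeroMQ, and the JeroMQ calls are deliberately left out so the sketch depends only on the JDK.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical test helper (not part of JeroMQ). */
final class StressHarness {
    private StressHarness() {}

    /**
     * Runs each task on its own daemon thread, lets the tasks spin for
     * runMillis, interrupts them, and returns every Throwable that
     * escaped a task (for example an AssertionError).
     */
    static List<Throwable> run(long runMillis, Runnable... tasks) {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[tasks.length];
        for (int i = 0; i < tasks.length; i++) {
            threads[i] = new Thread(tasks[i], "stress-" + i);
            // Daemon threads cannot keep the JVM alive if a task ignores the interrupt.
            threads[i].setDaemon(true);
            // Record the failure instead of printing it, so a test can assert on it.
            threads[i].setUncaughtExceptionHandler((t, e) -> failures.add(e));
            threads[i].start();
        }
        try {
            Thread.sleep(runMillis); // let the tasks run for a while
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread t : threads) {
            t.interrupt();
        }
        for (Thread t : threads) {
            try {
                t.join(1000); // bounded wait: a task blocked in a call may not stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return failures;
    }
}
```

In an actual test, the two tasks would be the PUB send loop and the SUB recv loop from the original post, and the test would assert that the returned list is empty. If the AssertionError comes from a Java assert statement, as the message-less stack trace suggests, the test JVM would presumably also need assertions enabled (-ea) for the failure to show up.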