Assertion error/ possibly race condition in Pub/Sub

Open elgrocho opened this issue 9 years ago • 6 comments

Hi,

I am using jeromq version 0.3.5 and run my tests with Java 8 (1.8.0_65) on Windows 7.

When stress testing my app, I noticed that I get an assertion error on a regular basis.

java.lang.AssertionError
    at zmq.Signaler.recv(Signaler.java:173)
    at zmq.Mailbox.recv(Mailbox.java:101)
    at zmq.SocketBase.processCommands(SocketBase.java:864)
    at zmq.SocketBase.send(SocketBase.java:627)
    at org.zeromq.ZMQ$Socket.send(ZMQ.java:1302)
    at org.zeromq.ZMQ$Socket.send(ZMQ.java:1291)

I tried to narrow it down and managed to write code that reproduces it (see below).

    ZMQ.Context context = ZMQ.context(1);
    String address = "tcp://localhost:30000";
    byte[] msg = "abc".getBytes();

    //run publisher
    Thread pubThread = new Thread(() -> {
        ZMQ.Socket publisher = context.socket(ZMQ.PUB);
        publisher.bind(address);
        while(!Thread.currentThread().isInterrupted()) {
            publisher.send(msg);
        }
    });
    pubThread.setUncaughtExceptionHandler((t, e) -> e.printStackTrace());
    pubThread.start();

    //run subscriber
    Thread subThread = new Thread(() -> {
        ZMQ.Socket subscriber = context.socket(ZMQ.SUB);
        subscriber.connect(address);
        subscriber.subscribe("".getBytes());
        while(!Thread.currentThread().isInterrupted()) {
            subscriber.recv();
        }
    });
    subThread.setUncaughtExceptionHandler((t, e) -> e.printStackTrace());
    subThread.start();

    //let it run for a while
    Thread.sleep(5000);

  1. I create a PUB socket which spins on send().
  2. I create a SUB socket and let it spin on recv().

In this scenario it fails straight away. When I introduce a small delay on either the send or the receive side (by adding Thread.sleep(1)), it works fine.

EDIT: code formatting - @daveyarwood

elgrocho Feb 08 '16 11:02

Still happening on jeromq 0.3.6

ylexus Jun 02 '17 10:06

I can confirm the same problem.

W1zzard Sep 18 '17 10:09

A commit fixed this particular issue on Windows; the fix should be available from 0.4.0.

fredoboulo Oct 02 '17 07:10

@elgrocho @ylexus @W1zzard Are you still experiencing this issue on the latest version of JeroMQ?

daveyarwood Oct 05 '17 03:10

The latest versions, 0.4.3 and 0.4.4, still have this issue on Windows 10. As a workaround, I need to add a 1-second sleep after binding the PUB side.

JingchengLao Feb 27 '18 03:02

Action needed: translate the repro code from the original post into a test case, determine the cause and fix it.

daveyarwood Oct 28 '18 01:10
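
As a starting point for turning the repro into a test case, here is a minimal sketch of a harness that runs the publisher and subscriber bodies on separate threads and collects anything they throw, including AssertionError, instead of only printing it. The class name StressHarness and its run method are hypothetical (they are not part of JeroMQ), and the sketch uses only the Java standard library; the socket code from the original post would be passed in as the Runnable bodies.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper, not part of JeroMQ: runs each body on its own thread
// and records whatever escapes from it, so a test can assert on the result.
final class StressHarness {

    private StressHarness() {
    }

    /**
     * Starts one thread per body, lets them run for durationMillis,
     * interrupts them, and returns every uncaught Throwable.
     */
    static List<Throwable> run(long durationMillis, Runnable... bodies)
            throws InterruptedException {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        Thread[] threads = new Thread[bodies.length];

        for (int i = 0; i < bodies.length; i++) {
            threads[i] = new Thread(bodies[i], "stress-" + i);
            // Same idea as the original post, but collect instead of print.
            threads[i].setUncaughtExceptionHandler((t, e) -> failures.add(e));
            // Daemon threads so a body stuck in a blocking call cannot hang the JVM.
            threads[i].setDaemon(true);
            threads[i].start();
        }

        // Let the bodies spin for a while, as in the original repro.
        Thread.sleep(durationMillis);

        for (Thread t : threads) {
            t.interrupt();
        }
        for (Thread t : threads) {
            // Bounded wait: a body may not react to interruption promptly.
            t.join(1000);
        }
        return failures;
    }
}
```

A test could call `StressHarness.run(5000, publisherBody, subscriberBody)`, where the two bodies are the lambdas from the original post, and fail if the returned list is non-empty. Note that interrupting a thread blocked in a socket call may itself surface as an exception, so a real test would probably need to filter for AssertionError specifically.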