jeromq Errors when rapidly opening/closing sockets and contexts

I'm attempting to create an instance of my ZMQ stack for each unit test in my test suite. But I get problems when I rapidly open/close sockets and contexts.

Sometimes I get this:

Exception in thread "reaper-1" java.lang.AssertionError
    at zmq.Mailbox.recv(Mailbox.java:114)
    at zmq.SocketBase.process_commands(SocketBase.java:830)
    at zmq.SocketBase.in_event(SocketBase.java:927)
    at zmq.Poller.run(Poller.java:237)
    at java.lang.Thread.run(Thread.java:745)

Other times it hangs silently at ctx.term();

I've created a Maven project with a single JUnit test that demonstrates the issue: https://github.com/augustl/jeromq-issue

For the record, the test case is:

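import java.util.ArrayList;
import java.util.List;

import org.junit.Test;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Context;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMQQueue;

public class ExampleText {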
    @Test
    public void demonstrateIssue() {
        for (int i = 0; i < 50; i++) {
            performTest();
        }
    }

    private void performTest() {
        Context ctx = ZMQ.context(1);
        Socket recvMsgSock = ctx.socket(ZMQ.PULL);
        recvMsgSock.bind("tcp://*:5115");
        Socket processMsgSock = ctx.socket(ZMQ.PUSH);
        processMsgSock.bind("inproc://process-msg");

        List<Socket> workerSocks = new ArrayList<Socket>();
        for (int i = 0; i < 5; i++) {
            Socket workerSock = ctx.socket(ZMQ.PULL);
            workerSock.connect("inproc://process-msg");
            workerSocks.add(workerSock);
        }

        Thread proxyThr = new Thread(new ZMQQueue(ctx, recvMsgSock, processMsgSock));
        proxyThr.setName("Proxy thr");
        proxyThr.start();

        for (final Socket workerSock : workerSocks) {
            Thread workerThr = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        while (true) {
                            byte[] msg = workerSock.recv();
                            // Process the msg!
                        }
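                    } catch (Exception e) {
                        // Ignored so the worker thread exits once recv() fails
                        // after the socket or context has been closed.
                    }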
                }
            });
            workerThr.setName("A worker thread");
            workerThr.start();
        }

        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        System.out.println("Closing now");


        recvMsgSock.close();
        processMsgSock.close();

        for (Socket workerSock : workerSocks) {
            workerSock.close();
        }

        ctx.term();
        System.out.println("Successfully closed");
    }
}

Oct 19 '14 19:10 augustl

@augustl Thanks for the test case, I was able to recreate the issue. Looking into it now.

Oct 19 '14 20:10 trevorbernard

I've experienced this problem as well.

Mar 03 '15 19:03 courtarro

@trevorbernard any update on this issue?

Jul 02 '15 18:07 kevinconaway

Any updates on this issue? I see it quite frequently on blocking recv() calls on PULL or DEALER sockets. When the ROUTER or PUSH socket (server) is closed abruptly, the listener (client) JeroMQ thread dies with the assertion error.

Nov 09 '15 17:11 kdmarathe

I'm running into this (or at least a similar) issue. My use case is similar to some extent: closing and recreating sockets at high frequency as part of integration testing.

Looking at the provided reproduction, I noticed that the test case creates, connects and closes the 'workerSocks' in the main thread, while the read operation happens in another thread. Might this explain the failure?

In contrast to this test case, my implementation does not share sockets between threads, but I still run into 'Exception in thread "reaper-1" java.lang.AssertionError'. Although I do not share sockets between threads, I do share the 0mq context (multiple threads create sockets from the same context instance). So I was wondering whether sharing the context might cause this behavior. Is the ZContext meant to be shareable/threadsafe?

Mar 23 '17 08:03 mariusspan

Action needed: Add a test case based on the original post in this issue. I took a quick look in our tests and I didn't see anything similar, with the possible exception of TooManyOpenFilesTester.

It's been a couple years since this issue was created; it would be interesting to see if the test case still fails.

Apr 27 '17 10:04 daveyarwood

> Although I do not share sockets between threads, I do share the 0mq context (multiple threads create sockets from the same context instance). So I was wondering whether sharing the context might cause this behavior. Is the ZContext meant to be shareable/threadsafe?

According to the ZGuide, contexts are threadsafe, but sockets are not threadsafe.
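One way to check whether a code base actually follows that rule is an ownership guard that fails loudly when a socket is touched from a thread other than the one that created it. The sketch below uses only the Java standard library. `OwnedSocket` is a hypothetical stand-in, not a JeroMQ class, and its `recv()` simply returns a fixed string. It illustrates the confinement check only and is not a fix for the assertion error reported here.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadConfinementSketch {

    /** Hypothetical stand-in for a non-thread-safe socket; NOT part of JeroMQ. */
    static final class OwnedSocket implements AutoCloseable {
        // The thread that constructed the socket is treated as its only legal user.
        private final Thread owner = Thread.currentThread();
        private boolean closed;

        private void checkOwner(String op) {
            Thread current = Thread.currentThread();
            if (current != owner) {
                throw new IllegalStateException(op + " called from " + current.getName()
                        + ", but the socket is owned by " + owner.getName());
            }
        }

        String recv() {
            checkOwner("recv");
            if (closed) {
                throw new IllegalStateException("socket is closed");
            }
            return "msg"; // placeholder payload for the sketch
        }

        @Override
        public void close() {
            checkOwner("close");
            closed = true;
        }
    }

    /** Creates, uses and closes the socket on one thread: the guard stays silent. */
    public static String useOnOwningThread() {
        try (OwnedSocket sock = new OwnedSocket()) {
            return sock.recv();
        }
    }

    /**
     * Creates the socket on the calling thread and closes it from a second
     * thread. Returns the guard's error message, or null if no violation
     * was detected.
     */
    public static String closeFromOtherThread() {
        OwnedSocket sock = new OwnedSocket();
        AtomicReference<String> failure = new AtomicReference<>();
        Thread other = new Thread(() -> {
            try {
                sock.close();
            } catch (IllegalStateException e) {
                failure.set(e.getMessage());
            }
        }, "other-thread");
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        System.out.println(useOnOwningThread());
        System.out.println(closeFromOtherThread());
    }
}
```

Wrapping real sockets in a guard like this during tests would show whether any socket crosses a thread boundary, which is the pattern the original test case has for its worker sockets (created and closed in the main thread, read in a worker thread).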

Apr 27 '17 10:04 daveyarwood

We can have a look (and a test), I'm curious about it (but not very optimistic).

TooManyOpenFilesTester was written a few years ago to pinpoint a selector leak, so it is not suited to this purpose. I can propose some tests of my own that I wrote a long time ago, but if they fail, should we block the release?

Apr 28 '17 00:04 fredoboulo

I don't think we should necessarily block release -- we could always address the failing tests in the future.

Apr 28 '17 00:04 daveyarwood

Update: the test is still failing (if you remove the @Ignore annotation), with what's currently on master.

Oct 28 '18 00:10 daveyarwood

Any update on this? It is really a problem for CI.

Feb 18 '19 19:02 JesusTheHun

I just tried un-ignoring the test case again and it's still hanging, so this is still an issue.

It would probably be worth examining what, exactly, we're doing in our test suite to get it to pass consistently! (I'm not trying to be snarky, just a thought that just occurred to me.)

Feb 01 '20 02:02 daveyarwood
