jeromq
Errors when rapidly opening/closing sockets and contexts
I'm attempting to create an instance of my ZMQ stack for each unit test in my test suite, but I run into problems when I rapidly open and close sockets and contexts.
Sometimes I get this:
Exception in thread "reaper-1" java.lang.AssertionError
at zmq.Mailbox.recv(Mailbox.java:114)
at zmq.SocketBase.process_commands(SocketBase.java:830)
at zmq.SocketBase.in_event(SocketBase.java:927)
at zmq.Poller.run(Poller.java:237)
at java.lang.Thread.run(Thread.java:745)
Other times it hangs silently at ctx.term().
I've created a maven project with a single junit test that demonstrates the issue: https://github.com/augustl/jeromq-issue
For the record, the test case is:
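import java.util.ArrayList;
import java.util.List;

import org.junit.Test;
import org.zeromq.ZMQ;
import org.zeromq.ZMQ.Context;
import org.zeromq.ZMQ.Socket;
import org.zeromq.ZMQQueue;

public class ExampleText {
    @Test
    public void demonstrateIssue() {
        // Build up and tear down the whole stack 50 times in a row.
        for (int i = 0; i < 50; i++) {
            performTest();
        }
    }

    private void performTest() {
        Context ctx = ZMQ.context(1);

        // Frontend: receives messages from TCP clients.
        Socket recvMsgSock = ctx.socket(ZMQ.PULL);
        recvMsgSock.bind("tcp://*:5115");

        // Backend: distributes messages to the workers over inproc.
        Socket processMsgSock = ctx.socket(ZMQ.PUSH);
        processMsgSock.bind("inproc://process-msg");

        // Worker sockets are created and connected here, in the test thread.
        List<Socket> workerSocks = new ArrayList<Socket>();
        for (int i = 0; i < 5; i++) {
            Socket workerSock = ctx.socket(ZMQ.PULL);
            workerSock.connect("inproc://process-msg");
            workerSocks.add(workerSock);
        }

        // Forward messages from the frontend to the backend.
        Thread proxyThr = new Thread(new ZMQQueue(ctx, recvMsgSock, processMsgSock));
        proxyThr.setName("Proxy thr");
        proxyThr.start();

        // Each worker socket is read from its own thread.
        for (final Socket workerSock : workerSocks) {
            Thread workerThr = new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        while (true) {
                            byte[] msg = workerSock.recv();
                            // Process the msg!
                        }
                    } catch (Exception e) {
                        // Exit the worker loop silently once recv() fails.
                    }
                }
            });
            workerThr.setName("A worker thread");
            workerThr.start();
        }

        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }

        // Close every socket from the test thread, then terminate the context.
        System.out.println("Closing now");
        recvMsgSock.close();
        processMsgSock.close();
        for (Socket workerSock : workerSocks) {
            workerSock.close();
        }
        ctx.term();
        System.out.println("Successfully closed");
    }
}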
@augustl Thanks for the test case; I was able to reproduce the issue. Looking into it now.
I've experienced this problem as well.
@trevorbernard any update on this issue?
Any updates on this issue? I see it quite frequently on blocking recv() calls on PULL or DEALER sockets: when the ROUTER or PUSH socket (server) is closed abruptly, the listener (client) jeromq thread dies with the assertion error.
I'm running into this (or at least a similar) issue. My use case is somewhat similar: closing and recreating sockets at high frequency as part of integration testing.
Looking at the provided reproduction, I noticed that the test case creates, connects, and closes the workerSocks in the main thread, while the reads happen in other threads. Might this explain the failure?
In contrast to this test case, my implementation does not share sockets between threads, yet it still runs into 'Exception in thread "reaper-1" java.lang.AssertionError'. I do, however, share the 0MQ context: multiple threads create sockets from the same context instance. Could sharing the context cause this behavior? Is ZContext meant to be shareable/thread-safe?
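If thread confinement is the culprit, a restructured worker would create, connect, read from, and close its own PULL socket inside its own thread, leaving the main thread to own only the PUSH side. This is a minimal sketch under assumptions, not a confirmed fix for this issue: the `ConfinedWorkers` class, its `run` method, and the `inproc://process-msg-sketch` endpoint are hypothetical, and it assumes a blocked `recv()` either returns `null` or throws once `ctx.term()` is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

import org.zeromq.ZMQ;
import org.zeromq.ZMQException;

public class ConfinedWorkers {
    // Hypothetical inproc endpoint used only by this sketch.
    private static final String ENDPOINT = "inproc://process-msg-sketch";

    /** Hypothetical helper: pushes `messages` messages to `workers` PULL workers; returns how many were received. */
    public static int run(int workers, int messages) throws InterruptedException {
        ZMQ.Context ctx = ZMQ.context(1);
        AtomicInteger received = new AtomicInteger();
        CountDownLatch connected = new CountDownLatch(workers);
        CountDownLatch allReceived = new CountDownLatch(messages);

        // The main thread owns only the PUSH socket and binds it before any worker connects.
        ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
        push.bind(ENDPOINT);

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread worker = new Thread(() -> {
                // Create, connect, read from, and close the socket all in this thread.
                ZMQ.Socket pull = ctx.socket(ZMQ.PULL);
                try {
                    pull.connect(ENDPOINT);
                    connected.countDown();
                    while (true) {
                        byte[] msg = pull.recv(); // blocks until a message arrives
                        if (msg == null) {
                            break; // assumed: recv() may return null once the context terminates
                        }
                        received.incrementAndGet();
                        allReceived.countDown();
                    }
                } catch (ZMQException e) {
                    // Assumed: recv() may instead throw (ETERM) once ctx.term() is called.
                } finally {
                    pull.close(); // always closed by the thread that created it
                }
            });
            worker.setName("worker-" + w);
            worker.start();
            threads.add(worker);
        }

        connected.await();
        for (int i = 0; i < messages; i++) {
            push.send("msg-" + i);
        }
        allReceived.await();

        push.close();
        ctx.term(); // wakes the blocked workers; returns once every socket is closed
        for (Thread worker : threads) {
            worker.join();
        }
        return received.get();
    }
}
```

For example, `ConfinedWorkers.run(3, 30)` should return 30 after all messages are consumed and the context has shut down. Closing each socket in a `finally` block of its owning thread matters because `ctx.term()` only returns after every socket created from that context has been closed.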
Action needed: Add a test case based on the original post in this issue. I took a quick look at our tests and didn't see anything similar, with the possible exception of TooManyOpenFilesTester.
It's been a couple years since this issue was created; it would be interesting to see if the test case still fails.
According to the ZGuide, contexts are thread-safe, but sockets are not.
We can take a look (and write a test); I'm curious about it, but not very optimistic.
TooManyOpenFilesTester was written a few years ago to pinpoint a selector leak, so it isn't suited to this purpose. I can propose some tests I wrote a long time ago, but if they fail, should we block the release?
I don't think we should necessarily block the release; we can always address the failing tests later.
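Following that rule, one context can be shared freely across threads while each socket stays confined to the thread that created it. This is a minimal sketch assuming jeromq's `ZMQ.context`/`ZMQ.Socket` API; the `SharedContextSketch` class, the `inproc://sink-sketch` endpoint, and the latch that keeps sender sockets open until the sink has drained them are illustrative choices, not something taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.zeromq.ZMQ;

public class SharedContextSketch {
    // Hypothetical inproc endpoint used only by this sketch.
    private static final String ENDPOINT = "inproc://sink-sketch";

    /** Hypothetical helper: `senders` threads each send `perSender` messages; returns how many the sink received. */
    public static int run(int senders, int perSender) throws InterruptedException {
        // A single context shared by every thread (contexts are thread-safe).
        ZMQ.Context ctx = ZMQ.context(1);
        CountDownLatch drained = new CountDownLatch(1);

        // The sink socket is created, used, and closed only by the main thread.
        ZMQ.Socket sink = ctx.socket(ZMQ.PULL);
        sink.bind(ENDPOINT);

        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < senders; s++) {
            Thread sender = new Thread(() -> {
                // Sockets are not thread-safe, so each thread creates its own from the shared context.
                ZMQ.Socket push = ctx.socket(ZMQ.PUSH);
                try {
                    push.connect(ENDPOINT);
                    for (int i = 0; i < perSender; i++) {
                        push.send("hello");
                    }
                    // Keep the socket open until the sink has read everything.
                    drained.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    push.close(); // closed by the same thread that created it
                }
            });
            sender.start();
            threads.add(sender);
        }

        int received = 0;
        for (int i = 0; i < senders * perSender; i++) {
            if (sink.recvStr() != null) {
                received++;
            }
        }
        drained.countDown();

        for (Thread sender : threads) {
            sender.join();
        }
        sink.close();
        ctx.term(); // returns once every socket created from ctx has been closed
        return received;
    }
}
```

Calling `SharedContextSketch.run(4, 25)` should return 100. The latch keeps each sender's socket open until the main thread has received everything, so the example does not depend on the sockets' linger setting.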
Update: the test is still failing (if you remove the @Ignore annotation) with what's currently on master.
Any update on this? It's a real problem for CI.
I just tried un-ignoring the test case again and it's still hanging, so this is still an issue.
It would probably be worth examining what, exactly, we're doing in our own test suite to get it to pass consistently! (I'm not trying to be snarky; it's just a thought that occurred to me.)