jeromq
NullPointerException: Cannot invoke "zmq.IMailbox.send(zmq.Command)" because "this.slots[tid]" is null
Using version 0.5.4, after having interrupted a thread in which an open subscription was running, followed by calls to ZMonitor.close() and ZContext.close(), I got the following NPE (the first exception is included to help understand the context):
Exception in thread "Thread-244" org.zeromq.ZMQException: Errno 4 : Interrupted function
at org.zeromq.ZMQ$Socket.mayRaise(ZMQ.java:3732)
at org.zeromq.ZMQ$Socket.recv(ZMQ.java:3530)
at org.zeromq.ZMQ$Socket.recv(ZMQ.java:3502)
...
at java.base/java.lang.Thread.run(Thread.java:840)
...
java.lang.NullPointerException: Cannot invoke "zmq.IMailbox.send(zmq.Command)" because "this.slots[tid]" is null
at zmq.Ctx.sendCommand(Ctx.java:615)
at zmq.ZObject.sendCommand(ZObject.java:410)
at zmq.ZObject.sendPipeTermAck(ZObject.java:260)
at zmq.pipe.Pipe.processPipeTermAck(Pipe.java:421)
at zmq.ZObject.processCommand(ZObject.java:91)
at zmq.Command.process(Command.java:79)
at zmq.SocketBase.processCommands(SocketBase.java:1198)
at zmq.SocketBase.inEvent(SocketBase.java:1365)
at zmq.poll.Poller.run(Poller.java:276)
at java.base/java.lang.Thread.run(Thread.java:840)
It is worth noting that this exception is a rare occurrence, having shown up only after many similar executions of the same code.
Did you try with release 0.6.0?
I confirm this exception can occur in 0.6.0 (this happens sometimes in a scenario like the one described in https://github.com/zeromq/jeromq/issues/984; both issues may be due to the same underlying problem):
Exception in thread "ZMonitor-Sub[56]" java.lang.NullPointerException: Cannot invoke "zmq.IMailbox.send(zmq.Command)" because "this.slots[tid]" is null
at zmq.Ctx.sendCommand(Ctx.java:662)
at zmq.ZObject.sendCommand(ZObject.java:410)
at zmq.ZObject.sendReapAck(ZObject.java:290)
at zmq.SocketBase.processCommands(SocketBase.java:1183)
at zmq.SocketBase.send(SocketBase.java:854)
at zmq.SocketBase.send(SocketBase.java:792)
at org.zeromq.ZMQ$Socket.send(ZMQ.java:3445)
at org.zeromq.ZMQ$Socket.send(ZMQ.java:3359)
at org.zeromq.ZStar$Plateau.run(ZStar.java:503)
at org.zeromq.ZThread$ShimThread.run(ZThread.java:57)
I had an NPE with version 0.5.4 at the same line as in the OP, but with a different stack trace. It turned out that due to a race condition I was calling CancellationToken::cancel after the socket-owning thread had closed the socket, so the fault was in my code after all (OTOH, should cancel() on a closed socket really throw an NPE?).
That said, I'm not sure if the cancel() code really is correct. The real issue here is that access to slots[tid] is not synchronized properly AFAICS. I guess that's the main reason why the documentation clearly says that a socket should only ever be used by the thread that created it, but the cancellation token deliberately breaks that thread boundary and therefore requires proper synchronization.
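To make the synchronization concern concrete, here is a minimal, library-agnostic sketch of a slot table that reads each slot once into a local variable and reports a released slot instead of dereferencing null. This is not JeroMQ's actual Ctx code: the class GuardedSlots, its methods, and the use of Consumer<String> as a stand-in for a mailbox are all hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

// Hypothetical stand-in for a per-thread mailbox table; NOT JeroMQ's Ctx implementation.
final class GuardedSlots {
    // AtomicReferenceArray gives each element volatile read/write semantics,
    // so a slot cleared by one thread is visible to the others.
    private final AtomicReferenceArray<Consumer<String>> slots;

    GuardedSlots(int size) {
        slots = new AtomicReferenceArray<>(size);
    }

    // Installs the mailbox for the given slot id.
    void register(int tid, Consumer<String> mailbox) {
        slots.set(tid, mailbox);
    }

    // Releases the slot, e.g. when its owner has terminated.
    void clear(int tid) {
        slots.set(tid, null);
    }

    // Reads the slot exactly once, so a concurrent clear() cannot slip in
    // between a null check and the dereference.
    boolean trySend(int tid, String command) {
        Consumer<String> mailbox = slots.get(tid);
        if (mailbox == null) {
            return false; // slot already released: report it instead of throwing an NPE
        }
        mailbox.accept(command);
        return true;
    }
}
```

Note that this only removes the NullPointerException. A command can still be handed to a mailbox whose owner is about to shut down, so a real fix would also need to define what happens to commands that arrive during termination.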
If that's the case, then cancel() usage should be discouraged and deprecated for the reasons you described. It would be better to use a pattern that's officially supported, e.g. send a shutdown command to the socket/thread via the same socket or a different command channel.
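A minimal sketch of the shutdown-command idea, written against plain java.util.concurrent rather than the JeroMQ API so it runs without the library. Everything here is an assumption made for illustration: the class CommandDrivenWorker, the SHUTDOWN sentinel, and the BlockingQueue that stands in for the command channel. In JeroMQ the channel would presumably be a socket owned by the worker thread and polled alongside the subscription.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical worker: it is stopped by a command on its own channel,
// not by Thread.interrupt() or by another thread touching its resources.
final class CommandDrivenWorker implements Runnable {
    static final String SHUTDOWN = "$SHUTDOWN"; // hypothetical sentinel command

    // Stand-in for the command channel (a socket in a real ZeroMQ setup).
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final List<String> handled = new ArrayList<>();
    private volatile boolean resourceClosed = false;

    // Safe to call from any thread: the queue is the only shared entry point.
    void post(String message) {
        inbox.add(message);
    }

    // Read this only after the worker thread has been joined.
    List<String> handled() {
        return handled;
    }

    boolean isResourceClosed() {
        return resourceClosed;
    }

    @Override
    public void run() {
        try {
            while (true) {
                String message = inbox.take();
                if (SHUTDOWN.equals(message)) {
                    break; // orderly exit requested through the channel
                }
                handled.add(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            // The owning thread releases its own resource (a socket, in the real case),
            // so no other thread ever closes or cancels it.
            resourceClosed = true;
        }
    }
}
```

Usage would look like: start the worker on a thread, post messages, then post CommandDrivenWorker.SHUTDOWN and join the thread before closing any shared context. The point of the design is that teardown happens on the thread that owns the resource.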