jeromq icon indicating copy to clipboard operation
jeromq copied to clipboard

java.lang.NullPointerException at zmq.pipe.YQueue.pop(YQueue.java:147)

Open chinababu opened this issue 7 years ago • 11 comments

Why do we get this error? No clue of any application thread. This stack trace is produced continuously. I am finding it difficult to trace back to application code from where this is thrown.

java.lang.NullPointerException at zmq.pipe.YQueue.pop(YQueue.java:147) at zmq.pipe.YPipe.read(YPipe.java:144) at zmq.pipe.Pipe.read(Pipe.java:214) at zmq.io.SessionBase.pullMsg(SessionBase.java:126) at zmq.io.StreamEngine$PullAndEncode.nextMsg(StreamEngine.java:117) at zmq.io.StreamEngine.outEvent(StreamEngine.java:555) at zmq.io.StreamEngine.restartOutput(StreamEngine.java:623) at zmq.io.SessionBase.readActivated(SessionBase.java:256) at zmq.pipe.Pipe.processActivateRead(Pipe.java:312) at zmq.ZObject.processCommand(ZObject.java:51) at zmq.Command.process(Command.java:75) at zmq.io.IOThread.inEvent(IOThread.java:80) at zmq.poll.Poller.run(Poller.java:268) at java.lang.Thread.run(Thread.java:745) java.lang.NullPointerException at zmq.pipe.YQueue.frontPos(YQueue.java:59) at zmq.pipe.YPipe.checkRead(YPipe.java:102) at zmq.pipe.YPipe.read(YPipe.java:137) at zmq.pipe.Pipe.read(Pipe.java:214) at zmq.io.SessionBase.pullMsg(SessionBase.java:126) at zmq.io.StreamEngine$PullAndEncode.nextMsg(StreamEngine.java:117) at zmq.io.StreamEngine.outEvent(StreamEngine.java:555) at zmq.io.IOObject.outEvent(IOObject.java:92) at zmq.poll.Poller.run(Poller.java:265) at java.lang.Thread.run(Thread.java:745)

Thanks, Chinna

chinababu avatar Feb 02 '18 18:02 chinababu

at zmq.poll.Poller.run(Poller.java:265)

I'm guessing this is the thing your application is calling that results in the exception.

daveyarwood avatar Feb 02 '18 18:02 daveyarwood

The stack traces indicate it is happening in I/O thread.

Would you provide more context about when it is happening (what are the sockets used by your app, the frequency of occurrence of the error, the OS, ...)? Even possibly a test case showing the problem?

fredoboulo avatar Feb 03 '18 20:02 fredoboulo

@fredoboulo Sorry for delay. This issue is not occurring consistently, it hit me back again now. We have 4 sockets, 2 of type DEALER and 2 of type PUB. and 3 subscribers for one of the publisher. We close and recreate all of them in certain situation (some times in few seconds, some times in a minute or above). Since the error is from a thread internal to JeroMQ and there is no link from application thread, I am not sure what kind of details need to be provided. Please let me know any other specific details are needed. Any help is appreciated. Second NPE from above exceptions is printed infinitely.

OS: 3.10.0-514.36.5.el7.x86_64 GNU/Linux JeroMQ 0.4.2 openjdk version "1.8.0_102" OpenJDK Runtime Environment (build 1.8.0_102-b14) OpenJDK 64-Bit Server VM (build 25.102-b14, mixed mode) Occurences: once in 3 consecutive executions Sockets: 2 DEALER and 2 PUB with 3 subscribers for one of them.

thanks, Chinna

chinababu avatar Mar 12 '18 14:03 chinababu

Thread Dump is here. All the lines that have application classes referred are removed. Though I do know much details, is this related JDK epoll bug?

2018-03-12 14:17:30 Full thread dump OpenJDK 64-Bit Server VM (25.102-b14 mixed mode):

"Thread-15" #30 daemon prio=5 os_prio=0 tid=0x00007fcdd82b9000 nid=0x58 in Object.wait() [0x00007fce336f5000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000005cbc0a728> (a java.lang.Object) ///////// truncated - lines from application code

"Thread-13" #26 daemon prio=5 os_prio=0 tid=0x00007fcdd82b7000 nid=0x57 runnable [0x00007fce337f6000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbc10208> (a sun.nio.ch.Util$3) - locked <0x00000005cbc10218> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbc101c0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZPoller.poll(ZPoller.java:501) at org.zeromq.ZPoller.poll(ZPoller.java:481) at org.zeromq.ZPoller.poll(ZPoller.java:461) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"Thread-14" #29 daemon prio=5 os_prio=0 tid=0x00007fcdd8283800 nid=0x56 runnable [0x00007fce338f7000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbc0e898> (a sun.nio.ch.Util$3) - locked <0x00000005cbc0e8a8> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbc0e7b0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZMQ$Poller.poll(ZMQ.java:2939) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"iothread-2" #28 daemon prio=5 os_prio=0 tid=0x00007fcdd824f800 nid=0x55 runnable [0x00007fce339f8000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbc0a988> (a sun.nio.ch.Util$3) - locked <0x00000005cbc0a998> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbc0a940> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.poll.Poller.run(Poller.java:229) at java.lang.Thread.run(Thread.java:745)

"reaper-1" #27 daemon prio=5 os_prio=0 tid=0x00007fcdd821c000 nid=0x54 runnable [0x00007fce33af9000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbc104f8> (a sun.nio.ch.Util$3) - locked <0x00000005cbc10508> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbc104b0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.poll.Poller.run(Poller.java:229) at java.lang.Thread.run(Thread.java:745)

"Thread-11" #22 daemon prio=5 os_prio=0 tid=0x00007fcdd81b6000 nid=0x53 runnable [0x00007fce33bfa000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbc11390> (a sun.nio.ch.Util$3) - locked <0x00000005cbc113a0> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbc11348> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZPoller.poll(ZPoller.java:501) at org.zeromq.ZPoller.poll(ZPoller.java:481) at org.zeromq.ZPoller.poll(ZPoller.java:461) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"Thread-12" #25 daemon prio=5 os_prio=0 tid=0x00007fcdd80d0800 nid=0x52 runnable [0x00007fce33cfb000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbd03668> (a sun.nio.ch.Util$3) - locked <0x00000005cbd03678> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbd03620> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZMQ$Poller.poll(ZMQ.java:2939) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"iothread-2" #24 daemon prio=5 os_prio=0 tid=0x00007fcdd814f800 nid=0x51 runnable [0x00007fce388da000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbd05f98> (a sun.nio.ch.Util$3) - locked <0x00000005cbd05fa8> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbd05f50> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.poll.Poller.run(Poller.java:229) at java.lang.Thread.run(Thread.java:745)

"reaper-1" #23 daemon prio=5 os_prio=0 tid=0x00007fcdd80c8800 nid=0x50 runnable [0x00007fce387d9000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbd8c388> (a sun.nio.ch.Util$3) - locked <0x00000005cbd8c398> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbd8c340> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.poll.Poller.run(Poller.java:229) at java.lang.Thread.run(Thread.java:745)

"Thread-10" #21 daemon prio=5 os_prio=0 tid=0x00007fcdd80ca800 nid=0x4f waiting on condition [0x00007fce33dfc000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"applnclass@6991761aQoS" #20 daemon prio=5 os_prio=0 tid=0x00007fcdd80c7000 nid=0x4e in Object.wait() [0x00007fce33efd000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"Thread-8" #19 daemon prio=5 os_prio=0 tid=0x00007fcdd809b800 nid=0x4d runnable [0x00007fce33ffe000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbda8500> (a sun.nio.ch.Util$3) - locked <0x00000005cbda84f0> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbda82f8> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZPoller.poll(ZPoller.java:501) at org.zeromq.ZPoller.poll(ZPoller.java:481) at org.zeromq.ZPoller.poll(ZPoller.java:461) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"Thread-7" #18 daemon prio=5 os_prio=0 tid=0x00007fcdd8068000 nid=0x4c runnable [0x00007fce381d5000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbdd0178> (a sun.nio.ch.Util$3) - locked <0x00000005cbdd0188> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbe0a298> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZPoller.poll(ZPoller.java:501) at org.zeromq.ZPoller.poll(ZPoller.java:481) at org.zeromq.ZPoller.poll(ZPoller.java:461) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"Thread-6" #17 daemon prio=5 os_prio=0 tid=0x00007fcdd8034000 nid=0x4b runnable [0x00007fce382d6000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbe0d9e0> (a sun.nio.ch.Util$3) - locked <0x00000005cbe0d9d0> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbe0d7d8> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZPoller.poll(ZPoller.java:501) at org.zeromq.ZPoller.poll(ZPoller.java:481) at org.zeromq.ZPoller.poll(ZPoller.java:461) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"applnclass@559b3529QoS" #16 daemon prio=5 os_prio=0 tid=0x00007fcddc033800 nid=0x1c in Object.wait() [0x00007fce383d7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"Thread-1" #10 daemon prio=5 os_prio=0 tid=0x00007fce54267000 nid=0x18 runnable [0x00007fce389db000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbe7ca38> (a sun.nio.ch.Util$3) - locked <0x00000005cbe7ca48> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbe7c9f0> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.ZMQ.poll(ZMQ.java:704) at org.zeromq.ZPoller.poll(ZPoller.java:501) at org.zeromq.ZPoller.poll(ZPoller.java:481) at org.zeromq.ZPoller.poll(ZPoller.java:461) ///////// truncated - lines from application code at java.lang.Thread.run(Thread.java:745)

"iothread-2" #12 daemon prio=5 os_prio=0 tid=0x00007fce5421a000 nid=0x17 runnable [0x00007fce38efc000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:326) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) - locked <0x00000005cbc13528> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:482) - locked <0x00000005cbc13508> (a java.io.PrintStream) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104) - locked <0x00000005cbc135f8> (a java.io.OutputStreamWriter) at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185) at java.io.PrintStream.write(PrintStream.java:527) - eliminated <0x00000005cbc13508> (a java.io.PrintStream) at java.io.PrintStream.print(PrintStream.java:669) at java.io.PrintStream.println(PrintStream.java:823) - locked <0x00000005cbc13508> (a java.io.PrintStream) at java.lang.Throwable$WrappedPrintStream.println(Throwable.java:748) at java.lang.Throwable.printStackTrace(Throwable.java:655) - locked <0x00000005cbc13508> (a java.io.PrintStream) at java.lang.Throwable.printStackTrace(Throwable.java:643) at java.lang.Throwable.printStackTrace(Throwable.java:634) at zmq.poll.Poller.run(Poller.java:277) at java.lang.Thread.run(Thread.java:745)

"reaper-1" #11 daemon prio=5 os_prio=0 tid=0x00007fce541af000 nid=0x16 runnable [0x00007fce38ffd000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <0x00000005cbe988e0> (a sun.nio.ch.Util$3) - locked <0x00000005cbe988f0> (a java.util.Collections$UnmodifiableSet) - locked <0x00000005cbe98898> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at zmq.poll.Poller.run(Poller.java:229) at java.lang.Thread.run(Thread.java:745)

"Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007fce540d3800 nid=0x14 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007fce540ce800 nid=0x13 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fce540cc800 nid=0x12 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fce540bf000 nid=0x11 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fce540bd000 nid=0x10 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fce54093000 nid=0xf in Object.wait() [0x00007fce40beb000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000005cbc0c400> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143) - locked <0x00000005cbc0c400> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fce5408e800 nid=0xe in Object.wait() [0x00007fce40cec000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000005cbc0c3a8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference.tryHandlePending(Reference.java:191) - locked <0x00000005cbc0c3a8> (a java.lang.ref.Reference$Lock) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"main" #1 prio=5 os_prio=0 tid=0x00007fce54009000 nid=0x8 waiting on condition [0x00007fce5b98c000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) ///////// truncated - lines from application code

"VM Thread" os_prio=0 tid=0x00007fce54084800 nid=0xd runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fce5401e000 nid=0x9 runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fce5401f800 nid=0xa runnable

"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fce54021800 nid=0xb runnable

"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fce54023800 nid=0xc runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007fce540d6000 nid=0x15 waiting on condition

JNI global references: 271

Heap PSYoungGen total 115712K, used 2212K [0x0000000719400000, 0x0000000721080000, 0x00000007c0000000) eden space 108032K, 1% used [0x0000000719400000,0x00000007196190c8,0x000000071fd80000) from space 7680K, 0% used [0x0000000720900000,0x0000000720910000,0x0000000721080000) to space 9728K, 0% used [0x000000071fd80000,0x000000071fd80000,0x0000000720700000) ParOldGen total 343040K, used 6093K [0x00000005cbc00000, 0x00000005e0b00000, 0x0000000719400000) object space 343040K, 1% used [0x00000005cbc00000,0x00000005cc1f34e8,0x00000005e0b00000) Metaspace used 9641K, capacity 9812K, committed 9984K, reserved 1058816K class space used 932K, capacity 998K, committed 1024K, reserved 1048576K

chinababu avatar Mar 12 '18 22:03 chinababu

Out of curiosity, what is the pattern of your closing/recreate operations?

YQueue is mainly untouched since... very long time (present in 3.3 at least, minus formatting), I expect something like that exception would have come long before. So I would rather investigate around first: do you perform any operation after closing a socket, do you share sockets between threads, as a starter.

From the thread dump I can just read 3 contexts used in your app, with several ZPoller used as well, this does not give enough info regarding your usage of the library.

fredoboulo avatar Mar 20 '18 23:03 fredoboulo

we have "zctx1" of type ZContext shared for entire application.

Creating: ZMQ.Socket zsocket = zctx1.createSocket(DEALER|PUB|SUB) - shared by couple threads. iothread created by this call with SUB type is throwing error. zsocket.connect(url) - tcp selector = zctx1.createSelector() poller = new ZPoller(selector) poller.register(zsocket)

Polling: poller.poll(timeout) poller.isReadable(zsocket)

Closing: close threads that share this socket. zctx1.destroySocket(zsocket) close selector close poller

Our application code does not make a call to zmq.poll.Poller.run(Poller.java:265), this call happened by iothread. We have two of these identical processes with a haproxy.

chinababu avatar Mar 23 '18 20:03 chinababu

If you are sharing zsocket with a couple of user-threads, you should change your code, ZMQ sockets are not thread-safe.

fredoboulo avatar Mar 24 '18 10:03 fredoboulo

Though I dont have much understanding of internals, I think I have made some progress on this and proved that the issue was with haproxy. As we are using docker containers for all processes, haproxy is able to establish a tcp connection though the internal ports of container are NOT bound for load balancing. So, JeroMQ has some internal issues with establishing connection rightly. After that temporary fix (fixed by using static ip of containers and ports in haproxy), I am having different following error: java.lang.NullPointerException at zmq.pipe.Pipe.isDelimiter(Pipe.java:504) at zmq.pipe.Pipe.checkRead(Pipe.java:193) at zmq.io.SessionBase.readActivated(SessionBase.java:252) at zmq.pipe.Pipe.processActivateRead(Pipe.java:312) at zmq.ZObject.processCommand(ZObject.java:51) at zmq.Command.process(Command.java:75) at zmq.io.IOThread.inEvent(IOThread.java:80) at zmq.poll.Poller.run(Poller.java:268) at java.lang.Thread.run(Thread.java:745)

I made change to Pipe.isDelimiter() to return return (msg == null ? false : msg.isDelimiter()); and everything started working fine. Do you think it's a bug or I have a situation that I am not supposed have?

Thanks, Chinna

chinababu avatar Mar 28 '18 20:03 chinababu

I fail to follow the causal relationship, but feel free to propose a PR for your fix, it may be useful for others.

fredoboulo avatar Apr 06 '18 21:04 fredoboulo

@chinababu There could easily be bugs with JeroMQ. A PR with your changes would be greatly appreciated.

daveyarwood avatar Oct 28 '18 02:10 daveyarwood

I think I fixed that issue with the following change:

https://github.com/zeromq/jeromq/commit/07dcebd13c3d3306ae3326f977e4da11359076d9#diff-da2d6c91e26a787671da920e8bf2d452R103

It was actually happening on NetMQ (which was a port of jeromq) and we fixed it a few years back.

somdoron avatar May 09 '20 06:05 somdoron