jeromq
isDelimiter NPE
I keep getting crashes in Pipe.java. It seems that inpipe.checkRead() returns true, but inpipe.probe() then returns null.
Fatal Exception: java.lang.NullPointerException
at zmq.Pipe.isDelimiter(Pipe.java:472)
at zmq.Pipe.checkRead(Pipe.java:183)
at zmq.SessionBase.readActivated(SessionBase.java:298)
at zmq.XSub$XSubSession.readActivated(XSub.java:24)
at zmq.Pipe.processActivateRead(Pipe.java:290)
at zmq.ZObject.processCommand(ZObject.java:57)
at zmq.IOThread.inEvent(IOThread.java:93)
at zmq.Poller.run(Poller.java:247)
at java.lang.Thread.run(Thread.java:838)
Code that triggers the crash:
https://gist.github.com/graphiclife/27c4c5fa68d8c3fe2ff7
I was not able to reproduce the NPE with your example code. Did you use the latest commit on master? Does the NPE always happen or on an irregular pattern?
I don't know much about the codebase yet, but it looks like the cause might be inpipe.probe() returning null on this line.
We're not getting an assertion error in the probe function, which means we must be returning queue.front() and getting null there.
Looking at queue.front(), it's just getting a value from an array by index. And we're not getting an ArrayIndexOutOfBoundsException, which leads me to believe there must have been a null Msg in the queue.
I have no idea why that would be the case, but thought I'd leave my notes here in case it helps someone who may know more.
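To make the reasoning above concrete, here is a minimal sketch of how an index lookup on an array-backed queue can hand back null without any bounds exception, and how the NPE then surfaces only when the caller dereferences the result. This is not jeromq's code: NullFrontSketch, ToyMsg and ToyQueue are hypothetical stand-ins, and the sketch deliberately says nothing about how a slot ends up unwritten in the real queue.

```java
public class NullFrontSketch {
    /** Hypothetical stand-in for a message; this is not zmq.Msg. */
    static final class ToyMsg {
        private final boolean delimiter;

        ToyMsg(boolean delimiter) {
            this.delimiter = delimiter;
        }

        boolean isDelimiter() {
            return delimiter;
        }
    }

    /** Hypothetical array-backed queue whose front() is a plain index lookup. */
    static final class ToyQueue {
        private final ToyMsg[] slots;
        private final int frontPos = 0;

        ToyQueue(int capacity) {
            // Java initialises every slot of a reference array to null.
            slots = new ToyMsg[capacity];
        }

        void set(int index, ToyMsg msg) {
            slots[index] = msg;
        }

        ToyMsg front() {
            // The index is in range, so no bounds exception is thrown,
            // even if nothing was ever stored in this slot.
            return slots[frontPos];
        }
    }

    /** Returns true if calling isDelimiter() on the front element throws an NPE. */
    static boolean frontThrowsNpe(ToyQueue queue) {
        try {
            queue.front().isDelimiter();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ToyQueue unwritten = new ToyQueue(4);
        System.out.println(unwritten.front() == null);  // prints "true"
        System.out.println(frontThrowsNpe(unwritten));  // prints "true"

        ToyQueue written = new ToyQueue(4);
        written.set(0, new ToyMsg(false));
        System.out.println(frontThrowsNpe(written));    // prints "false"
    }
}
```

If the real queue behaves like this toy, the open question is which code path lets the reader see a slot before a Msg has been stored in it.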
I took a brief look at the code again and it appears to still be in about the same state. My assessment now is the same as it was a year ago.
Information needed:
- An explanation by someone who understands the code.
- What does it mean for a null Msg to be in the queue?
Is there a known way to trigger this? I saw this exact NPE on a running server my company has, and cannot figure out how we got there. I have been trying to put garbage onto our pipe but cannot manually induce it. All the referenced code from the original investigation 404s on me.
If someone could explain how a null message gets into the queue, I could at least check whether some of our new messaging code is triggering that, since I can't directly fix an NPE in jeromq. Thanks!
It might be fixed with the following change:
07dcebd#diff-da2d6c91e26a787671da920e8bf2d452R103
It was actually happening on NetMQ (which was a port of jeromq) and we fixed it a few years back.