zmq4
Incorrect errno
The problem
I issue a non-blocking read on a DEALER socket connected to a ROUTER socket.
data, err := client.RecvMessage(zmq.DONTWAIT)
The ROUTER takes at least 1 second to complete the task (due to sleep()), and I do the read immediately.
I expected to get an EAGAIN error, but instead I got err == nil and len(data) == 0, i.e. what looks like a proper empty read.
Situation
By debugging the library, it seems to me that the error starts with this call (RecvBytes, zmq4.go:1077):
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
Here, size == -1 but err == nil.
Therefore errget(err), called with nil, returns nil instead of the real error.
Maybe errget should do something when it is called with a nil argument?
I believe that the root cause of this particular problem is that zmq_errno is not used.
The documentation of that function says it should be used to get errno reliably, for example when the application links to a different C runtime than libzmq does.
This is probably my case, because it happens on Windows: I have libzmq.dll built with MSVC, and I generated a stub libzmq.a using gcc's dlltool. So the setup is exotic (but hey, welcome to compiling C libs on Windows + Go + Cgo).
What's more, the C.* calls in Go return the plain errno, which is essentially wrong in this case.
When I tried e := C.zmq_errno() just after the failed read, I got the correct EAGAIN (11) error.
Solutions?
I could probably check C.zmq_errno() after each call, but I am not sure whether that is sufficient, or whether the error is cleared after successful calls.
EDIT:
No, the error is not cleared. And since the returned err is nil, there is no way to know whether the C.zmq_errno() result is valid in this situation (plus all the possible threading issues).
One solution may be to drop every err from _, err := C. ... and call C.zmq_errno() instead, but that would require changes in many places.
Maybe modifying errget would be sufficient? For example, if the err argument is nil, then check C.zmq_errno()?
Forget about the previous comment. It's all wrong. An interrupted signal call gives an EINTR, not an EAGAIN. I undid the changes.
So what is the problem exactly? Provide code that demonstrates it.
@pebbe
I am not sure it is easily reproducible. I believe that the main reason behind this is exactly what zmq_errno() is for. I found it through this SO
Look at the definition of this function:
int zmq_errno (void) { return errno; }
It just returns errno, but from the context of the library itself.
In my case I have libzmq.dll built with MSVC, but I use gcc from MSYS2 for cgo. Therefore errno may not be propagated properly, which is the situation described here.
Your error handling relies on what cgo gives you here:
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
zmq_msg_recv only returns the size of the message; the err is supplied by Go (cgo):
Any C function (even void functions) may be called in a multiple assignment context to retrieve both the return value (if any) and the C errno variable as an error (use _ to skip the result value if the function returns void). For example:
(from the cgo documentation on godoc)
So this err is basically the same as just reading errno (which you in fact do in errget).
The problem is that the errno in the DLL may be a different errno from the one in the app.
libzmq sets the errno that resides only in the DLL, and in my app errno is always 0.
I understand that this problem is why zmq_errno came to be.
The problem itself
I run
msg, err := client.RecvMessage(zmq.DONTWAIT)
I am sure that there is no message in the queue, so I should receive msg = nil and err = EAGAIN.
This doesn't happen. I get msg = []byte("") (empty message) and err = nil.
By debugging your code I can see that:
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
in this example returns size = -1 and err = nil.
size = -1 clearly indicates that there IS an error, but Go gives you err = nil. In the next if statement you check size to see whether there is an error (and there is), and to get the actual error you look at err, which is nil.
So, size tells that there is an error, and err says there is none.
To me, the cause is what I wrote at the beginning: libzmq sets a different errno than the one cgo returns. You should probably check zmq_errno instead.
And look at this quote from zmq.h:
/*  This function retrieves the errno as it is known to 0MQ library. The goal */
/*  of this function is to make the code 100% portable, including where 0MQ  */
/*  compiled with certain CRT library (on Windows) is linked to an           */
/*  application that uses different CRT library.                             */
ZMQ_EXPORT int zmq_errno (void);
I think I may have a fix. Can you try the latest version, please?
The same situation happens when binding to the same TCP port a second time: it fails silently, without an error. Therefore the process thinks that it can accept messages, while the underlying socket is dead.
This occurs in Bind (zmq4.go:963):
i, err = C.zmq4_bind(soc.soc, s)
Here i == -1 but err == nil, the same as before.
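For comparison, this is the behaviour the second bind should have. Go's standard net package reports a second listener on an already-occupied TCP port as an error (EADDRINUSE on Linux). This sketch uses only the standard library, not zmq4; the loopback address and the OS-chosen port are arbitrary choices for the demonstration, and the exact errno may differ on other platforms.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// secondBindError binds an OS-chosen loopback port, then tries to bind the
// same address again and returns the error from that second attempt.
func secondBindError() error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("first bind failed unexpectedly: %w", err)
	}
	defer first.Close()

	second, err := net.Listen("tcp", first.Addr().String())
	if err != nil {
		return err // expected path: the port is already in use
	}
	second.Close()
	return nil // a silent success here would be the bug described above
}

func main() {
	err := secondBindError()
	fmt.Println(err != nil, errors.Is(err, syscall.EADDRINUSE))
}
```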
I have downloaded your latest version and it seems fixed: this case (binding) now correctly returns an error (though I am not sure why it is error 100, "Cannot create another system semaphore"; this must be a libzmq thing). I have not yet tested the EAGAIN case, but I believe it is the same as for Bind.