
Assertion failure in epoll.cpp due to failing AF_UNIX bind() on Windows 10 1803

Open barometz opened this issue 5 years ago • 4 comments

Issue description

When I run the libzmq test suite, many tests (such as test_reqrep_tcp.exe) abort with an assertion in epoll.cpp. I eventually tracked this down to a bind() call in ip.cpp failing during context creation, with the Winsock error set to WSAEINVAL.

The failing bind call originates at https://github.com/zeromq/libzmq/blob/1d2af8d38842427feba909b2f47275120d104ec8/src/ip.cpp#L564

This error occurs only on Windows 10 1803, which is EOL for consumers and rapidly approaching EOL for businesses, so I don't expect it to be fixed; but if someone searches for it, at least they'll find something. I suspect something is wrong with Windows' AF_UNIX support, which was newly added in 1803, because I couldn't get Microsoft's own example from https://devblogs.microsoft.com/commandline/windowswsl-interop-with-af_unix/ to work either.
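To check whether a machine's AF_UNIX bind() works at all, independently of libzmq, a small probe in the spirit of Microsoft's example can help. This is a hypothetical sketch: the function name and its return convention are made up for illustration and are not part of libzmq or of Microsoft's sample. It compiles as-is on POSIX systems; on Windows it assumes a Windows 10 SDK that ships `<afunix.h>`.

```cpp
// Hypothetical standalone probe (not part of libzmq): can this machine create
// an AF_UNIX stream socket and bind it to a filesystem path?
// On Windows this assumes the Windows 10 SDK's <afunix.h> and ws2_32.lib.
#include <string>

#ifdef _WIN32
#include <winsock2.h>
#include <afunix.h>
#include <windows.h>
#pragma comment(lib, "ws2_32.lib")
using sock_t = SOCKET;
static int last_error() { return WSAGetLastError(); }
static void close_sock(sock_t s) { closesocket(s); }
static void remove_path(const std::string &p) { DeleteFileA(p.c_str()); }
static bool invalid(sock_t s) { return s == INVALID_SOCKET; }
#else
#include <cerrno>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
using sock_t = int;
static int last_error() { return errno; }
static void close_sock(sock_t s) { close(s); }
static void remove_path(const std::string &p) { unlink(p.c_str()); }
static bool invalid(sock_t s) { return s < 0; }
#endif

// Returns 0 on success, -1 for local setup problems (WSAStartup failure or a
// path too long for sun_path), otherwise the socket error code captured right
// after the failing call. Any existing file at `path` is deleted first, so
// pass a scratch location.
int probe_af_unix_bind(const std::string &path)
{
#ifdef _WIN32
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return -1;
#endif
    sockaddr_un addr{};
    int result = -1;
    if (path.size() < sizeof(addr.sun_path)) {
        addr.sun_family = AF_UNIX;
        path.copy(addr.sun_path, sizeof(addr.sun_path) - 1);
        remove_path(path);  // a stale socket file would make bind() fail
        sock_t s = socket(AF_UNIX, SOCK_STREAM, 0);
        if (invalid(s)) {
            result = last_error();
        } else {
            int rc = bind(s, reinterpret_cast<const sockaddr *>(&addr), sizeof(addr));
            result = (rc == 0) ? 0 : last_error();  // read the error before closing
            close_sock(s);
            remove_path(path);
        }
    }
#ifdef _WIN32
    WSACleanup();
#endif
    return result;
}
```

Going by this report, an affected 1803 install would presumably return WSAEINVAL (10022) from this probe, while later Windows builds should return 0.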

Environment

  • libzmq version (commit hash if unreleased): 4.3.3, also seen in 1d6b2329
  • OS: Windows 10 1803 (clean VM, newly set up for this)

Minimal test code / Steps to reproduce the issue

Either:

  1. Run test_reqrep_tcp.exe or one of many other tests.

Or:

  1. Create a socket, for example with this program:
#include <zmq.hpp> // not pure libzmq, I know, but I'm already putting more time into this than makes any sense

int main(int, char**)
{
  zmq::context_t ctx;
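  // socket_type::server is a draft socket type: cppzmq only exposes it when
  // ZMQ_BUILD_DRAFT_API is defined. The abort happens inside this
  // constructor, while the context starts up (see the call stack below).
  zmq::socket_t sock(ctx, zmq::socket_type::server);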
  return 0;
}

What's the actual result? (include assertion message & call stack if applicable)

Assertion message from the first reproduction (running test_reqrep_tcp.exe):

Z:\Debug>test_reqrep_tcp.exe
Bad file descriptor (C:\Code\libzmq\src\epoll.cpp:100)

Z:\Debug>echo %errorlevel%
1073741845

Visual Studio call stack from the second reproduction (creating a socket):

>	SystemHealth.Qualification.exe!zmq::zmq_abort(const char * errmsg_) Line 84	C++
 	SystemHealth.Qualification.exe!zmq::epoll_t::add_fd(unsigned int fd_, zmq::i_poll_events * events_) Line 100	C++
 	SystemHealth.Qualification.exe!zmq::reaper_t::reaper_t(zmq::ctx_t * ctx_, unsigned int tid_) Line 50	C++
 	SystemHealth.Qualification.exe!zmq::ctx_t::start() Line 430	C++
 	SystemHealth.Qualification.exe!zmq::ctx_t::create_socket(int type_) Line 490	C++
 	SystemHealth.Qualification.exe!zmq_socket(void * ctx_, int type_) Line 262	C++
 	SystemHealth.Qualification.exe!zmq::socket_t::socket_t(zmq::context_t & context_, int type_) Line 1564	C++
 	SystemHealth.Qualification.exe!zmq::socket_t::socket_t(zmq::context_t & context_, zmq::socket_type type_) Line 1575	C++

What's the expected result?

The tests pass, and the socket is created without an assertion failure.

barometz • Nov 12 '20 14:11

I have seen this sort of stack trace too, although Google didn't send me here until after I had spent a lot of time debugging it. In my case, it appears that the bind(AF_UNIX ...) calls in the signaler_t constructor, used by the mailbox of reaper_t, were failing. The failure snowballs from there: the reaper_t constructor gets an invalid file handle from its mailbox, passes it on to the poller (epoll_t::add_fd in the call stack above), and the poller finally calls zmq_abort.
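The "Bad file descriptor" text in the assertion appears to be the standard strerror() message for EBADF. The Linux-only sketch below is an analogy, not libzmq's code: a hypothetical `broken_signaler_fd()` stands in for a signaler whose bind failed and hands back an invalid descriptor, and registering that value with epoll fails with EBADF, much like the abort in epoll_t::add_fd.

```cpp
// Linux-only illustration (not libzmq code): registering an invalid descriptor
// with epoll fails with EBADF, whose strerror() text is "Bad file descriptor".
#include <sys/epoll.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical stand-in for a signaler whose AF_UNIX bind() failed:
// instead of a usable descriptor it hands back -1.
int broken_signaler_fd() { return -1; }

// Mimics a poller registering `fd` for read events.
// Returns "ok" on success, otherwise the strerror() text for the failure.
std::string register_with_epoll(int fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return std::strerror(errno);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    std::string result = "ok";
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0)
        result = std::strerror(errno);  // captured before close() can touch errno
    close(epfd);
    return result;
}
```

Where libzmq asserts at this point, the sketch just returns the error text, which makes it easy to see that the real failure happened earlier, when the descriptor was created.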

I suspect the old OS version as well; I am also on 1803.

remoteBranch • May 29 '21 08:05

I can also confirm that upgrading Windows 10 resolves this problem.

remoteBranch • May 29 '21 09:05

Same issue here. Recompiling ZeroMQ with ZMQ_HAVE_IPC undefined works for me, but that's an undesirable workaround at best.

edwinvp • Sep 22 '21 17:09

I have encountered a similar issue on Windows Server 2019 with libzmq 4.3.5: test_xpub_welcome_msg aborts.

Through debugging, I found that the issue is caused by a failure to create a temporary file. Because the return value of create_ipc_wildcard_address is not checked, the subsequent connect fails.

[Screenshots: create_failed, connect_failed]

If create_ipc_wildcard_address fails, can we consider using make_fdpair_tcpip instead?
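A minimal POSIX sketch of that suggestion, assuming Linux-style sockets: it checks whether the temporary path could be created (the validation described above as missing) and, if any IPC step fails, falls back to a TCP loopback pair instead of handing back invalid descriptors. `fd_pair`, `make_pair_ipc`, `make_pair_tcp_loopback` and `make_pair_with_fallback` are hypothetical names; this is not how libzmq's make_fdpair or make_fdpair_tcpip are actually implemented, and a Windows version would need the Winsock equivalents.

```cpp
// POSIX sketch of "try IPC, fall back to TCP loopback" for a connected
// descriptor pair. All names are hypothetical; this is not libzmq's code.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdlib>
#include <string>

struct fd_pair { int w = -1; int r = -1; };

// Listen on an already-bound socket, connect a client to it, and accept.
// `out` is only written when every step succeeds.
static bool finish_pair(int lst, int family, const sockaddr *addr, socklen_t len, fd_pair &out)
{
    if (listen(lst, 1) != 0)
        return false;
    int w = socket(family, SOCK_STREAM, 0);
    if (w < 0)
        return false;
    if (connect(w, addr, len) != 0) {
        close(w);
        return false;
    }
    int r = accept(lst, nullptr, nullptr);
    if (r < 0) {
        close(w);
        return false;
    }
    out.w = w;
    out.r = r;
    return true;
}

// AF_UNIX attempt on a temporary path built from `tmpl` (must end in "XXXXXX").
static bool make_pair_ipc(const std::string &tmpl, fd_pair &out)
{
    std::string path = tmpl;
    int tmp = mkstemp(&path[0]);
    if (tmp < 0)
        return false;      // temporary file could not be created: report it, don't carry on
    close(tmp);
    unlink(path.c_str());  // free the name so bind() can create the socket file there

    sockaddr_un addr{};
    if (path.size() >= sizeof(addr.sun_path))
        return false;
    addr.sun_family = AF_UNIX;
    path.copy(addr.sun_path, sizeof(addr.sun_path) - 1);

    int lst = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lst < 0)
        return false;
    const sockaddr *sa = reinterpret_cast<const sockaddr *>(&addr);
    bool ok = bind(lst, sa, sizeof(addr)) == 0 && finish_pair(lst, AF_UNIX, sa, sizeof(addr), out);
    close(lst);
    unlink(path.c_str());  // established connections survive removing the path
    return ok;
}

// TCP fallback: listen on 127.0.0.1 with an OS-assigned port.
static bool make_pair_tcp_loopback(fd_pair &out)
{
    int lst = socket(AF_INET, SOCK_STREAM, 0);
    if (lst < 0)
        return false;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the OS pick a free port; getsockname() reads it back
    socklen_t len = sizeof(addr);
    sockaddr *sa = reinterpret_cast<sockaddr *>(&addr);
    bool ok = bind(lst, sa, len) == 0 && getsockname(lst, sa, &len) == 0
              && finish_pair(lst, AF_INET, sa, len, out);
    close(lst);
    return ok;
}

// Returns "ipc" or "tcp" for the transport that worked, or nullptr if both failed.
const char *make_pair_with_fallback(const std::string &tmpl, fd_pair &out)
{
    if (make_pair_ipc(tmpl, out))
        return "ipc";
    if (make_pair_tcp_loopback(out))
        return "tcp";
    return nullptr;
}
```

The trade-off is that the TCP fallback occupies an ephemeral loopback port that other local processes can see and briefly connect to, which is presumably one reason a Unix-domain socket is preferred when it works.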

hjyheb • Dec 19 '23 04:12