Assertion failure in epoll.cpp due to failing AF_UNIX bind() on Windows 10 1803
Issue description
When I run the libzmq test suite, many tests (such as test_reqrep_tcp.exe) abort with an assertion in epoll.cpp. I eventually tracked this down to a bind() call in ip.cpp failing during context creation, with the Winsock error set to WSAEINVAL.
The failing bind call originates at https://github.com/zeromq/libzmq/blob/1d2af8d38842427feba909b2f47275120d104ec8/src/ip.cpp#L564
This error exclusively occurs on Windows 10 1803, which is EOL for consumers and rapidly approaching EOL for businesses, so I don't expect this to be fixed - but if someone searches for it, at least they'll have something. I suspect something's wrong with Windows' (then newly added) AF_UNIX support in 1803, because I couldn't get their own example from https://devblogs.microsoft.com/commandline/windowswsl-interop-with-af_unix/ to work either.
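For anyone who wants to check a machine without building libzmq first, here is a minimal sketch of a standalone probe for the AF_UNIX bind() described above. It is not code from libzmq or from the Microsoft blog post. The function name probe_af_unix_bind, the socket path, and the -1/-2 return values are made up for illustration. On Windows it assumes an SDK that ships afunix.h, and that std::remove is enough to clean up the socket file.

```cpp
// Standalone AF_UNIX bind() probe (illustrative sketch, not libzmq code).
#include <cstdio>
#include <cstring>
#include <string>

#ifdef _WIN32
#include <winsock2.h>
#include <afunix.h> // declares sockaddr_un on Windows
#pragma comment(lib, "ws2_32.lib")
typedef SOCKET sock_t;
static const sock_t bad_sock = INVALID_SOCKET;
static int last_error () { return WSAGetLastError (); }
static void close_sock (sock_t s) { closesocket (s); }
#else
#include <cerrno>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
typedef int sock_t;
static const sock_t bad_sock = -1;
static int last_error () { return errno; }
static void close_sock (sock_t s) { close (s); }
#endif

// Hypothetical helper: returns 0 if an AF_UNIX stream socket could be bound
// to 'path', -1 if the path does not fit into sun_path, -2 if Winsock could
// not be initialised, otherwise the platform error code (errno or
// WSAGetLastError, e.g. WSAEINVAL).
int probe_af_unix_bind (const std::string &path)
{
    sockaddr_un addr;
    if (path.size () >= sizeof (addr.sun_path))
        return -1;

#ifdef _WIN32
    WSADATA wsa;
    if (WSAStartup (MAKEWORD (2, 2), &wsa) != 0)
        return -2;
#endif

    int err = 0;
    std::remove (path.c_str ()); // a stale socket file would make bind fail

    sock_t s = socket (AF_UNIX, SOCK_STREAM, 0);
    if (s == bad_sock) {
        err = last_error ();
    } else {
        std::memset (&addr, 0, sizeof (addr));
        addr.sun_family = AF_UNIX;
        std::memcpy (addr.sun_path, path.c_str (), path.size () + 1);

        if (bind (s, reinterpret_cast<sockaddr *> (&addr),
                  static_cast<int> (sizeof (addr))) != 0)
            err = last_error ();

        close_sock (s);
        std::remove (path.c_str ());
    }

#ifdef _WIN32
    WSACleanup ();
#endif
    return err;
}
```

Calling probe_af_unix_bind ("af_unix_probe.sock") from a small main() and printing the result should be enough to tell whether the OS-level bind fails in the same way. A non-zero result would point at the OS rather than at libzmq.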
Environment
- libzmq version (commit hash if unreleased): 4.3.3, also seen in 1d6b2329
- OS: Windows 10 1803 (clean VM, newly set up for this)
Minimal test code / Steps to reproduce the issue
Either:
1. Run test_reqrep_tcp.exe or many other tests
Or:
2. Create a socket:
#include <zmq.hpp> // not pure libzmq, I know, but I'm already putting more time into this than makes any sense

int main(int, char**)
{
    zmq::context_t ctx;
    zmq::socket_t sock(ctx, zmq::socket_type::server);
    return 0;
}
What's the actual result? (include assertion message & call stack if applicable)
Assertion message from 1:
Z:\Debug>test_reqrep_tcp.exe
Bad file descriptor (C:\Code\libzmq\src\epoll.cpp:100)
Z:\Debug>echo %errorlevel%
1073741845
VS call stack from 2:
> SystemHealth.Qualification.exe!zmq::zmq_abort(const char * errmsg_) Line 84 C++
SystemHealth.Qualification.exe!zmq::epoll_t::add_fd(unsigned int fd_, zmq::i_poll_events * events_) Line 100 C++
SystemHealth.Qualification.exe!zmq::reaper_t::reaper_t(zmq::ctx_t * ctx_, unsigned int tid_) Line 50 C++
SystemHealth.Qualification.exe!zmq::ctx_t::start() Line 430 C++
SystemHealth.Qualification.exe!zmq::ctx_t::create_socket(int type_) Line 490 C++
SystemHealth.Qualification.exe!zmq_socket(void * ctx_, int type_) Line 262 C++
SystemHealth.Qualification.exe!zmq::socket_t::socket_t(zmq::context_t & context_, int type_) Line 1564 C++
SystemHealth.Qualification.exe!zmq::socket_t::socket_t(zmq::context_t & context_, zmq::socket_type type_) Line 1575 C++
What's the expected result?
Passing tests, or a socket gets created and doesn't fail an assertion.
I have seen this sort of stack trace too, although Google didn't send me here until I had already debugged it a lot. In my case, it appears that the bind(AF_UNIX ...) calls in the signaler_t constructor of reaper_t/mailbox were failing. This snowballs down to the reaper_t constructor, which gets a bad file handle from the mailbox and passes it on, until it finally gets handled by zmq_abort.
I suspect the old OS as well; I am also on 1803.
I can also confirm that upgrading Windows 10 resolves this problem.
Same issue here. Recompiling ZeroMQ with ZMQ_HAVE_IPC undefined works for me. But that's at best an undesirable workaround.
I have encountered a similar issue on Windows Server 2019.
libzmq version: 4.3.5
Through debugging, I found that the issue is caused by a failure to create a temporary file. The return value of create_ipc_wildcard_address is not validated, which results in a connect failure.
If create_ipc_wildcard_address fails, can we consider using make_fdpair_tcpip instead?
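To make the suggestion concrete, here is a rough sketch of the fallback idea only. It does not reflect how libzmq's ip.cpp is actually structured. fd_t is a plain int stand-in, and make_fdpair_with_fallback, try_ipc and try_tcp are hypothetical names: try_ipc stands for the existing AF_UNIX path (including the create_ipc_wildcard_address call), and try_tcp stands for something like make_fdpair_tcpip. Whether those functions can be wired together this way is an assumption.

```cpp
// Fallback pattern sketch (hypothetical names, not libzmq's real code).
#include <functional>

typedef int fd_t; // stand-in for libzmq's descriptor type

typedef std::function<int (fd_t *, fd_t *)> fdpair_fn;

// Tries the IPC-based pair first. If that reports failure (non-zero), the
// outputs are reset and the TCP-based pair is tried instead.
// Returns 0 on success, or the TCP attempt's error code if both fail.
int make_fdpair_with_fallback (const fdpair_fn &try_ipc,
                               const fdpair_fn &try_tcp,
                               fd_t *r_,
                               fd_t *w_)
{
    if (try_ipc (r_, w_) == 0)
        return 0;

    // Discard anything the failed attempt may have written.
    *r_ = -1;
    *w_ = -1;

    return try_tcp (r_, w_);
}
```

The point is only that a failure in the IPC path, such as the temporary file not being created, would be checked and handled before the descriptors reach the poller, instead of surfacing later as an assertion.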