
context and socket shutdown hangs

Open ovanes opened this issue 6 years ago • 8 comments

Currently I see ZeroMQ/cppzmq/libzmq hanging after I try to exit the process.

I am using libzmq 4.2.2 on Mac OS X Sierra (10.12.6) and thought that this behaviour reflects the bug described here: https://github.com/zeromq/libzmq/issues/1279, which is considered to be resolved.

I post it here because I use the cppzmq bindings, and I see cppzmq cause an abort in this line:

        inline bool recv (message_t *msg_, int flags_ = 0)
        {
>>>         int nbytes = zmq_msg_recv (&(msg_->msg), ptr, flags_);
            if (nbytes >= 0)
                return true;
            if (zmq_errno () == EAGAIN)
                return false;
            throw error_t ();
        }

This is my code to shutdown the context + socket(s):

int linger_value = 0;
void* native_socket = socket_;
auto result =
    zmq_setsockopt(native_socket, ZMQ_LINGER, &linger_value, sizeof(linger_value));
assert(0 == result);
socket_.close();

context_ptr_->close();

Closing the socket here causes the process to abort with SIGABRT. When I don't call close() on the socket, the process just hangs forever.

void* native_context = *context_ptr_;
auto result = zmq_ctx_set(native_context, ZMQ_BLOCKY, 0);

I am also not able to create a non-blocking context: whenever I call zmq_ctx_set(native_context, ZMQ_BLOCKY, 0); I receive -1 as the return value.

ovanes avatar Jul 31 '17 20:07 ovanes

I am having the same issue. Any workaround?

CptanPanic avatar Nov 09 '17 12:11 CptanPanic

What I've found out (and this flow isn't documented) is that one needs to unbind the endpoints from the socket first.

I ended up writing a socket_wrapper class having the following functionality:

    struct socket_wrapper : boost::noncopyable
    {
        void unbind_all() noexcept
        {
            for (auto const& endpoint : bound_endpoints_)
            {
                try
                {
                    socket_.unbind(endpoint);
                }
                catch (...)
                {
                    // seems like ZMQ can't unbind from all bound endpoints, only from the first or last one
                    WARNING_AC
                        << "ZeroMQ seems to have a bug. It can't unbind an endpoint"
                           " from the socket which it was previously bound to: '"
                        << endpoint << "'";
                }
            }

            bound_endpoints_.clear();
        }

        void close() noexcept
        {
            auto handle = native_socket();

            if (!handle) return; // already closed

            TRACE_AC << "ZMQ Shutdown setting sockopt ZMQ_LINGER=0";
            int linger_value = 0;
            if (0 != zmq_setsockopt(handle,
                                    ZMQ_LINGER,
                                    &linger_value,
                                    sizeof(linger_value)))
                WARNING_AC << "ZMQ Shutdown failed to set ZMQ_LINGER=0 option "
                              "for the socket.";

            unbind_all();

            try
            {
                TRACE_AC << "calling ZMQ socket_t::close()";
                socket_.close();
            }
            catch (...)
            {
                ERROR_AC << "ZMQ socket close failed, might be a bug in ZMQ";
            }
        }

    private:
        zmq::socket_t socket_;
        std::vector<std::string> bound_endpoints_;
    };

And finally, to close everything including the context, I call the shutdown function (it should give you the idea of the flow...):

    // from a class which manages the context and all the sockets...
    void shutdown()
    {
        if (interrupted_) return;
        // note: a store must use release (or relaxed/seq_cst) ordering;
        // memory_order_acquire is not valid for stores
        interrupted_.store(true, std::memory_order_release);

        for (auto& socket_wrapper : sockets_)
            socket_wrapper.close();

        auto native_context = static_cast<void*>(*context_ptr_);
        if (0 != zmq_ctx_set(native_context, ZMQ_BLOCKY, 0))
        {
            // an assertion would fail here, as ZMQ_BLOCKY does not seem to be supported
            // assert(0 == result && "unable to set ZMQ_BLOCKY to false");
            WARNING_AC
                << "ZMQ Shutdown: ZMQ_BLOCKY was not set, seems to be a bug in ZMQ";
        }

        TRACE_AC << "closing the context";
        context_ptr_->close();
    }

ovanes avatar Nov 09 '17 16:11 ovanes

Please provide a minimal example that reproduces the problem.

Your assumption that it is necessary to unbind sockets is not correct. However, you need to close all sockets. Context termination will block until all sockets have been closed.

sigiesec avatar Apr 04 '18 10:04 sigiesec

@ovanes could you confirm if you still see this issue on osx, please?

kurdybacha avatar Jun 04 '18 17:06 kurdybacha

Please provide a minimal example that reproduces the problem.

Hello! Does this sample have the same kind of hang?

#include <string>
#include <zmq.hpp>

int main()
{
    const std::string text("hello");
    zmq::context_t context;
    zmq::socket_t socket(context, ZMQ_PUSH);
    zmq::message_t message(text.data(), text.size());
    socket.connect("tcp://localhost:6666");
    socket.setsockopt(ZMQ_SNDTIMEO, 100);
    socket.send(message, zmq::send_flags::dontwait);

    return 0;
}

mrfeod avatar Aug 29 '19 13:08 mrfeod

Yes, the above code sample has the same hang, with a backtrace like this:

    (gdb) bt
    #0  0x00007ffff6efa7e1 in poll () from /lib64/libc.so.6
    #1  0x00007ffff7b8093d in zmq::signaler_t::wait(int) () from /opt/phoenix/lib64/libzmq.so.5
    #2  0x00007ffff7b6789c in zmq::mailbox_t::recv(zmq::command_t*, int) () from /opt/phoenix/lib64/libzmq.so.5
    #3  0x00007ffff7b59c61 in zmq::ctx_t::terminate() () from /opt/phoenix/lib64/libzmq.so.5
    #4  0x00007ffff7b9b93a in zmq_ctx_term () from /opt/phoenix/lib64/libzmq.so.5
    #5  0x00000000004014d1 in zmq::context_t::close (this=0x7fffffffdf28) at cppzmq/zmq.hpp:670
    #6  0x00000000004014a6 in zmq::context_t::~context_t (this=0x7fffffffdf28, __in_chrg=<optimized out>) at cppzmq/zmq.hpp:661
    #7  0x000000000040120a in main () at test1.cpp:7

ywangtht avatar Jan 10 '20 22:01 ywangtht

If it provides any context (forgive the pun): if I call context.close() in a class destructor where the context is a member, it also hangs, even though calling the same shutdown sequence manually, before the class the context lives in is destroyed, works fine.

I guarantee all sockets are closed, because closing them is the last action of every worker thread, and I join all worker threads before closing the context. If I call this stop function before the object goes out of scope and the destructor runs, it works fine. If I leave it to the destructor, it hangs.

EDIT

I think in my situation it is because this is being loaded as a shared library and unloaded, and the unloading application is terminating threads aggressively in a way that causes the context to not shut down appropriately since it is waiting on the reaper which has been killed by the parent application.

NouberNou avatar Nov 13 '21 01:11 NouberNou

I am getting this same issue in the FreeSWITCH module mod_event_zmq.

tanish20j avatar Jan 24 '23 09:01 tanish20j