Question about ZMQ_HEARTBEAT_* options

Open EgalYue opened this issue 4 years ago • 3 comments

Issue description

In my case, the subscriber may wait a long time for the publisher (a couple of minutes, or up to 30 minutes), and the publisher may be offline during that period. When the publisher eventually starts publishing messages, the subscriber still appears hung or dead: it receives nothing.

I left the following values at their defaults.

tcp_keepalive_time 7200
tcp_keepalive_intvl 75
tcp_keepalive_probes 9

My previous solution was to use a zmq_poll timeout and then reconnect the socket, but this approach is not elegant. For certain reasons my subscriber must use zmq_bind rather than zmq_connect. In that case, I found that after a while the port can no longer be bound, and the error is: "[ZMQ Subscriber]: Bind failed reason: Address already in use"

Q1: Why can the socket no longer bind, even though I have already set LINGER and added a sleep before reconnecting? Does it sound like a problem in the underlying system?

Pseudo code:

int reconnect() {
    // close the old socket
    zmq_close(m_receiver);

    // create the socket again
    int linger = 0;
    m_receiver = zmq_socket(m_context, ZMQ_SUB);
    zmq_setsockopt(m_receiver, ZMQ_LINGER, &linger, sizeof(linger)); // important!!!
    if (zmq_bind(m_receiver, m_endpoint.c_str()) < 0) { // this is zmq_bind, not zmq_connect
        return -1;
    }
    return 1;
}

int subscribe() {
    ...
    int poll_flag = zmq_poll(items, 1, timeout);
    if (0 == poll_flag) { // timeout: reconnect the socket
        reconnect();
        sleep(200ms);
    }
    ...
}
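
In the pseudo code above, zmq_bind runs immediately after zmq_close, and the 200 ms sleep only happens afterwards. One possibility, offered here as an assumption and not a confirmed diagnosis, is that the old listener has not yet been released when the new bind is attempted. A minimal sketch of a bounded retry around the bind step follows. `bind_with_retry` is a hypothetical helper, not part of libzmq, and it takes the bind attempt as a callable so it can be exercised without a real socket.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of libzmq): calls try_bind() until it
// returns 0 or max_attempts is exhausted, sleeping `delay` between attempts.
// Returns the number of attempts used on success, or -1 if every attempt failed.
int bind_with_retry(const std::function<int()>& try_bind,
                    int max_attempts,
                    std::chrono::milliseconds delay) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_bind() == 0) {
            return attempt;  // bind succeeded on this attempt
        }
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(delay);  // wait before trying again
        }
    }
    return -1;
}
```

Inside reconnect() this could wrap the existing call, for example `bind_with_retry([&] { return zmq_bind(m_receiver, m_endpoint.c_str()); }, 10, std::chrono::milliseconds(100))`, since zmq_bind returns 0 on success and -1 on failure. The attempt count and delay here are placeholders, not recommended values.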


I have already read the issues https://github.com/zeromq/libzmq/issues/2763, https://github.com/ignitionrobotics/ros_ign/issues/42, and https://github.com/zeromq/pyzmq/issues/1503.

So I'm curious about the ZMQ_HEARTBEAT_* options.

Q2: If ZMQ_HEARTBEAT_TIMEOUT is set to, e.g., 5 s, is the connection closed after 5 s? Or is the connection re-established automatically, meaning the ZMQ_HEARTBEAT_* options keep the socket alive all the time? If the connection is closed after 5 s, I need to reconnect the socket myself, right (i.e., close the socket first and then bind it again)?

Environment

libzmq version (commit hash if unreleased): 4.3
OS: Docker in ubuntu18.04 or RK3399 chip (linaro)

EgalYue Apr 07 '21 05:04

Q3: What is the difference between the ZMQ TCP keepalive options and the heartbeat options? Is it just the TCP layer versus the ZMQ protocol layer?

E.g., if I use the following values, the connection would be reconnected automatically after about 1 minute, right?

int v1 = 1;
int v2 = 9;
int v3 = 60;
int v4 = 1;
zmq_setsockopt(socket, ZMQ_TCP_KEEPALIVE, &v1, sizeof(v1));
zmq_setsockopt(socket, ZMQ_TCP_KEEPALIVE_CNT, &v2, sizeof(v2));
zmq_setsockopt(socket, ZMQ_TCP_KEEPALIVE_IDLE, &v3, sizeof(v3));
zmq_setsockopt(socket, ZMQ_TCP_KEEPALIVE_INTVL, &v4, sizeof(v4));
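
As a rough check on the "1 minute" estimate, the time TCP keepalive needs to declare a silent peer dead can be approximated as the idle time plus one probe interval per unanswered probe. The sketch below assumes Linux-style semantics for the idle, interval, and count settings; `keepalive_detection_seconds` is a hypothetical helper written for illustration. Note that this only estimates when a dead connection is detected; whether and how the connection is then re-established is a separate matter.

```cpp
// Approximate time, in seconds, for TCP keepalive to declare an idle
// connection dead: the idle period before the first probe, plus one
// probe interval for each unanswered probe.
// Assumption: Linux-style semantics for the idle/interval/count settings.
constexpr int keepalive_detection_seconds(int idle_s, int interval_s, int probe_count) {
    return idle_s + interval_s * probe_count;
}

// Values from the snippet above: IDLE = 60, INTVL = 1, CNT = 9.
constexpr int proposed_s = keepalive_detection_seconds(60, 1, 9);    // 60 + 1 * 9 = 69
// Defaults quoted in the issue description: 7200, 75, 9.
constexpr int default_s = keepalive_detection_seconds(7200, 75, 9);  // 7200 + 75 * 9 = 7875
```

Under that assumption, the proposed values give about 69 seconds, while the defaults give 7875 seconds (a little over two hours), which is far longer than the 30-minute wait described in the issue.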

EgalYue Apr 07 '21 10:04

+1, very confused about these options.

umialpha Jan 19 '22 08:01

Did you solve it? @EgalYue

gorghino Apr 25 '22 10:04