ros_gz

ROS -> IGN stops working after 5 minutes of inactivity

Open • azeey opened this issue 4 years ago • 7 comments

I have noticed this problem in the SubT environment, but it can be easily reproduced as follows:

  • Start roscore

  • In one terminal (Terminal 1), run the parameter bridge:

    rosrun ros1_ign_bridge parameter_bridge "test@std_msgs/String]ignition.msgs.StringMsg"
    
  • In another terminal (Terminal 2), echo the ign-transport topic

    ign topic -e -t /test
    
  • In another terminal (Terminal 3), publish on the ROS topic

    rostopic pub -1 /test std_msgs/String "data: 'Hello1'"
    
  • On Terminal 2, you should see

    data: "Hello1"
    
  • Now wait 5 minutes without publishing anything and then, on Terminal 3, run

    rostopic pub -1 /test std_msgs/String "data: 'Hello2'"
    
  • You expect to see

    data: "Hello2"
    

    but instead, you'll get nothing.

This is strange behavior, since neither ROS nor ign-transport has this problem when used directly to send messages to other ROS and ign-transport nodes, respectively.
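
The Terminal 3 part of the steps above can be sketched as a small helper, so the idle period is easy to vary. This is only a sketch: the function name `publish_twice_with_idle` and its optional argument are hypothetical, and it assumes roscore, the bridge, and the `ign topic` listener are already running in their own terminals.

```shell
# Hypothetical helper (not part of ros_gz): publish once, stay idle, publish again.
# The first argument is the idle time in seconds; it defaults to 300,
# the 5 minutes used in the steps above.
publish_twice_with_idle() {
  rostopic pub -1 /test std_msgs/String "data: 'Hello1'"
  sleep "${1:-300}"
  rostopic pub -1 /test std_msgs/String "data: 'Hello2'"
}
```

Running `publish_twice_with_idle` with no argument waits the full 5 minutes; a call such as `publish_twice_with_idle 600` would try a longer idle period.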

azeey, Aug 27 '19 18:08

The issue seems to be related to having docker installed/running. Others may not experience this problem. We can put this issue on hold for now.

azeey, Aug 27 '19 23:08

I just tested this. I don't have Docker running, so I still received the Hello2 message after 5 minutes.

iche033, Aug 27 '19 23:08

I just tested this inside a docker container and the 2nd message went through even after 10 minutes.

chapulina, Aug 28 '19 16:08

Okay. This is really strange. I tested it inside the latest osrf/subt-virtual-testbed yesterday and I was still getting the problem. At this point, I'm inclined to think it must be something wrong with my machine/network setup.

azeey, Aug 28 '19 16:08

For the record, Addisu and I tested a couple of scenarios:

  • I ran it on my desktop inside a Docker container (Docker version 18.09) and I wasn't able to reproduce the problem.
  • I ran it on my laptop, directly on the host, and I wasn't able to reproduce the problem.
  • On both my laptop and Addisu's computer, Ignition Transport binds to 172.17.0.1, which belongs to the Docker network interface. You can check this by running the bridge or the listener with IGN_VERBOSE=1.
  • If Addisu forces Ignition Transport to bind to another network interface (e.g., IGN_IP=127.0.0.1), the problem disappears.
  • It doesn't make any difference if the computer is connected to a LAN or isolated.
  • It doesn't make any difference if you run another listener after the 5-6 minute mark. The first listener still misses the message (on Addisu's computer).
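
For anyone who wants to try the IGN_IP workaround from the list above, here is a minimal sketch. The wrapper name `with_loopback_ign` is hypothetical; the only things taken from our tests are the two environment variables, IGN_IP=127.0.0.1 and IGN_VERBOSE=1, and it assumes both the bridge and the listener are started through the wrapper so they agree on the interface.

```shell
# Hypothetical wrapper (not part of ros_gz or ign-transport): run any command
# with Ignition Transport pointed at the loopback address and verbose output on.
with_loopback_ign() {
  env IGN_IP=127.0.0.1 IGN_VERBOSE=1 "$@"
}
```

The bridge from the original report would then be started as `with_loopback_ign rosrun ros1_ign_bridge parameter_bridge "test@std_msgs/String]ignition.msgs.StringMsg"`, and the listener as `with_loopback_ign ign topic -e -t /test`.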

caguero, Aug 28 '19 16:08

Thanks for summarizing our tests @caguero

This is a Wireshark capture showing only the ign-transport side of things. It shows that the first message was sent from the IP 172.17.0.1 and received an ACK. The second message, 305 seconds later, was sent from 192.168.1.74 and did not get an ACK. I'm not sure why the second message is being sent from a different IP.

[image: Wireshark capture]

azeey, Aug 28 '19 17:08

This might be a related issue: https://github.com/zeromq/libzmq/issues/2763. Based on the comments there, I tried setting ZMQ_HEARTBEAT_IVL to 30 seconds (30000 ms), and that seems to fix the problem for me. Again, since I'm the only one experiencing it and I don't know exactly what's causing the problem, we can wait on making any changes. For reference, this is my diff on ign-transport/src/NodeShared.cc:

diff --git a/src/NodeShared.cc b/src/NodeShared.cc
--- a/src/NodeShared.cc
+++ b/src/NodeShared.cc
@@ -939,17 +939,23 @@ void NodeShared::OnNewConnection(const M
   {
     try
     {
       // Handle security
       this->dataPtr->SecurityOnNewConnection();
 
       // I am not connected to the process.
       if (!this->connections.HasPublisher(addr))
+      {
+        // Heartbeat every 30 seconds
+        int heartBeatVal = 30000;
+        this->dataPtr->subscriber->setsockopt(ZMQ_HEARTBEAT_IVL,
+            &heartBeatVal, sizeof(heartBeatVal));
         this->dataPtr->subscriber->connect(addr.c_str());
+      }
 
       // Add a new filter for the topic.
       this->dataPtr->subscriber->setsockopt(ZMQ_SUBSCRIBE,
           topic.data(), topic.size());
 
       // Register the new connection with the publisher.
       this->connections.AddPublisher(_pub);
 

azeey, Aug 28 '19 20:08