rmw_fastrtps

Unpredictable behavior on machines that have multiple NICs

Open andrewbest-tri opened this issue 2 years ago • 10 comments

Bug report

Required Info:

  • Operating System: Windows and Ubuntu Linux 20.04 (Foxy)

  • Installation type: apt on Linux, aka.ms on Windows

  • Version or commit hash: Not sure, binary install.

  • DDS implementation: FASTRTPS

  • Client library (if applicable): rclcpp and rclpy

Steps to reproduce issue

I'll try to be concise: ros2 behaves unpredictably when we use a Fast RTPS profiles file to control traffic. I can split this into multiple issues if necessary. All of the symptoms below can be observed with the following test:


Open three terminals.

Terminal 1: Set the environment variable for the profiles file. Play a ros bag.

Terminal 2: Set the environment variable for the profiles file. Run ros2 topic list.

Terminal 3: DO NOT set the profiles file. Run ros2 topic list, then ros2 topic echo on a topic from the bag.


Symptom 1: Discovery server is sticky

Terminal 3 in the above test will see all the topics listed in ros2 topic list. However, it will not see any messages from the bag. Without the profiles file, it gets no traffic. But it doesn't know this, so it plods along happily getting no data at all.

We're losing experiment data to this problem. We don't even realize traffic isn't appearing.
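As a stopgap on our side, something like the following could at least make the silence visible. This is a minimal sketch, not part of rmw_fastrtps or rclpy: `TopicWatchdog` and its methods are hypothetical names, and it only tracks timestamps. It flags any topic that discovery reported but that has delivered nothing within a timeout.

```python
import time


class TopicWatchdog:
    """Hypothetical helper: remembers when each expected topic last delivered data."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be checked without waiting
        self._last_seen = {}

    def expect(self, topic):
        # Start the timer for a topic that discovery says exists.
        self._last_seen.setdefault(topic, self._clock())

    def on_message(self, topic):
        # Call this from the subscription callback for that topic.
        self._last_seen[topic] = self._clock()

    def stale_topics(self):
        # Topics that were discovered but have been silent for longer than the timeout.
        now = self._clock()
        return sorted(
            topic
            for topic, seen in self._last_seen.items()
            if now - seen > self._timeout_s
        )
```

In a node, `on_message` would be called from each subscription callback and `stale_topics` polled from a timer, logging a warning when the list is non-empty. It does not fix the underlying problem; it only turns "no data at all" into something we would notice.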

Symptom 2: There appears to be no way to limit traffic without affecting anything else

We want EXACTLY the default transport behavior, except that the nodes should not spam themselves over both NICs. This doesn't seem possible. Here is our config (as close as we can get to what we think is the default, but there is no default XML to compare against).

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles" >
  <transport_descriptors>
    <transport_descriptor>
      <transport_id>udp_transport</transport_id>
      <type>UDPv4</type>
      <non_blocking_send>true</non_blocking_send>
      <interfaceWhiteList>
        <address>127.0.0.1</address> <!-- loopback -->
        <address>192.168.10.18</address>
        <address>192.168.10.20</address>
        <address>192.168.10.22</address>
        <address>192.168.10.24</address>
        <address>192.168.10.26</address>
        <address>172.17.0.1</address>  <!-- Docker -->
      </interfaceWhiteList>
    </transport_descriptor>
    <transport_descriptor>
      <transport_id>shm_transport</transport_id>
      <type>SHM</type>
    </transport_descriptor>
  </transport_descriptors>
  <participant profile_name="UDPParticipant" is_default_profile="true">
    <rtps>
      <name>profile_for_ros2_context</name>
      <userTransports>
        <transport_id>udp_transport</transport_id>
        <transport_id>shm_transport</transport_id>
      </userTransports>
      <useBuiltinTransports>false</useBuiltinTransports>
    </rtps>
  </participant>
</profiles>
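Since the whitelist differs per machine, we end up hand-editing this file. A minimal sketch of generating the same structure from a list of addresses is below. It assumes nothing beyond the elements already used in the config above; `build_profile` is a hypothetical helper name, and the output is only as correct as that config is.

```python
import xml.etree.ElementTree as ET

NS = "http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles"


def build_profile(addresses, use_shm=True):
    """Return profile XML (as str) mirroring the config above for a given whitelist."""
    profiles = ET.Element("profiles", {"xmlns": NS})
    descriptors = ET.SubElement(profiles, "transport_descriptors")

    # UDPv4 transport restricted to the whitelisted interface addresses.
    udp = ET.SubElement(descriptors, "transport_descriptor")
    ET.SubElement(udp, "transport_id").text = "udp_transport"
    ET.SubElement(udp, "type").text = "UDPv4"
    ET.SubElement(udp, "non_blocking_send").text = "true"
    whitelist = ET.SubElement(udp, "interfaceWhiteList")
    for addr in addresses:
        ET.SubElement(whitelist, "address").text = addr

    transport_ids = ["udp_transport"]
    if use_shm:
        # Optional shared memory transport, as in the config above.
        shm = ET.SubElement(descriptors, "transport_descriptor")
        ET.SubElement(shm, "transport_id").text = "shm_transport"
        ET.SubElement(shm, "type").text = "SHM"
        transport_ids.append("shm_transport")

    participant = ET.SubElement(
        profiles,
        "participant",
        {"profile_name": "UDPParticipant", "is_default_profile": "true"},
    )
    rtps = ET.SubElement(participant, "rtps")
    ET.SubElement(rtps, "name").text = "profile_for_ros2_context"
    user_transports = ET.SubElement(rtps, "userTransports")
    for transport_id in transport_ids:
        ET.SubElement(user_transports, "transport_id").text = transport_id
    ET.SubElement(rtps, "useBuiltinTransports").text = "false"

    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(profiles)
    header = '<?xml version="1.0" encoding="UTF-8" ?>\n'
    return header + ET.tostring(profiles, encoding="unicode")
```

For example, `build_profile(["127.0.0.1", "192.168.10.18"])` yields a file with loopback plus one NIC whitelisted. This only removes the hand-editing; it does not address the question of matching the default transport behavior.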

Symptom 3: Not using a Fast RTPS profile causes spam and invalid data

We use machines on a closed network to communicate with our robot. Some of those machines connect to the internet on a second interface. If we do not use a Fast RTPS profiles file, messages are sent over both network cards, and late or duplicated messages degrade our system performance. Ideally we could just say "Only use this NIC" and not need a complex configuration with transports.

Symptom 4: Multiple users on one machine cannot see one another's traffic

If we run a process as a different user, even using the profiles file, we cannot see any traffic between users. This does not occur without a profiles file. It also does not seem to be affected by whether or not we enable the shared memory transport; that was a recent debugging attempt.

Overall, this seems like a common use case, but it's pretty hard to nail down and largely undocumented.

andrewbest-tri, Mar 22 '22 23:03