rmw_cyclonedds

ROS_LOCALHOST_ONLY is not preventing cross talking between machines

Open doisyg opened this issue 5 years ago • 9 comments

Bug report

Required Info:

  • Operating System: Ubuntu 20.04
  • Installation type: binaries
  • DDS implementation: Cyclone (default)
  • Rolling

Steps to reproduce issue

Connect your machine to a network with multiple other machines running ROS 2, then run:

ros2 node list

Expected behavior

With export ROS_LOCALHOST_ONLY=1, no nodes should be listed if nothing runs on your machine

Actual behavior

I get multiple nodes listed, several with the exact same name

Additional information

If using export ROS_DOMAIN_ID='unique_id_on_the_network', no nodes are listed. If I switch off my wifi interface, no nodes are listed. If I connect to another wifi network (with no other ROS 2 machines), no nodes are listed.

I would expect to be isolated in the same way, with no difference between using ROS_LOCALHOST_ONLY=1 and using ROS_DOMAIN_ID='unique_id_on_the_network'.

doisyg avatar Apr 20 '21 13:04 doisyg

I think the ros2 daemon was already running and had cached the endpoints it found via discovery. ros2 node list asks the daemon for the node list via XML-RPC, and the daemon returns the cached list.

### Host-A
# just in case, clear that out.
root@f8a93cb8cfbd:~# unset ROS_LOCALHOST_ONLY
# start publisher
root@f8a93cb8cfbd:~# ros2 run demo_nodes_cpp talker
[INFO] [1619075202.150248497] [talker]: Publishing: 'Hello World: 1'
...

### Host-B
# set env variable only to have localhost network
root@24c44a10b658:~# export ROS_LOCALHOST_ONLY=1
# print node list
root@24c44a10b658:~# ros2 node list
/talker               ---------------> problem confirmed.
# restart daemon to clear discovery cache.
root@24c44a10b658:~# ros2 daemon stop
The daemon has been stopped
root@24c44a10b658:~# ros2 daemon start
The daemon has been started
# list node again
root@24c44a10b658:~# ros2 node list

fujitatomoya avatar Apr 22 '21 07:04 fujitatomoya

@doisyg could you check my previous comment? and if you still have problem, let us know 😃

fujitatomoya avatar Apr 24 '21 09:04 fujitatomoya

Hi @fujitatomoya, when I noticed the issue, I had no control over host(s) A (but was on the same network), so I don't know exactly what was running on them. The machine I controlled was host B and yes, even after manually stopping the daemon (or rebooting), the problem persisted.

I cannot reproduce it with 2 machines that I can control and the talker example. I know it is a fuzzy report, but I am almost certain that there is an issue as I noticed it a couple of times. And it disturbs our setup enough that we all resorted to using ROS_DOMAIN_ID. What else can I run the next time I notice the issue (I have to be in a shared office with other ROS 2 devs)? Is there any way of knowing which IP the "phantom nodes" are coming from?

doisyg avatar Apr 24 '21 09:04 doisyg

even after manually stopping the daemon (or rebooting), the problem persisted.

okay...

I know it is a fuzzy report, but I am almost certain that there is an issue as I noticed it a couple of times.

I am not saying this is not a problem, and this sometimes happens... 😢 but without a reproducible procedure, it would be really hard to debug.

Is there any way of knowing from which ip the "phantom nodes" are coming ?

I think getting the IP address requires debug information from the DDS layer (the rmw implementation, Cyclone or Fast DDS), and I am not sure how to get that...

fujitatomoya avatar Apr 25 '21 23:04 fujitatomoya

could be related to https://github.com/ros2/rmw_cyclonedds/issues/311

fujitatomoya avatar Apr 26 '21 01:04 fujitatomoya

could be related to ros2/rmw_cyclonedds#311

I doubt it — that would more likely than not cause it to not work at all.

I think getting the IP address requires debug information from the DDS layer (the rmw implementation, Cyclone or Fast DDS), and I am not sure how to get that...

For Cyclone, the quickest route is usually still to enable (discovery) tracing: if you set the CYCLONEDDS_URI environment variable to an XML file containing

<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <Tracing>
            <Verbosity>fine</Verbosity>
            <OutputFile>cdds.log.${CYCLONEDDS_PID}</OutputFile>
        </Tracing>
    </Domain>
</CycloneDDS>

you'll get a text file with tons of details. Do look for lines matching the regex SPDP.*NEW; when in doubt, I'll be happy to help. (CYCLONEDDS_URI is really a comma-separated list of files, URIs (only file:// for now) and configuration fragments, so if you already have a file, you can edit it or you can add another one; or, if you are like me and lazy, you can copy-paste an abbreviated form into CYCLONEDDS_URI directly: <Tr><V>fine</><Out>cdds.log.${CYCLONEDDS_PID}</> in the environment variable will do exactly the same).
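To make the "look for SPDP.*NEW" step a bit more concrete, here is a minimal Python sketch that scans trace lines for that regex and collects any non-loopback, non-multicast IPv4 addresses mentioned on them. The exact layout of a Cyclone trace line is not shown in this thread, so the sketch assumes only that addresses appear as dotted quads somewhere on the matching lines; the function name `remote_addresses` is made up for illustration.

```python
import glob
import ipaddress
import re

# Regex suggested in the comment above for participant discovery lines.
SPDP_NEW = re.compile(r"SPDP.*NEW")
# Loose IPv4 matcher. Assumption: addresses show up as dotted quads on the line.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def remote_addresses(lines):
    """Return sorted non-loopback, non-multicast IPv4 addresses on SPDP NEW lines."""
    found = set()
    for line in lines:
        if not SPDP_NEW.search(line):
            continue
        for candidate in IPV4.findall(line):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like a dotted quad but is not a valid address
            if not addr.is_loopback and not addr.is_multicast:
                found.add(str(addr))
    return sorted(found)


if __name__ == "__main__":
    # File name pattern taken from the <OutputFile> setting above.
    for path in glob.glob("cdds.log.*"):
        with open(path, errors="replace") as trace:
            print(path, remote_addresses(trace))
```

Any address printed here that is not one of your own machine's is a candidate source for the "phantom nodes"; the full matching lines in the trace remain the authoritative record.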

The API makes discovery information available via built-in topics, and I do intend to add IP addresses to that data. Especially now that it has become really accessible (e.g. https://github.com/eclipse-cyclonedds/cyclonedds-python/blob/master/src/cyclonedds/tools/ddsls.py) that is the way to do these things. For now, however, the traces are best (or wireshark, I suppose).

eboasson avatar Apr 26 '21 07:04 eboasson

P.S. ROS_LOCALHOST_ONLY causes it to use the loopback interface, that is:

  • it advertises only loopback addresses
  • it sets the multicast transmit interface to loopback
  • it joins the multicast group only on the loopback interface

One would expect that this would keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, then things become a little tricky because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.

I think retrying it with the latest version fixes that (though I would wait for https://github.com/eclipse-cyclonedds/cyclonedds/pull/774 to be merged, which should be real soon), but I haven't specifically tried it.
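One way to picture the "discard" behaviour described above is a check on the source address of an incoming participant discovery packet. The Python sketch below is purely illustrative: it is not Cyclone DDS code, it does not claim to mirror how the linked fix works, and the helper name `accept_discovery_packet` is hypothetical.

```python
import ipaddress


def accept_discovery_packet(source_ip, localhost_only):
    """Decide whether a participant discovery packet should be processed.

    Hypothetical helper: in localhost-only mode, only packets whose source
    address is a loopback address (127.0.0.0/8 or ::1) are accepted.
    """
    if not localhost_only:
        return True
    return ipaddress.ip_address(source_ip).is_loopback


if __name__ == "__main__":
    for ip in ("127.0.0.1", "::1", "192.168.1.23"):
        print(ip, accept_discovery_packet(ip, localhost_only=True))
```

With a filter of this kind, a packet from another machine would be dropped even if it somehow reached the process, which is the isolation the issue expects from ROS_LOCALHOST_ONLY=1.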

eboasson avatar Apr 26 '21 07:04 eboasson

Thanks @eboasson for the detailed answer. I will then wait for the latest fixes and report here

One would expect that this would keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, then things become a little tricky because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.

That would explain what I am seeing

doisyg avatar Apr 26 '21 09:04 doisyg

We have also experienced the same thing on ROS 2 Humble.

The best way for us to stop communication was to:

  • set up cyclonedds.xml with:
    <Interfaces>
        <NetworkInterface name="lo" priority="default" multicast="default" />
    </Interfaces>
    
  • remove the export ROS_LOCALHOST_ONLY=1 line from .bashrc
  • stop the existing ROS 2 daemon: ros2 daemon stop
  • enable multicast for loopback interface: sudo ip link set lo multicast on
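The <Interfaces> fragment above has to sit inside a complete configuration file. As a rough sketch, the Python below generates one, assuming the fragment belongs under Domain/General as in the Cyclone DDS configuration layout and reusing the Domain id="any" form from the tracing example earlier in this thread. The function name `loopback_only_config` and the output file name are illustrative choices, and the XML namespace attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def loopback_only_config(interface="lo"):
    """Build a Cyclone DDS config string that selects a single network interface.

    Assumption: <Interfaces> is placed under <Domain>/<General>.
    """
    root = ET.Element("CycloneDDS")
    domain = ET.SubElement(root, "Domain", {"id": "any"})
    general = ET.SubElement(domain, "General")
    interfaces = ET.SubElement(general, "Interfaces")
    ET.SubElement(
        interfaces,
        "NetworkInterface",
        {"name": interface, "priority": "default", "multicast": "default"},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    with open("cyclonedds.xml", "w") as config_file:
        config_file.write(loopback_only_config())
    # Then point Cyclone at the file, for example:
    #   export CYCLONEDDS_URI=file://$PWD/cyclonedds.xml
    print(loopback_only_config())
```

The remaining steps in the list (removing ROS_LOCALHOST_ONLY from .bashrc, stopping the daemon, enabling multicast on lo) still have to be done by hand.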

Also, to check whether your machine is taking part in DDS traffic, you can use Wireshark with rtps as the display filter.

In large office networks, this can be useful. In our office, once everyone had applied these steps, the rtps traffic we saw dropped to zero.
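For a sense of what the rtps filter is matching, every RTPS message starts with a fixed 20-byte header: the 4-byte magic "RTPS", a 2-byte protocol version, a 2-byte vendor id and a 12-byte GUID prefix. The Python sketch below parses that header from a raw UDP payload; the function name `parse_rtps_header` is made up, and capturing the payloads in the first place is left to Wireshark or a similar tool.

```python
import struct

# RTPS header: magic (4 bytes), version major/minor (2 bytes),
# vendor id (2 bytes), GUID prefix (12 bytes) = 20 bytes in total.
RTPS_HEADER = struct.Struct("!4s2B2s12s")


def parse_rtps_header(payload):
    """Return the header fields of an RTPS message, or None if it is not RTPS."""
    if len(payload) < RTPS_HEADER.size:
        return None
    magic, major, minor, vendor, prefix = RTPS_HEADER.unpack_from(payload)
    if magic != b"RTPS":
        return None
    return {
        "version": (major, minor),
        "vendor_id": vendor.hex(),
        "guid_prefix": prefix.hex(),
    }


if __name__ == "__main__":
    # Synthetic payload for illustration only; not captured traffic.
    example = b"RTPS" + bytes([2, 1]) + bytes([1, 16]) + bytes(range(12))
    print(parse_rtps_header(example))
```

The vendor id identifies which DDS implementation sent the packet, which can help when tracking down unexpected participants on a shared network.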

xmfcx avatar Nov 17 '23 14:11 xmfcx