`ROS_LOCALHOST_ONLY` is not preventing cross-talk between machines
**Bug report**

**Required Info:**
- Operating System: Ubuntu 20.04
- Installation type: binaries
- DDS implementation: Cyclone (default)
- Version: Rolling
**Steps to reproduce issue**

1. Connect your machine to a network with multiple other machines running ROS 2
2. Run `ros2 node list`
**Expected behavior**

With `export ROS_LOCALHOST_ONLY=1`, no nodes should be listed if nothing runs on your machine.
**Actual behavior**

I get multiple nodes listed, several with the exact same name.
**Additional information**

- If using `export ROS_DOMAIN_ID='unique_id_on_the_network'`, no nodes are listed.
- If switching off my wifi interface, no nodes are listed.
- If connecting to another wifi network (with no other ROS 2 machines), no nodes are listed.

I would expect to be isolated in the same way, and to see no difference between using `ROS_LOCALHOST_ONLY=1` and using `ROS_DOMAIN_ID='unique_id_on_the_network'`.
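As background on why the domain-ID route isolates so reliably: DDS maps each domain to a disjoint set of UDP ports, so participants in different domains never even see each other's discovery traffic. A quick sketch of the standard formula, using the DDSI-RTPS default constants (PB=7400, DG=250, PG=2, d0=0, d1=10):

```python
# DDSI-RTPS default port mapping: every domain gets its own port range,
# so distinct ROS_DOMAIN_ID values simply never share discovery ports.
PB, DG, PG = 7400, 250, 2   # port base, domain gain, participant gain
D0, D1 = 0, 10              # offsets for multicast/unicast discovery

def spdp_multicast_port(domain_id: int) -> int:
    """Well-known multicast port where participant discovery (SPDP) happens."""
    return PB + DG * domain_id + D0

def spdp_unicast_port(domain_id: int, participant_id: int) -> int:
    """Unicast discovery port of the n-th participant on a host."""
    return PB + DG * domain_id + D1 + PG * participant_id

print(spdp_multicast_port(0))   # 7400: domain 0 discovery
print(spdp_multicast_port(42))  # 17900: a "unique id" lands far away
```

A machine on a unique domain listens on ports nobody else transmits to, which is why `ROS_DOMAIN_ID` isolation works even when loopback-only filtering does not.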
i think the ros2 daemon is already running and has cached endpoints via discovery. `ros2 node list` asks the daemon for the node list via XML-RPC, and the daemon returns the cached list.
### Host-A

```shell
# just in case, clear that out.
root@f8a93cb8cfbd:~# unset ROS_LOCALHOST_ONLY
# start publisher
root@f8a93cb8cfbd:~# ros2 run demo_nodes_cpp talker
[INFO] [1619075202.150248497] [talker]: Publishing: 'Hello World: 1'
...
```

### Host-B

```shell
# set env variable only to have localhost network
root@24c44a10b658:~# export ROS_LOCALHOST_ONLY=1
# print node list
root@24c44a10b658:~# ros2 node list
/talker ---------------> problem confirmed.
# restart daemon to clear discovery cache.
root@24c44a10b658:~# ros2 daemon stop
The daemon has been stopped
root@24c44a10b658:~# ros2 daemon start
The daemon has been started
# list node again (no output: the phantom node is gone)
root@24c44a10b658:~# ros2 node list
```
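The daemon interaction demonstrated above can be sketched as a toy XML-RPC service. Note the method name, cache contents, and wiring below are made up for illustration and do not match the real ros2cli daemon API:

```python
# Toy model of the caching pattern: a long-lived "daemon" answers node-list
# queries over XML-RPC from a discovery cache that only a restart clears.
# NOTE: method name and wiring are illustrative, not the real ros2cli API.
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Stand-in for endpoints the daemon discovered before ROS_LOCALHOST_ONLY
# was exported; they linger here until "ros2 daemon stop" kills the process.
discovery_cache = ["/talker"]

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda: discovery_cache, "get_node_names")
threading.Thread(target=server.serve_forever, daemon=True).start()

# "ros2 node list" plays the client role: it asks the daemon rather than
# doing fresh discovery, so it reports the stale, cached /talker.
port = server.server_address[1]
client = ServerProxy(f"http://127.0.0.1:{port}/")
print(client.get_node_names())
```

Restarting the daemon, as in the transcript, throws the cache away, which is why the second `ros2 node list` comes back empty.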
@doisyg could you check my previous comment? and if you still have the problem, let us know 😃
Hi @fujitatomoya,

So when I noticed the issue, I had no control over host/hosts A (but was on the same network), so I don't know exactly what was running on them. The machine I controlled was host B, and yes, even after manually stopping the daemon (or rebooting), the problem persisted.

I cannot reproduce it with 2 machines that I can control and the talker example. I know it is a fuzzy report, but I am almost certain that there is an issue, as I noticed it a couple of times. And it disturbs our setup enough that we have all resorted to using `ROS_DOMAIN_ID`. What else can I run the next time I notice the issue (I have to be in a shared office with other ROS 2 devs)? Is there any way of knowing from which IP the "phantom nodes" are coming?
> even after manually stopping the daemon (or rebooting), the problem persisted.
okay...
> I know it is a fuzzy report, but I am almost certain that there is an issue, as I noticed it a couple of times.
i am not saying there is no problem, and this sometimes happens... 😢 but without a reproducible procedure, it would be really hard to debug.
> Is there any way of knowing from which IP the "phantom nodes" are coming?
i think getting the IP address requires debug information from the DDS (rmw implementation, cyclone or fastdds), which i am not sure how to do...
could be related to https://github.com/ros2/rmw_cyclonedds/issues/311
I doubt it — that would more likely than not cause it to not work at all.
> i think getting the IP address requires debug information from the DDS (rmw implementation, cyclone or fastdds), which i am not sure how to do...
For Cyclone, the quickest route is usually still to enable (discovery) tracing: set the `CYCLONEDDS_URI` environment variable to point at an XML file containing

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
  <Domain id="any">
    <Tracing>
      <Verbosity>fine</Verbosity>
      <OutputFile>cdds.log.${CYCLONEDDS_PID}</OutputFile>
    </Tracing>
  </Domain>
</CycloneDDS>
```
and you'll get a text file with tons of details. Look for lines matching the regex `SPDP.*NEW`; when in doubt, I'll be happy to help. (`CYCLONEDDS_URI` is really a comma-separated list of files, URIs (only `file://` for now) and configuration fragments, so if you already have a file you can edit it, or you can add another one. Or, if you are like me and lazy, you can paste an abbreviated form into `CYCLONEDDS_URI` directly: `<Tr><V>fine</><Out>cdds.log.${CYCLONEDDS_PID}</>` in the environment variable will do exactly the same.)
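As a sketch of that grep, here is one way to pull peer addresses out of matching trace lines. The sample line below is made up to illustrate the shape; real Cyclone trace lines differ in detail, but the idea is the same:

```python
# Scan a CycloneDDS trace for "SPDP ... NEW" entries (new participants
# discovered) and extract any IPv4 addresses they mention. The sample
# trace text is fabricated for illustration; real trace lines differ.
import re

sample_trace = """\
1619075202.150 SPDP ST0 4a2f1c00 NEW (locators 192.168.1.42:7410)
1619075202.151 some unrelated trace line
"""

spdp_new = re.compile(r"SPDP.*NEW")
ipv4 = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):\d+")

peers = set()
for line in sample_trace.splitlines():
    if spdp_new.search(line):
        peers.update(m.group(1) for m in ipv4.finditer(line))

print(sorted(peers))  # addresses the newly discovered participants announced
```

Any address here that is not a loopback address would point straight at the machine the phantom nodes are coming from.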
The API makes discovery information available via built-in topics, and I do intend to add IP addresses to that data. Especially now that it has become really accessible (e.g. https://github.com/eclipse-cyclonedds/cyclonedds-python/blob/master/src/cyclonedds/tools/ddsls.py), that is the way to do these things. For now, however, the traces are best (or Wireshark, I suppose).
P.S. `ROS_LOCALHOST_ONLY` causes it to use the loopback interface, that is:
- it advertises only loopback addresses
- it sets the multicast transmit interface to loopback
- it joins the multicast group only on the loopback interface
One would expect this to keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, things become a little tricky, because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.
I think retrying with the latest version fixes that (though I would wait for https://github.com/eclipse-cyclonedds/cyclonedds/pull/774 to be merged, which should be real soon), but I haven't specifically tried it.
Thanks @eboasson for the detailed answer. I will then wait for the latest fixes and report back here.
> One would expect this to keep it isolated from the rest of the network, but if it nonetheless receives a participant discovery packet (i.e., the message that bootstraps the DDS discovery mechanism) from another machine, things become a little tricky, because the version of "6 days ago" (as GitHub so helpfully pretty-prints the date) will not discard it.
That would explain what I am seeing
We have also experienced the same thing on ROS 2 Humble.
The best way for us to stop communication was to:

- set up `cyclonedds.xml` with:
  ```xml
  <Interfaces>
    <NetworkInterface name="lo" priority="default" multicast="default" />
  </Interfaces>
  ```
- remove the `export ROS_LOCALHOST_ONLY=1` line from `.bashrc`
- stop the existing ROS 2 daemon: `ros2 daemon stop`
- enable multicast for the loopback interface: `sudo ip link set lo multicast on`
Also, in order to check whether you are still part of any DDS activity on the network, you can use Wireshark by applying `rtps` as a display filter.
In large office networks this can be useful. In our office, once everyone had applied these steps, the rtps traffic dropped to zero for us.
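What that filter keys on can be sketched in a few lines: RTPS messages begin with the 4-byte protocol identifier "RTPS", which is what makes them easy to spot in a capture. The payload bytes below are made up for illustration:

```python
# Minimal check for what Wireshark's "rtps" filter detects: an RTPS
# message header starts with the 4-byte protocol id "RTPS", followed by
# the protocol version and vendor id.
def looks_like_rtps(payload: bytes) -> bool:
    """Return True if a UDP payload starts with the RTPS magic bytes."""
    return payload[:4] == b"RTPS"

# Fabricated payloads: one with the RTPS magic, one of ordinary traffic.
print(looks_like_rtps(b"RTPS\x02\x01\x01\x10" + b"\x00" * 16))  # True
print(looks_like_rtps(b"GET / HTTP/1.1\r\n"))                   # False
```

Seeing zero packets matching this on your non-loopback interfaces is a good sign that the isolation steps above actually took effect.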