ROS 2 topics and services are missing when using Discovery Server
Bug report
Required Info:
- Operating System:
- Ubuntu 18.04 with Ubuntu 20.04 container - over Nvidia Jetson
- Installation type:
- Source
- Version or commit hash:
- ROS2 Galactic
- DDS implementation:
- rmw_fastrtps_cpp
Steps to reproduce issue
When running with a discovery server and trying to get the topic list with `FASTRTPS_DEFAULT_PROFILES_FILE=./super_client_configuration_file.xml ros2 topic list --no-daemon`
Topics are not always visible.
This occurs quite often, and it also occurs when new nodes are started. I am not sure whether the issues are related, but they seem to be two instances of the same problem.
When using Wireshark, I can see a very high number of ICMP packets reporting Destination unreachable.
Any suggestions?
@eranroll can you share the configuration file that can reproduce the issue?
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<dds>
    <profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
        <participant profile_name="super_client_profile" is_default_profile="true">
            <rtps>
                <builtin>
                    <discovery_config>
                        <discoveryProtocol>SUPER_CLIENT</discoveryProtocol>
                        <discoveryServersList>
                            <RemoteServer prefix="44.53.00.5f.45.50.52.4f.53.49.4d.41">
                                <metatrafficUnicastLocatorList>
                                    <locator>
                                        <udpv4>
                                            <address>127.0.0.1</address>
                                            <port>11810</port>
                                        </udpv4>
                                    </locator>
                                </metatrafficUnicastLocatorList>
                            </RemoteServer>
                        </discoveryServersList>
                    </discovery_config>
                </builtin>
            </rtps>
        </participant>
    </profiles>
</dds>
```
I think some extra information may be useful:
- How do you run the container?
- Which command do you use to run the server?
- Are you trying to communicate between the container and the host, or is all communication expected to happen within the container itself?
- What do the network interfaces look like in both host and container?
- Could you attach some Wireshark capture with the ICMP packets?
Thanks in advance!
Hi Eduardo,
The container is run the following way:
```
/usr/bin/docker run \
    --rm \
    --name %n \
    --init \
    --net=host \
    --pid=host \
    --ipc=host \
    -v /dev/shm:/dev/shm \
    --env-file /etc/xtra/xtra_gcs.conf \
    -v /opt/missioncontroller_mark3:/ws \
    xtendreality/ros:galactic-multiarch-runtime-latest \
    ros2 launch xtra_device_engine xtra_device_engine.launch.py
```
I am running the container using a systemd service, it runs as root.
The discovery server runs using a systemd service as well, as root:
```
/usr/bin/docker run \
    --rm \
    --name %p-%i \
    --init \
    --net=host \
    --pid=host \
    --ipc=host \
    -v /dev/shm:/dev/shm \
    --env-file /etc/xtra/xtra_gcs.conf \
    -v /opt/missioncontroller_mark3:/ws \
    xtendreality/ros:galactic-multiarch-runtime-latest \
    fastdds discovery -i %i -p 1181%I
```
All nodes run inside containers, but not all inside the same container.
The network interfaces are the same inside the container and on the host (the containers run with --net=host).
See the Wireshark recording attached, captured when running `FASTRTPS_DEFAULT_PROFILES_FILE=./super_client_configuration_file.xml ros2 topic list --no-daemon`.
Wireshark recording: https://drive.google.com/file/d/14QZQNk53iwEAIuDO9O5d4XeLRR04rCT-/view?usp=sharing
Hi @eranroll,
We have been looking at the information you have provided and there are some inconsistencies. It seems that you are launching the server using the eProsima Fast DDS CLI: `fastdds discovery -i [ID] -p [port]`. However, these parameters depend on the service instance being run (`%i`). Bear in mind that clients, superclients, and other servers must have both the guidPrefix and the metatraffic unicast locator of each server in their remote server lists so that discovery can happen. These participants will ping the server's known locator until the server answers.
What we see in your traffic capture is that some participants are pinging port 11811, where no server is listening (that is the reason for the ICMP packets). It seems that only the server of instance 0, with guidPrefix `44.53.00.5f.45.50.52.4f.53.49.4d.41` and listening on port 11810, is running. The SUPER_CLIENT XML provided is consistent with this information, but if the instance changes, discovery will no longer happen because the port and guidPrefix will differ from the expected ones.
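As a quick sanity check against the instance id: the default GUID prefix assigned by `fastdds discovery -i <id>` is, to our understanding, the ASCII bytes of `DS_EPROSIMA` with the third byte set to the server id (which matches the `44.53.00...` prefix for id 0 in your XML). A small sketch to compute it (the helper name is ours, not part of any API):

```python
def default_server_guid_prefix(server_id: int) -> str:
    """Default Fast DDS discovery-server GUID prefix for a given server id.

    The prefix is the ASCII bytes of "DS_EPROSIMA" with the third byte
    replaced by the server id, rendered as dot-separated hex pairs.
    """
    raw = bytearray(b"DS\x00_EPROSIMA")
    raw[2] = server_id
    return ".".join(f"{b:02x}" for b in raw)

print(default_server_guid_prefix(0))  # 44.53.00.5f.45.50.52.4f.53.49.4d.41
print(default_server_guid_prefix(1))  # 44.53.01.5f.45.50.52.4f.53.49.4d.41
```

So if the systemd instance changes, both the port (`1181%I`) and the expected prefix change with it, and the superclient XML must change accordingly.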
Also, judging from the traffic capture, the system description is not complete: there are clients being launched that try to communicate with servers both at port 11810 and at port 11811 (the latter not being available).
Summing up, it seems that your system is not correctly configured, which could be the reason behind the topic discovery issue. We can try to help further if you send us an easily reproducible environment, but for the moment we do not have enough information.
Hi,
Thank you for all the help. You are right, we do have two discovery servers running: 11810 is the port for nodes that communicate only locally, whereas 11811 is used by remote nodes to communicate with the main subsystem.
Nodes that require both remote and local communication have both addresses set in ROS_DISCOVERY_SERVER.
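For reference, this is roughly how we express the two-server setup (addresses are placeholders; the position in the semicolon-separated list corresponds to the server id):

```shell
# Slot 0 -> server id 0 (local, port 11810),
# slot 1 -> server id 1 (remote, port 11811).
# Leaving a slot empty (e.g. ";127.0.0.1:11811") would skip that id.
export ROS_DISCOVERY_SERVER="127.0.0.1:11810;127.0.0.1:11811"
```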
Do you still think this setup might not work properly?
What other info will you need?
Hi @eranroll,
There is no problem using several servers, clients, and superclients, as long as you are clear about which one must connect to which and you configure them properly. From the information you have attached, it seems that the server listening at 11811 is either not launched or not properly configured. Are you using a different guidPrefix for it? Are you setting that guidPrefix in the remote server list of the clients/superclients that are supposed to connect to it?
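For illustration, if the 11811 server were started with `fastdds discovery -i 1` and left with its default prefix, the superclient's `discoveryServersList` would need a second entry along these lines (the prefix shown is an assumption; adjust it to whatever your second server actually uses):

```xml
<RemoteServer prefix="44.53.01.5f.45.50.52.4f.53.49.4d.41">
    <metatrafficUnicastLocatorList>
        <locator>
            <udpv4>
                <address>127.0.0.1</address>
                <port>11811</port>
            </udpv4>
        </locator>
    </metatrafficUnicastLocatorList>
</RemoteServer>
```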
In order to reproduce your issue on our side, we would need a complete description of your topology (which nodes are running, and whether they are clients, servers, or superclients) and your configuration files (XML or code where you set up the discovery server configuration).
Hi,
We do have another discovery server running on a different port, but I believe it wasn't running at the time of the recording. This isn't the problem; I just wanted fewer processes running for the test. The issue is reproducible with or without the discovery server on 11811 running.
I will capture new recordings and upload them as soon as I can.