ROS2 micrortps_bridge is not scaling to additional vehicles in Gazebo
I am trying to control multiple vehicles in Gazebo with a set of ROS2 nodes; each vehicle is controlled by six ROS2 nodes. I have followed the Multi-Vehicle Simulation with Gazebo tutorial and can launch multiple iris drones. I have also followed the RTPS/DDS Interface: PX4-Fast RTPS(DDS) Bridge tutorial and can control a few (i.e., 1-3) iris drones through the micrortps_bridge with my ROS2 nodes.
However, when I try to scale to additional drones, I begin to run into errors. Below I describe (a) my setup, (b) the errors I am experiencing, (c) what I have tried in order to fix those errors, and (d) the results of some experiments that lead me to believe there is an issue with the micrortps_bridge.
Setup
In Terminal 1, I start the PX4-Gazebo simulator:
./gazebo_sitl_multiple_run.sh -n 3 -t px4_sitl_rtps
In Terminal 2, I start the micrortps_agents:
micrortps_agent -t UDP -r 2020 -s 2019 -n vehicle1 -v &
micrortps_agent -t UDP -r 2022 -s 2021 -n vehicle2 -v &
micrortps_agent -t UDP -r 2024 -s 2023 -n vehicle3 -v &
In Terminals 3-5, I run a launch file. This launch file starts six ROS2 nodes that collectively control an individual drone:
AGENT_NAMESPACE=vehicle1 ros2 launch start_system.py
AGENT_NAMESPACE=vehicle2 ros2 launch start_system.py
AGENT_NAMESPACE=vehicle3 ros2 launch start_system.py
The ROS2 nodes communicate with the vehicles through the micrortps_bridge, and everything works fine.
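For reference, here is a trimmed-down sketch of what start_system.py does (the package and executable names are placeholders; the real file launches all six nodes):

# start_system.py (sketch): namespace every node from AGENT_NAMESPACE
import os
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    # Invoked as: AGENT_NAMESPACE=vehicle1 ros2 launch start_system.py
    ns = os.environ.get('AGENT_NAMESPACE', 'vehicle1')
    return LaunchDescription([
        # Two of the six nodes; package/executable names are illustrative
        Node(package='swarm_pkg', executable='swarm_comms', namespace=ns),
        Node(package='swarm_pkg', executable='sle', namespace=ns),
    ])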
Errors
When I repeat the above steps, but modified to handle 4 drones, Terminal 2 begins to print messages like this:
[ micrortps_agent ] VehicleOdometry publisher matched
[ micrortps__timesync ] RTTI too high for timesync: 103ms
[ micrortps__timesync ] Offset not updated
[ micrortps__timesync ] Timesync offset outlier, discarding
[ micrortps__timesync ] Offset not updated
[ micrortps__timesync ] Timesync offset outlier, discarding
[ micrortps__timesync ] Offset not updated
Terminals 3-5 still run correctly, but Terminal 6 (for drone 4) prints a warning from one of my ROS2 nodes (i.e., swarm_comms) that one of its services (/swarm_comms/swarm_status) is "not available".
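For context, the "not available" warning comes from the standard rclpy wait-for-service loop in my client node, roughly like this (Status is my custom srv type):

# Client-side loop that produces the "service not available" warning (sketch)
self.client = self.create_client(Status, 'swarm_comms/swarm_status')
while not self.client.wait_for_service(timeout_sec=1.0):
    self.get_logger().warning('/swarm_comms/swarm_status service not available, waiting...')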
Attempted fixes
In urtps_bridge_topics.yaml, there are on the order of 27 messages passed over the micrortps_bridge. If I reduce that to only 10 messages (i.e., the messages my ROS2 nodes use, plus the timesync and timesync_status messages) and recompile the PX4-Autopilot repo, as well as my repo with the px4_msgs and px4_ros_com submodules, I am able to partially remedy the problem described above: instead of only three drones working, I can now get four drones to work.
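For illustration, the trimmed urtps_bridge_topics.yaml looks something like this (a sketch following the PX4 entry format, not my exact file; "send" means PX4-to-agent, "receive" means agent-to-PX4):

# urtps_bridge_topics.yaml (trimmed sketch): keep only what the nodes need
rtps:
  - msg: timesync
    send: true
    receive: true
  - msg: timesync_status
    send: true
  - msg: vehicle_odometry
    send: true
  - msg: vehicle_status
    send: true
  - msg: offboard_control_mode
    receive: true
  - msg: trajectory_setpoint
    receive: true
  - msg: vehicle_command
    receive: true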
If I repeat the steps above, but modified to support five drones, I get the same errors described above. This leads me to believe that there is a relationship between micrortps_bridge throughput and the number of vehicles supported.
I have gone into the px4_ros_com code and modified the limits/logic associated with the "RTTI too high for timesync" and "Timesync offset outlier" errors. Instead of rtti > 50 ms, I changed the threshold to rtti > 500 ms. I also increased the WINDOW_SIZE from 500 to 5000. I no longer get the errors on Terminal 2, but I still get the "/swarm_comms/swarm_status service not available" error.
Experiments
To rule out my computer's resources being overwhelmed by the number of ROS2 nodes and ROS2 messages being passed around, I checked gnome-system-monitor. CPU utilization, memory, and network history all showed low values.
To rule out the swarm_comms ROS2 node being unprepared for /swarm_comms/swarm_status service calls, I added a delay in all other ROS2 nodes to give swarm_comms time to start up before service calls were made. No change in behavior.
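The delay was just a deferred first call, along these lines (sketch; the 5-second grace period is arbitrary):

# In the node's __init__ (sketch): defer the first service call
self._startup_timer = self.create_timer(5.0, self._make_first_call)

# Elsewhere in the node:
def _make_first_call(self):
    self._startup_timer.cancel()  # one-shot: fire once, then stop
    self._future = self.client.call_async(Status.Request())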
To rule out the /swarm_comms/swarm_status service being unavailable because its callback was blocked by other work, I created a new callback group specifically for that service, but this did not change the behavior:
from rclpy.callback_groups import MutuallyExclusiveCallbackGroup

service_callback_group = MutuallyExclusiveCallbackGroup()
self.create_service(Status, 'swarm_comms/swarm_status', self.get_swarm_status, callback_group=service_callback_group)
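(Note that a separate callback group only affects scheduling under a MultiThreadedExecutor; with the default single-threaded executor it is effectively a no-op.)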
To rule out ROS2 having some upper limit on the number of ROS2 nodes running simultaneously on the same machine (or something else networking-related), I ran the following experiment:
- On Terminal 1, I started up the sim with 7 vehicles
- On Terminal 2, I did nothing. I did NOT start the micrortps_agents
- On Terminals 3-9, I ran the launch file that starts up an instance of the six ROS2 nodes for each of the 7 drones.
Everything worked fine. There were no errors. The /swarm_comms/swarm_status service call worked properly, and the ROS2 nodes were able to exchange information intra-vehicle, as well as inter-vehicle, as expected.
To drill into the root cause of the issue, I repeated the steps in the previous paragraph, but started the micrortps_agents on Terminal 2. As expected, Terminal 2 printed the same errors as above: "RTTI too high for timesync" and "Timesync offset outlier". When I started Terminal 3 (for drone 1), it ran correctly. When I started Terminal 4 (for drone 2), it printed the "/swarm_comms/swarm_status service not available" error.
Earlier, when I ran these steps (but modified for 3 drones in sim + 3 micrortps_agents), everything worked properly. Now, however, with 7 drones in sim + 7 micrortps_agents, only Terminal 3 (for drone 1) works properly. Terminal 4 (for drone 2) produces the "/swarm_comms/swarm_status service not available" error. This leads me to believe that there is a relationship between micrortps_bridge throughput and the number of vehicles supported.
To further drill into this issue, I modified my launch file (i.e., start_system.py) to start only two ROS2 nodes, swarm_comms + sle (which calls the swarm_comms/swarm_status service), instead of the normal six. In Terminal 1, I started 7 drones in sim. In Terminal 2, I started 7 micrortps_agents. Terminals 3-6 (for drones 1-4) ran successfully. Terminals 7-9 (for drones 5-7) produced the "swarm_comms/swarm_status service not available" error.
I presume that since only two of my ROS2 nodes are running (instead of all six), they request/send only a subset of the 10 uORB messages specified in urtps_bridge_topics.yaml, which results in less data being sent over the micrortps_bridge. Since this experiment allowed more drones to work, it further leads me to believe that there is a relationship between micrortps_bridge throughput and things working properly.
The results from my experiments point to a problem with the micrortps_bridge supporting multiple vehicles. I am not sure whether the issue is in the micrortps_client, the micrortps_agent, or something else.
Anything that can be shared to help me understand how to control 7+ vehicles over the micrortps_bridge with ROS2 would be very much appreciated!
same issue
Hello, have you solved this problem? I also encountered a similar problem.