zenoh-plugin-ros2dds
[Bug] Topic subscription failure with multiple zenoh-bridge-ros2dds peers
Describe the bug
This may be related to this and this.
We have a setup in which a central server is connected to multiple robots, all running ROS2 iron in docker containers. The central server provides some topics to all robots. We also need some robot-to-robot ROS2 communication.
We initially used zenoh-bridge-ros2dds in peer mode at the server and all robots, but experienced non-obvious failures of data transmission on topics between server and robots.
A simplified setup that exhibits the problem is:
server: docker(talker)
server: docker(listener)
server: docker(zenoh-bridge-ros2dds -m peer -l tcp/0.0.0.0:7447)
robot1: docker(zenoh-bridge-ros2dds -m peer -l tcp/0.0.0.0:7447)
robot1: docker(listener)
robot2: docker(zenoh-bridge-ros2dds -m peer -l tcp/0.0.0.0:7447)
robot2: docker(listener)
The server containers are started, then the zenoh containers on the robots. The robot1 listener is started, correctly shows received data, and is then stopped. The robot2 listener is started, may show data, and is then stopped. When the robot2 listener is started again, it does not show any data.
If the robot zenoh containers are changed to clients, connecting to the server ip address, the failure does not occur.
If the listener on the server is not started, the failure seems to occur very rarely.
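For reference, the client-mode variant mentioned above can be sketched as follows. This is only an illustration of the arguments involved, not the exact command we ran: the helper name `client_args` is hypothetical, and it assumes the bridge's `-e` (connect) option is used to point at the server's listener on port 7447.

```shell
# Hypothetical helper: print bridge arguments for client mode.
# $1 is the server IP address (192.168.0.70 in this report).
# Assumption: -e tcp/<ip>:7447 connects to the server bridge started with -l tcp/0.0.0.0:7447.
client_args() {
    printf '%s\n' "-m client -e tcp/$1:7447 -c /config/minimal.json5"
}
```

On a robot, the printed arguments would replace the `-m peer -l tcp/0.0.0.0:7447 -c /config/minimal.json5` arguments passed to the bridge container, for example `client_args 192.168.0.70`.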
To reproduce
We have not managed to reproduce this with composed containers on a single host. Server is running Ubuntu 20.04, robot1 and robot2 are running Ubuntu 22.04. The robots are connected over WiFi. The container simonj23/dots_core:iron is a ROS2 iron distribution with CycloneDDS installed.
Run in all cases with config files in the current directory.
cyclonedds.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <General>
            <Interfaces>
                <NetworkInterface name='lo' multicast='true' />
            </Interfaces>
            <DontRoute>true</DontRoute>
        </General>
    </Domain>
</CycloneDDS>
minimal.json5:
{
  plugins: {
    ros2dds: {
      allow: {
        publishers: [ "/chatter", ],
        subscribers: [ "/chatter", ],
      }
    },
  },
}
compose.zenoh_peer.yaml:
services:
  zenoh-peer-ros2dds:
    image: eclipse/zenoh-bridge-ros2dds:0.10.1-rc.2
    environment:
      - ROS_DISTRO=iron
      - CYCLONEDDS_URI=file:///config/cyclonedds.xml
      - RUST_LOG=debug,zenoh::net=trace,zenoh_plugin_ros2dds=trace
    network_mode: "host"
    init: true
    volumes:
      - .:/config
    command:
      - -m peer
      - -l tcp/0.0.0.0:7447
      - -c /config/minimal.json5
On server: start talker
docker run -it --rm --network=host \
-e "RMW_IMPLEMENTATION=rmw_cyclonedds_cpp" \
-e "CYCLONEDDS_URI=file:///config/cyclonedds.xml" \
-v .:/config simonj23/dots_core:iron \
bash -c 'source /opt/ros/iron/setup.bash && ros2 run demo_nodes_cpp talker'
On server: start listener
docker run -it --rm --network=host \
-e "RMW_IMPLEMENTATION=rmw_cyclonedds_cpp" \
-e "CYCLONEDDS_URI=file:///config/cyclonedds.xml" \
-v .:/config simonj23/dots_core:iron \
bash -c 'source /opt/ros/iron/setup.bash && ros2 run demo_nodes_cpp listener'
On server: start zenoh
docker compose -f compose.zenoh_peer.yaml up
On robot1 start zenoh
docker compose -f compose.zenoh_peer.yaml up
On robot2 start zenoh
docker compose -f compose.zenoh_peer.yaml up
On robot1 start then stop listener:
docker run -it --rm --network=host -e "RMW_IMPLEMENTATION=rmw_cyclonedds_cpp" -e "CYCLONEDDS_URI=file:///config/cyclonedds.xml" -v .:/config simonj23/dots_core:iron bash -c 'source /opt/ros/iron/setup.bash && ros2 run demo_nodes_cpp listener'
[INFO] [1709553347.923254612] [listener]: I heard: [Hello World: 236]
[INFO] [1709553348.928124335] [listener]: I heard: [Hello World: 237]
[INFO] [1709553349.925678933] [listener]: I heard: [Hello World: 238]
^C[INFO] [1709553350.409051386] [rclcpp]: signal_handler(signum=2)
On robot2 start then stop listener:
docker run -it --rm --network=host -e "RMW_IMPLEMENTATION=rmw_cyclonedds_cpp" -e "CYCLONEDDS_URI=file:///config/cyclonedds.xml" -v .:/config simonj23/dots_core:iron bash -c 'source /opt/ros/iron/setup.bash && ros2 run demo_nodes_cpp listener'
[INFO] [1709553358.925248190] [listener]: I heard: [Hello World: 247]
[INFO] [1709553359.927279191] [listener]: I heard: [Hello World: 248]
^C[INFO] [1709553360.630004049] [rclcpp]: signal_handler(signum=2)
On robot2 start listener:
docker run -it --rm --network=host -e "RMW_IMPLEMENTATION=rmw_cyclonedds_cpp" -e "CYCLONEDDS_URI=file:///config/cyclonedds.xml" -v .:/config simonj23/dots_core:iron bash -c 'source /opt/ros/iron/setup.bash && ros2 run demo_nodes_cpp listener'
At this point, robot2 no longer gets any data on the chatter topic. The situation can be recovered by restarting the zenoh container on the server.
Log files attached. Server IP address is 192.168.0.70, robot1: 192.168.0.101, robot2: 192.168.0.105.
It appears from the server logfile that something may be going wrong with topic unsubscribe. When the robot1 listener is stopped, at 2024-03-04T11:26:40Z, there are two UndeclareSubscriber messages, but when the robot2 listener is stopped, at 2024-03-04T11:26:58Z, there is only one, and the next subscribe does not succeed.
server_log.txt robot1_log.txt robot2_log.txt
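The UndeclareSubscriber comparison described above can be repeated on the attached logs with something like the following. It is a rough sketch: the function name `count_undeclare` is made up for illustration, and it assumes every relevant log line contains the text `UndeclareSubscriber` and a second-resolution UTC timestamp of the form `2024-03-04T11:26:40Z`.

```shell
# Count UndeclareSubscriber log lines per timestamp in a bridge log file.
# $1 is the path to the log (for example server_log.txt).
# Assumption: each matching line carries a timestamp like 2024-03-04T11:26:40Z;
# only the first timestamp on a line is used.
count_undeclare() {
    awk '/UndeclareSubscriber/ && match($0, /[0-9]+-[0-9]+-[0-9]+T[0-9]+:[0-9]+:[0-9]+Z/) {
             n[substr($0, RSTART, RLENGTH)]++
         }
         END { for (t in n) print t, n[t] }' "$1" | sort
}
# Usage: count_undeclare server_log.txt
```

Each output line is a timestamp followed by the number of UndeclareSubscriber messages logged in that second, so the two listener stops can be compared side by side.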
System info
Server: Ubuntu 20.04 arm64
Robots: Ubuntu 22.04 arm64
zenoh-bridge-ros2dds: 0.10.1-rc.2