zenoh-plugin-ros2dds

[Bug] zenoh_bridge_ros2dds matching listener created only sometimes

Open felix-kocht opened this issue 11 months ago • 5 comments

Describe the bug

ROS 2 messages sent over a topic from a client to a server via Zenoh sometimes do not arrive. To establish a connection over the Zenoh bridge, a ROS 2 subscriber should be created automatically on the client to read the messages from the desired topic. In some cases this subscriber is not created, so no messages are transferred to the server.

Our workaround, which removes some code, works for us, but removing code is probably not ideal: https://github.com/pixel-robotics/zenoh-plugin-ros2dds/tree/always_create_subscriber

To reproduce

For a ros2 topic we have a

  • Subscriber on server
  • Publisher on robot

We use zenoh-dds-bridge as the router on the server and as a client of that router on our robot. When we then restart the Zenoh bridge on the robot, a new Zenoh subscriber on the robot (matching the Zenoh publisher on the server) is sometimes created and sometimes not. As a result, the messages published on the robot only sometimes arrive on the server.

System info

  • Client: Jetson Xavier (Ubuntu 22.04)
  • Server: Jetson Xavier (Ubuntu 22.04)
  • Connection: Wi-Fi
  • ROS 2 version: Iron
  • zenoh_bridge_ros2dds version: 1.1.0

felix-kocht avatar Dec 20 '24 15:12 felix-kocht

To reproduce

On the server, launch the bridge in default mode (with router), then echo the topic:

  1. zenoh-bridge-ros2dds -l tcp/0.0.0.0:7447
  2. ros2 topic echo /testtopic

On the client, launch the bridge in client mode, then publish to the topic:

  1. zenoh-bridge-ros2dds client -e tcp/192.168.1.157:7447
  2. ros2 topic pub /testtopic std_msgs/Int32 "data: 42"

Then restart the bridge on the client repeatedly until, at some point, the subscriber on the server stops echoing the messages. This is unexpected: after a robot restart, for example, the server should still receive messages without re-subscribing. Restarting the subscriber or the client bridge resolves the issue. The issue reproduces with the default config.

Subscribing to the messages using the Python API on the server has the same issue: the messages also stop arriving.
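
The manual restart step could be automated roughly as follows. This is a minimal sketch and not part of the original report. The bridge invocation and endpoint are taken from the steps above, while the function name `restart_bridge_cycles`, the `BRIDGE_CMD` and `ROUTER_ENDPOINT` variables, and the use of plain `kill` to stop the bridge are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: start and stop the client-side bridge several times in a row.
# The function name, the BRIDGE_CMD override and the timing arguments are
# hypothetical; only the bridge command line comes from the steps above.

ROUTER_ENDPOINT="${ROUTER_ENDPOINT:-tcp/192.168.1.157:7447}"
BRIDGE_CMD="${BRIDGE_CMD:-zenoh-bridge-ros2dds}"

restart_bridge_cycles() {
    cycles="$1"   # number of start/stop cycles
    uptime="$2"   # seconds to keep the bridge running in each cycle
    i=1
    while [ "$i" -le "$cycles" ]; do
        echo "cycle $i: starting client bridge"
        "$BRIDGE_CMD" client -e "$ROUTER_ENDPOINT" &
        pid=$!
        sleep "$uptime"
        # Stop the bridge; ignore errors if it has already exited.
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
        i=$((i + 1))
    done
    return 0
}
```

For example, with the publisher running on the client and `ros2 topic echo /testtopic` running on the server, calling `restart_bridge_cycles 20 10` would restart the bridge 20 times with 10 seconds of uptime each. The cycle number printed before the echo output stops indicates which restart failed to re-establish the route.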

jplapp avatar Jan 20 '25 22:01 jplapp

@JEnoch as this seems to affect quite a few people, could you by chance have a look into it? Or is there a way we can help? We were able to make a workaround, but unfortunately I don't think it can be merged as it is.

jplapp avatar Sep 05 '25 22:09 jplapp

Hi @jplapp , I tried to replicate the issue you describe with version 1.5.1, but it didn’t appear. I ran the following commands on a single host (I don't have two available currently), with distinct ROS_DOMAIN_ID values for isolation:

  1. ROS_DOMAIN_ID=1 zenoh-bridge-ros2dds -l tcp/0.0.0.0:7447
  2. ROS_DOMAIN_ID=2 zenoh-bridge-ros2dds client -e tcp/localhost:7447 -l tcp/0.0.0.0:7448
  3. ROS_DOMAIN_ID=2 ros2 topic pub /testtopic std_msgs/Int32 "data: 42"
  4. ROS_DOMAIN_ID=1 ros2 topic echo /testtopic

Then I stopped and restarted the "client" bridge on domain 2. The subscriber does resume echoing the messages, even after 15 minutes.

Do you still experience the issue with 1.5.1? Are your 2 hosts correctly configured with ROS_LOCALHOST_ONLY=1 (or ROS_AUTOMATIC_DISCOVERY_RANGE=LOCALHOST since Jazzy)?

JEnoch avatar Sep 09 '25 08:09 JEnoch

Hi @JEnoch , we hit the same problem with Zenoh version 1.5.0. ROS_LOCALHOST_ONLY=1 is set in our system. Besides restarting the peer or client, establishing another Zenoh session by subscribing to another routed ROS topic also makes the previously unmatched Zenoh sessions match each other. After reviewing the release notes for 1.5.1, I don't think the issue was solved.

Micbetter avatar Sep 25 '25 02:09 Micbetter

Hi @JEnoch , sorry for the late reply here. We have tested it using the 1.6.2 release and for us the issue does not reproduce anymore.

jplapp avatar Nov 14 '25 14:11 jplapp