[Bug] zenoh_bridge_ros2dds matching listener created only sometimes
Describe the bug
When ros2 messages are sent over a topic from a client to a server via zenoh, they sometimes do not arrive. To establish a connection over the zenoh bridge, a ros2 subscriber should be created automatically on the client to read the messages from the desired topic. In some cases this does not happen, so no messages are transferred to the server.
Our workaround (removing some code) makes it work for us, but it is probably not ideal: https://github.com/pixel-robotics/zenoh-plugin-ros2dds/tree/always_create_subscriber
To reproduce
For a ros2 topic we have a
- Subscriber on server
- Publisher on robot
We use zenoh-bridge-ros2dds as a router on the server, and as a client of that router on our robot. We then restart the zenoh bridge on the robot. The new zenoh subscriber on the robot (matching the zenoh server publisher) is sometimes created and sometimes not. As a result, the messages published on the robot only sometimes arrive on the server.
System info
- Client: Jetson Xavier (Ubuntu 22.04)
- Server: Jetson Xavier (Ubuntu 22.04)
- connection: Wifi
- Ros2 version: iron
- zenoh_bridge_ros2dds version: 1.1.0
Steps to reproduce
On the server, launch the bridge in default mode (router): zenoh-bridge-ros2dds -l tcp/0.0.0.0:7447
Then: ros2 topic echo /testtopic
On the client, launch the bridge in client mode: zenoh-bridge-ros2dds client -e tcp/192.168.1.157:7447
Then: ros2 topic pub /testtopic std_msgs/Int32 "data: 42"
Then restart the bridge on the client repeatedly until, at some point, the subscriber on the server stops echoing the messages. This is unexpected: after a robot restart, for example, the server should still receive messages without re-subscribing. Restarting the subscriber or the client bridge resolves the issue. This reproduces with the default config.
Subscribing to the messages using the python API on the server has the same issue, they also stop arriving.
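Since the failure only shows up after some number of restarts, the restart step can be automated. Below is a minimal sketch of such a loop, not a tested reproduction tool. The helper name `restart_cycles`, the timing values, and the use of SIGTERM to stop the bridge are assumptions made for illustration; only the bridge command itself is taken from the steps above.

```python
import subprocess
import time

# Client-side bridge command from the reproduction steps.
BRIDGE_CMD = ["zenoh-bridge-ros2dds", "client", "-e", "tcp/192.168.1.157:7447"]


def restart_cycles(cmd, cycles, uptime_s, downtime_s=1.0):
    """Start `cmd`, let it run for `uptime_s` seconds, stop it, and repeat.

    Hypothetical helper: returns the exit code observed after each stop.
    """
    exit_codes = []
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        # Window in which `ros2 topic echo` on the server should show messages.
        time.sleep(uptime_s)
        # Assumption: SIGTERM is an acceptable way to stop the bridge.
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # Fall back to a hard kill if the process does not exit in time.
            proc.kill()
            proc.wait()
        exit_codes.append(proc.returncode)
        time.sleep(downtime_s)
    return exit_codes
```

On the robot this could be called as `restart_cycles(BRIDGE_CMD, cycles=20, uptime_s=15.0)` while watching `ros2 topic echo /testtopic` on the server. A cycle in which nothing is echoed during the uptime window would correspond to the failure described here.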
@JEnoch as this seems to affect quite a few people, could you by chance have a look into it? Or is there a way we can help? We were able to make a workaround, but unfortunately I don't think it can be merged as it is.
Hi @jplapp ,
I tried to replicate the issue you describe with version 1.5.1, but it didn’t appear.
I tested the following commands on 1 host (don't have 2 currently), with distinct ROS_DOMAIN_ID for isolation:
ROS_DOMAIN_ID=1 zenoh-bridge-ros2dds -l tcp/0.0.0.0:7447
ROS_DOMAIN_ID=2 zenoh-bridge-ros2dds client -e tcp/localhost:7447 -l tcp/0.0.0.0:7448
ROS_DOMAIN_ID=2 ros2 topic pub /testtopic std_msgs/Int32 "data: 42"
ROS_DOMAIN_ID=1 ros2 topic echo /testtopic
Then I stopped and restarted the "client" bridge on domain 2. The subscriber correctly resumes echoing the messages, even after 15 minutes.
Do you still experience the issue with 1.5.1?
Are your 2 hosts correctly configured with ROS_LOCALHOST_ONLY=1 (or ROS_AUTOMATIC_DISCOVERY_RANGE=LOCALHOST since Jazzy)?
Hi @JEnoch, we hit the same problem with zenoh version 1.5.0. ROS_LOCALHOST_ONLY=1 is set on our system. Besides restarting the peer or client, establishing another zenoh session by subscribing to another routed ros topic also makes the previously unmatched zenoh sessions match each other. After reviewing the release notes for 1.5.1, I don't think the issue was solved.
Hi @JEnoch , sorry for the late reply here. We have tested it using the 1.6.2 release and for us the issue does not reproduce anymore.