Dynamic QoS Matching for Router Endpoints Based on Subscriber Requirements
Problem Description:
Currently, the Fast DDS Router, when forwarding topics between different ROS_DOMAIN_IDs (or more generally, DDS domains), creates its internal DDS DataReaders and DataWriters with a default QoS profile that often uses BEST_EFFORT reliability. This occurs regardless of the QoS settings of the original publisher or the ultimate subscriber in the ROS 2 ecosystem.
This default behavior leads to issues when a ROS 2 publisher is publishing with RELIABLE reliability and a ROS 2 subscriber on the other side of the router is also expecting RELIABLE reliability. The router's internal BEST_EFFORT endpoints break the chain of reliability, preventing the ROS 2 subscriber from connecting to the router's BEST_EFFORT publisher.
Example Scenario:
ROS_DOMAIN_ID=1: A ROS 2 node publishes topic/my_topicwithRELIABLEreliability.- Fast DDS Router: Configured to bridge
ROS_DOMAIN_ID=1andROS_DOMAIN_ID=2.- The router creates an internal DDS
DataReaderon domain1for/my_topic.ros2 topic info -vreveals this reader usesBEST_EFFORT. - The router creates an internal DDS
DataWriteron domain2for/my_topic.ros2 topic info -vreveals this writer usesBEST_EFFORT.
- The router creates an internal DDS
ROS_DOMAIN_ID=2: A ROS 2 node subscribes to/my_topicwithRELIABLEreliability.
Outcome: The subscriber on Domain 2 cannot connect to the router's DataWriter on domain 2 because of the QoS mismatch (RELIABLE subscriber vs. BEST_EFFORT publisher).
Current Workaround:
The current workaround is to manually specify the QoS settings (e.g., RELIABLE reliability) for each affected topic in the Fast DDS Router's YAML configuration file.
While this works, it has limitations:
- It requires manual configuration for every topic that needs a specific QoS other than the router's default.
- It's not dynamic. If subscribers with different QoS profiles for the same topic exist, the static configuration might not satisfy all of them optimally (though a
RELIABLErouter publisher can serve bothRELIABLEandBEST_EFFORTsubscribers).
Proposed Feature:
A new mechanism or an enhancement to the existing behavior where the Fast DDS Router can dynamically adjust the QoS of its internal DataReader and DataWriter endpoints to better match the requirements of the actual discovered subscribers.
Desired Behavior:
- When the router discovers a publisher for a topic it intends to forward (e.g.,
/my_topicfrom domain1, published withRELIABLEreliability). - When the router discovers a subscriber for the same topic on a different domain (e.g.,
/my_topicon domain2, subscribing withRELIABLEreliability). - The router should:
- Create its internal
DataReader(on domain1) with a QoS profile that is compatible with both the original publisher's QoS and the requirements of the downstream subscriber(s). Ideally, if the subscriber needsRELIABLE, the router's reader should also attempt to beRELIABLE(assuming the original publisher supports it). - Create its internal
DataWriter(on domain2) with a QoS profile that matches or is compatible with the discovered subscriber's QoS. If the subscriber requestsRELIABLE, the router's writer for that path should also beRELIABLE.
- Create its internal
Is this something that is planned?
Hi @Tuntenfisch, I couldn't reproduce the behavior you are observing. Could you please specify your setup? ROS distro version, DDS-Router version, concrete YAML router configuration, etc. The expected behavior (if "discovery-trigger" is not configured) is to use both for writers and readers (in all participants) the QoS of the first external reader discovered.
Hello,
thanks for your reply. I'm using ROS2 Humble with DDS Router version v3.2.0. I just did a small experiment again to see how the router behaves and if it actually forwards topics with the correct QoS (which is the reader's QoS in my case because I use the router config below).
TL;DR: It does forward topics with the QoS of the first external reader discovered, so seems like I was either talking out of my ass or drunk when I created the issue originally. But the proposed feature would still bring the DDS Router behavior more in line with standard ROS single-domain behavior if implemented.
Experiment Setup:
config.yaml
version: v5.0
participants:
- name: a
kind: simple
domain: 0
- name: b
kind: simple
domain: 5
specs:
discovery-trigger: reader
remove-unused-entities: true
script.py
import sys
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile, ReliabilityPolicy
from std_msgs.msg import String
class PublisherNode(Node):
def __init__(self, qos_profile: QoSProfile) -> None:
super().__init__("publisher_node")
self.publisher = self.create_publisher(String, "/global/topic", qos_profile)
self.timer = self.create_timer(1.0, self.publish)
def publish(self) -> None:
msg = String()
msg.data = "Hello, ROS 2!"
self.publisher.publish(msg)
self.get_logger().info(f"Published: {msg.data}")
class SubscriberNode(Node):
def __init__(self, qos_profile: QoSProfile) -> None:
super().__init__("subscriber_node")
self.subscriber = self.create_subscription(
String, "/global/topic", self.receive, qos_profile
)
def receive(self, msg: String) -> None:
self.get_logger().info(f"Received: {msg.data}")
if __name__ == "__main__":
if (
len(sys.argv) != 3
or sys.argv[1] not in ["publisher", "subscriber"]
or sys.argv[2] not in ["reliable", "best_effort"]
):
print("Usage: python temp.py [publisher|subscriber] [reliable|best_effort]")
sys.exit(1)
rclpy.init(args=sys.argv)
if sys.argv[2] == "reliable":
qos_profile = QoSProfile(reliability=ReliabilityPolicy.RELIABLE, depth=10)
elif sys.argv[2] == "best_effort":
qos_profile = QoSProfile(reliability=ReliabilityPolicy.BEST_EFFORT, depth=10)
if sys.argv[1] == "publisher":
node = PublisherNode(qos_profile)
elif sys.argv[1] == "subscriber":
node = SubscriberNode(qos_profile)
try:
rclpy.spin(node)
except KeyboardInterrupt:
pass
rclpy.shutdown()
Scenarios Tested
First Scenario
- Start router with
ddsrouter -c config.yaml. - Start publisher with
ROS_DOMAIN_ID=0 python script.py publisher reliable. - Start subscriber with
ROS_DOMAIN_ID=5 python script.py subscriber reliable. - Start subscriber with
ROS_DOMAIN_ID=5 python script.py subscriber best_effort.
The result is that the DDS Router creates the internal reader/writer with the reliable QoS because that was the first reader, i.e. subscriber, created and both subscribers can be served by the forwarded topic. Just like you said.
Second Scenario
- Start router with
ddsrouter -c config.yaml. - Start publisher with
ROS_DOMAIN_ID=0 python script.py publisher reliable. - Start subscriber with
ROS_DOMAIN_ID=5 python script.py subscriber best_effort. - Start subscriber with
ROS_DOMAIN_ID=5 python script.py subscriber reliable.
The result is that the DDS Router creates the internal reader/writer with the best effort QoS because that was the first reader, i.e. subscriber, created. But obviously now the reliable subscriber cannot actually subscribe to the forwarded topic because of the QoS mismatch. But according to your description, this is expected behavior.
Conclusion
So, like already mentioned, I must have had a brain-fart writing my original issue because when going through it now, I cannot reproduce what I claimed to be the "current" behavior. My apologies!
The experiment, just described, clearly shows the router is working as intended. But the feature I proposed in my original issue could still proof useful for resolving the QoS mismatch in the second scenario I went through above:
-
If the DDS Router was able to dynamically recreate a reader/writer with a QoS that is compatible with all downstream subscribers (given that the original publisher's QoS is compatible) when new subscribers emerge, and not solely rely on the first discovered reader's QoS for its internal reader/writer, the second scenario would work.
This would then also mirror the functionality of how ROS works when not using separate domains at all, something I would think is desirable.
Is there any update on this?
@Tuntenfisch Not at the moment, although it is possible to manually set the QoS for a specific topic and this way solve a handful of these corner cases if ever arise in your scenario. Nevertheless I agree it would be interesting to add a way to allow creating the endpoints with the most compatible QoS , but we do not plan to add this configuration option in the near future.