DDS-Router icon indicating copy to clipboard operation
DDS-Router copied to clipboard

Dynamic QoS Matching for Router Endpoints Based on Subscriber Requirements

Open Tuntenfisch opened this issue 8 months ago • 5 comments

Problem Description:

Currently, the Fast DDS Router, when forwarding topics between different ROS_DOMAIN_IDs (or more generally, DDS domains), creates its internal DDS DataReaders and DataWriters with a default QoS profile that often uses BEST_EFFORT reliability. This occurs regardless of the QoS settings of the original publisher or the ultimate subscriber in the ROS 2 ecosystem.

This default behavior leads to issues when a ROS 2 publisher is publishing with RELIABLE reliability and a ROS 2 subscriber on the other side of the router is also expecting RELIABLE reliability. The router's internal BEST_EFFORT endpoints break the chain of reliability, preventing the ROS 2 subscriber from connecting to the router's BEST_EFFORT publisher.

Example Scenario:

  1. ROS_DOMAIN_ID=1: A ROS 2 node publishes topic /my_topic with RELIABLE reliability.
  2. Fast DDS Router: Configured to bridge ROS_DOMAIN_ID=1 and ROS_DOMAIN_ID=2.
    • The router creates an internal DDS DataReader on domain 1 for /my_topic. ros2 topic info -v reveals this reader uses BEST_EFFORT.
    • The router creates an internal DDS DataWriter on domain 2 for /my_topic. ros2 topic info -v reveals this writer uses BEST_EFFORT.
  3. ROS_DOMAIN_ID=2: A ROS 2 node subscribes to /my_topic with RELIABLE reliability.

Outcome: The subscriber on Domain 2 cannot connect to the router's DataWriter on domain 2 because of the QoS mismatch (RELIABLE subscriber vs. BEST_EFFORT publisher).

Current Workaround:

The current workaround is to manually specify the QoS settings (e.g., RELIABLE reliability) for each affected topic in the Fast DDS Router's YAML configuration file.

While this works, it has limitations:

  • It requires manual configuration for every topic that needs a specific QoS other than the router's default.
  • It's not dynamic. If subscribers with different QoS profiles for the same topic exist, the static configuration might not satisfy all of them optimally (though a RELIABLE router publisher can serve both RELIABLE and BEST_EFFORT subscribers).

Proposed Feature:

A new mechanism or an enhancement to the existing behavior where the Fast DDS Router can dynamically adjust the QoS of its internal DataReader and DataWriter endpoints to better match the requirements of the actual discovered subscribers.

Desired Behavior:

  1. When the router discovers a publisher for a topic it intends to forward (e.g., /my_topic from domain 1, published with RELIABLE reliability).
  2. When the router discovers a subscriber for the same topic on a different domain (e.g., /my_topic on domain 2, subscribing with RELIABLE reliability).
  3. The router should:
    • Create its internal DataReader (on domain 1) with a QoS profile that is compatible with both the original publisher's QoS and the requirements of the downstream subscriber(s). Ideally, if the subscriber needs RELIABLE, the router's reader should also attempt to be RELIABLE (assuming the original publisher supports it).
    • Create its internal DataWriter (on domain 2) with a QoS profile that matches or is compatible with the discovered subscriber's QoS. If the subscriber requests RELIABLE, the router's writer for that path should also be RELIABLE.

Tuntenfisch avatar May 06 '25 08:05 Tuntenfisch

Is this something that is planned?

Tuntenfisch avatar Jul 11 '25 13:07 Tuntenfisch

Hi @Tuntenfisch, I couldn't reproduce the behavior you are observing. Could you please specify your setup? ROS distro version, DDS-Router version, concrete YAML router configuration, etc. The expected behavior (if "discovery-trigger" is not configured) is to use both for writers and readers (in all participants) the QoS of the first external reader discovered.

juanlofer-eprosima avatar Jul 15 '25 12:07 juanlofer-eprosima

Hello,

thanks for your reply. I'm using ROS2 Humble with DDS Router version v3.2.0. I just did a small experiment again to see how the router behaves and if it actually forwards topics with the correct QoS (which is the reader's QoS in my case because I use the router config below).

TL;DR: It does forward topics with the QoS of the first external reader discovered, so seems like I was either talking out of my ass or drunk when I created the issue originally. But the proposed feature would still bring the DDS Router behavior more in line with standard ROS single-domain behavior if implemented.

Experiment Setup:

config.yaml
version: v5.0

participants:
  - name: a
    kind: simple
    domain: 0

  - name: b
    kind: simple
    domain: 5

specs:
  discovery-trigger: reader
  remove-unused-entities: true
script.py
import sys

import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile, ReliabilityPolicy
from std_msgs.msg import String


class PublisherNode(Node):
    def __init__(self, qos_profile: QoSProfile) -> None:
        super().__init__("publisher_node")
        self.publisher = self.create_publisher(String, "/global/topic", qos_profile)
        self.timer = self.create_timer(1.0, self.publish)

    def publish(self) -> None:
        msg = String()
        msg.data = "Hello, ROS 2!"
        self.publisher.publish(msg)
        self.get_logger().info(f"Published: {msg.data}")


class SubscriberNode(Node):
    def __init__(self, qos_profile: QoSProfile) -> None:
        super().__init__("subscriber_node")
        self.subscriber = self.create_subscription(
            String, "/global/topic", self.receive, qos_profile
        )

    def receive(self, msg: String) -> None:
        self.get_logger().info(f"Received: {msg.data}")


if __name__ == "__main__":
    if (
        len(sys.argv) != 3
        or sys.argv[1] not in ["publisher", "subscriber"]
        or sys.argv[2] not in ["reliable", "best_effort"]
    ):
        print("Usage: python temp.py [publisher|subscriber] [reliable|best_effort]")
        sys.exit(1)

    rclpy.init(args=sys.argv)

    if sys.argv[2] == "reliable":
        qos_profile = QoSProfile(reliability=ReliabilityPolicy.RELIABLE, depth=10)
    elif sys.argv[2] == "best_effort":
        qos_profile = QoSProfile(reliability=ReliabilityPolicy.BEST_EFFORT, depth=10)

    if sys.argv[1] == "publisher":
        node = PublisherNode(qos_profile)
    elif sys.argv[1] == "subscriber":
        node = SubscriberNode(qos_profile)

    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass

    rclpy.shutdown()

Scenarios Tested

First Scenario

  1. Start router with ddsrouter -c config.yaml.
  2. Start publisher with ROS_DOMAIN_ID=0 python script.py publisher reliable.
  3. Start subscriber with ROS_DOMAIN_ID=5 python script.py subscriber reliable.
  4. Start subscriber with ROS_DOMAIN_ID=5 python script.py subscriber best_effort.

The result is that the DDS Router creates the internal reader/writer with the reliable QoS because that was the first reader, i.e. subscriber, created and both subscribers can be served by the forwarded topic. Just like you said.

Second Scenario

  1. Start router with ddsrouter -c config.yaml.
  2. Start publisher with ROS_DOMAIN_ID=0 python script.py publisher reliable.
  3. Start subscriber with ROS_DOMAIN_ID=5 python script.py subscriber best_effort.
  4. Start subscriber with ROS_DOMAIN_ID=5 python script.py subscriber reliable.

The result is that the DDS Router creates the internal reader/writer with the best effort QoS because that was the first reader, i.e. subscriber, created. But obviously now the reliable subscriber cannot actually subscribe to the forwarded topic because of the QoS mismatch. But according to your description, this is expected behavior.

Conclusion

So, like already mentioned, I must have had a brain-fart writing my original issue because when going through it now, I cannot reproduce what I claimed to be the "current" behavior. My apologies!

The experiment, just described, clearly shows the router is working as intended. But the feature I proposed in my original issue could still proof useful for resolving the QoS mismatch in the second scenario I went through above:

  • If the DDS Router was able to dynamically recreate a reader/writer with a QoS that is compatible with all downstream subscribers (given that the original publisher's QoS is compatible) when new subscribers emerge, and not solely rely on the first discovered reader's QoS for its internal reader/writer, the second scenario would work.

    This would then also mirror the functionality of how ROS works when not using separate domains at all, something I would think is desirable.

Tuntenfisch avatar Jul 15 '25 15:07 Tuntenfisch

Is there any update on this?

Tuntenfisch avatar Oct 21 '25 13:10 Tuntenfisch

@Tuntenfisch Not at the moment, although it is possible to manually set the QoS for a specific topic and this way solve a handful of these corner cases if ever arise in your scenario. Nevertheless I agree it would be interesting to add a way to allow creating the endpoints with the most compatible QoS , but we do not plan to add this configuration option in the near future.

juanlofer-eprosima avatar Oct 21 '25 14:10 juanlofer-eprosima