DDS-Router
DDS Router won't communicate between networks
I have a test setup of four containers; two are running the image ddsrouter-base, built from router-base.dockerfile:
FROM ddsrouter
RUN apt update && apt install -y tcpdump
and the other two are running the image node-base, built from node-base.dockerfile:
FROM ros:rolling
RUN apt update && apt install -y tcpdump ros-rolling-demo-nodes-cpp
I then orchestrate them using the following Docker compose file:
version: "3.9"
networks:
  sideA:
    ipam:
      driver: default
      config:
        - subnet: "172.238.1.0/24"
  sideB:
    ipam:
      driver: default
      config:
        - subnet: "172.238.2.0/24"
services:
  node:
    image: node-base
    build:
      context: .
      dockerfile: node-base.dockerfile
    stdin_open: true
    tty: true
    profiles: ["run"]
  node-dev:
    extends: node
    volumes:
      # Mount the source code
      - ./dumps:/dumps
    command:
      - /bin/bash
      - -c
      - |
        tcpdump -w /dumps/failure/client1.pcap &
        source /ros_entrypoint.sh && ros2 run demo_nodes_cpp listener
    environment:
      - ROS_DISCOVERY_SERVER=internal-router:11811
    profiles: ["good", "bad"]
    networks:
      sideA:
        ipv4_address: 172.238.1.3
  internal-router:
    image: ddsrouter-base
    build:
      context: .
      dockerfile: router-base.dockerfile
    command:
      - /bin/bash
      - -c
      - |
        tcpdump -w /dumps/failure/router1.pcap &
        source ./install/setup.bash
        ddsrouter --config-path /config/config.yaml -d
    volumes:
      - ./router/:/config
      - ./dumps:/dumps
    ports:
      - 11188:11188/udp
    profiles: ["good", "bad"]
    networks:
      sideA:
        ipv4_address: 172.238.1.2
  node-dev2:
    extends: node
    volumes:
      - ./dumps:/dumps
    command:
      - /bin/bash
      - -c
      - |
        tcpdump -w /dumps/failure/client2.pcap &
        sleep 5
        source /ros_entrypoint.sh && ros2 run demo_nodes_cpp talker
    environment:
      - ROS_DISCOVERY_SERVER=router2:11811
    profiles: ["good"]
    networks:
      sideA:
        ipv4_address: 172.238.1.13
  router2:
    image: ddsrouter-base
    command:
      - /bin/bash
      - -c
      - |
        tcpdump -w /dumps/failure/router2.pcap &
        source ./install/setup.bash
        ddsrouter --config-path /config/config2.yaml -d
    volumes:
      - ./router/:/config
      - ./dumps:/dumps
    ports:
      - 30002:30002/tcp
      - 11166:11166/tcp
    profiles: ["good"]
    networks:
      sideA:
        ipv4_address: 172.238.1.12
  node-dev2-bad:
    extends: node
    volumes:
      - ./dumps:/dumps
    command:
      - /bin/bash
      - -c
      - |
        tcpdump -w /dumps/failure/client2.pcap &
        sleep 5
        source /ros_entrypoint.sh && ros2 run demo_nodes_cpp talker
    environment:
      - ROS_DISCOVERY_SERVER=router2:11811
    profiles: ["bad"]
    networks:
      sideB:
        ipv4_address: 172.238.2.3
  router2-bad:
    image: ddsrouter-base
    command:
      - /bin/bash
      - -c
      - |
        tcpdump -w /dumps/failure/router2.pcap &
        source ./install/setup.bash
        ddsrouter --config-path /config/config2.yaml -d
    volumes:
      - ./router/:/config
      - ./dumps:/dumps
    ports:
      - 30002:30002/tcp
      - 11166:11166/tcp
    profiles: ["bad"]
    networks:
      sideB:
        ipv4_address: 172.238.2.2
        aliases:
          - router2
using the following configs.
config.yaml:
version: v4.0
specs:
  discovery-trigger: any
participants:
  - name: LocalDiscoveryServer
    kind: local-discovery-server
    discovery-server-guid:
      ros-discovery-server: true
      id: 0
    listening-addresses:
      - ip: 0.0.0.0
        port: 11811
        transport: udp
  - name: LocalWAN
    kind: wan
    connection-addresses:
      - domain: host.docker.internal # Public IP of server
        port: 11166 # server port
        transport: tcp # Transport protocol - tcp so that we don't need a back IP addy
  - name: EchoParticipant # 6
    kind: echo # 7
    discovery: true # 8
    data: true # 9
    verbose: true # 10
and config2.yaml:
version: v4.0
specs:
  discovery-trigger: any
participants:
  - name: LocalDiscoveryServer2
    kind: local-discovery-server
    discovery-server-guid:
      ros-discovery-server: true
      id: 0
    listening-addresses:
      - ip: 0.0.0.0
        port: 11811
        transport: udp
  - name: LocalWAN2
    kind: wan
    listening-addresses:
      - domain: 0.0.0.0 # Public IP of server
        port: 11166 # server port
        transport: tcp # Transport protocol - tcp so that we don't need a back IP addy
  - name: EchoParticipant # 6
    kind: echo # 7
    discovery: true # 8
    data: true # 9
    verbose: true # 10
If I bring the ensemble up with docker compose --profile good up, everything works:
ros-flyer-node-dev2-1 | [INFO] [1710402041.253293400] [talker]: Publishing: 'Hello World: 6'
ros-flyer-router2-1 | In Endpoint: 01.0f.e0.c2.39.00.f3.57.00.00.00.00|0.0.3.3 from Participant: LocalDiscoveryServer2 in topic: rt/rosout payload received: Payload{00 01 00 00 f9 a9 f2 65 58 f3 18 0f 14 00 00 00 07 00 00 00 74 61 6c 6b 65 72 00 00 1d 00 00 00 50 75 62 6c 69 73 68 69 6e 67 3a 20 27 48 65 6c 6c 6f 20 57 6f 72 6c 64 3a 20 36 27 00 00 00 00 18 00 00 00 2e 2f 73 72 63 2f 74 6f 70 69 63 73 2f 74 61 6c 6b 65 72 2e 63 70 70 00 0b 00 00 00 6f 70 65 72 61 74 6f 72 28 29 00 00 2f 00 00 00} with specific qos: SpecificEndpointQoS{Partitions{};OwnershipStrength{0}}.
ros-flyer-router2-1 | In Endpoint: 01.0f.e0.c2.39.00.f3.57.00.00.00.00|0.0.14.3 from Participant: LocalDiscoveryServer2 in topic: rt/chatter payload received: Payload{00 01 00 00 0f 00 00 00 48 65 6c 6c 6f 20 57 6f 72 6c 64 3a 20 36 00 00} with specific qos: SpecificEndpointQoS{Partitions{};OwnershipStrength{0}}.
ros-flyer-internal-router-1 | In Endpoint: 01.0f.45.64.01.00.9f.e7.00.00.00.00|0.0.23.3 from Participant: LocalWAN in topic: rt/chatter payload received: Payload{00 01 00 00 0f 00 00 00 48 65 6c 6c 6f 20 57 6f 72 6c 64 3a 20 36 00 00} with specific qos: SpecificEndpointQoS{Partitions{};OwnershipStrength{0}}.
ros-flyer-node-dev-1 | [INFO] [1710402041.255290700] [listener]: I heard: [Hello World: 6]
but if I bring it up with the other router and client on the second virtual network (sideB) using docker compose --profile bad up, then it doesn't work:
ros-flyer-node-dev2-bad-1 | [INFO] [1710402141.088181800] [talker]: Publishing: 'Hello World: 2'
ros-flyer-router2-bad-1 | In Endpoint: In Endpoint: 01.0f.35.db.39.00.9f.a6.00.00.00.00|01.0f.35.db.39.00.9f.a6.00.00.00.00|0.0.3.3 from Participant: LocalDiscoveryServer2 in topic: rt/rosout0.0.14.3 payload received: Payload{00 01 from Participant: 00 00 5d LocalDiscoveryServer2aa f2 65 28 in topic: rt/chatter payload received: 8cPayload{ 41 05 1400 01 00 00 0f 00 0000 0000 4800 07 00 00 6500 6c74 6c 6f 20 57 6f61 6c 6b72 65 72 00 00 1d 00 000 6c 64 3a 20 0032 5000 75 62 6c 69 73000 }68 69 6e 67 with specific qos: SpecificEndpointQoS{Partitions{}3a ;20 27 48 65 6c 6cOwnershipStrength{ 6f 20 57 6f0 }}.
ros-flyer-router2-bad-1 | 72 6c 64 3a 20 32 27 00 00 00 00 18 00 00 00 2e 2f 73 72 63 2f 74 6f 70 69 63 73 2f 74 61 6c 6b 65 72 2e 63 70 70 00 0b 00 00 00 6f 70 65 72 61 74 6f 72 28 29 00 00 2f 00 00 00} with specific qos: SpecificEndpointQoS{Partitions{};OwnershipStrength{0}}.
The tcpdump data that's generated shows that in both cases the routers are regularly communicating via TCP in very similar patterns. However, they don't appear to be cross-publishing the talker messages, so when they are on different networks the clients cannot communicate.
Okay, I've figured out a slice of the problem. The initial-peers discovery mechanism is used first to handshake (successfully) between the two DDS Router instances, after which the locator advertised by the server is used for communication instead of the initial-peers domain. I'd really prefer the initial-peers value to remain the locator after discovery, rather than the discovered locator, since the server may be reachable through a variety of interfaces (for example, it may be available on different IPs inside the subnet as well as on an externally facing IP visible to the wider internet).
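To make that concrete, here are the two WAN participant excerpts from the configs above, annotated with my reading of the behaviour (the comments describe what I observe, not documented semantics):

# config.yaml (connect-or side): this address appears to be used only for the
# initial handshake
  - name: LocalWAN
    kind: wan
    connection-addresses:
      - domain: host.docker.internal
        port: 11166
        transport: tcp

# config2.yaml (listening side): after discovery, the other side seems to switch
# to the locator advertised here; with 0.0.0.0 that resolves to this container's
# own interface addresses, which are not reachable from sideA
  - name: LocalWAN2
    kind: wan
    listening-addresses:
      - domain: 0.0.0.0
        port: 11166
        transport: tcp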
Hi @BenChung,
I am not exactly sure what your use case is, or why you are using this particular configuration. However, I'll point out a few things I find odd, and hopefully that sheds some light on the matter.
- Be careful with the discovery-trigger: any option; it might result in endpoints not matching properly due to QoS incompatibilities. I suggest using the default value (discovery-trigger: reader).
- I suggest getting rid of the local discovery server participants and using simple ones instead (if multicast is available in your setting), just to simplify the scenario.
- When using the domain tag, a DNS domain is expected, not an IP. I don't know whether this is generating issues (it could actually be treated as an IP due to implementation details; I'd need to verify).
- We never use 0.0.0.0 IPs in our configurations. It might actually work, but as I said it's not tested on our side. I suggest taking advantage of the Docker Compose DNS service and setting the domains to the service names (see the sketch below).
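For instance, something roughly along these lines for the second router. This is just an untested sketch: it assumes the ROS 2 nodes can rely on multicast discovery (so they would no longer set ROS_DISCOVERY_SERVER) and that the two routers share a network on which Docker's embedded DNS resolves the Compose service names:

# config2.yaml (sketch)
version: v4.0
participants:
  - name: SimpleLocal2
    kind: simple
    domain: 0
  - name: WAN2
    kind: wan
    listening-addresses:
      - domain: router2 # Compose service name instead of 0.0.0.0
        port: 11166
        transport: tcp

The first router's WAN participant would then point its connection-addresses at domain: router2 in the same way.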
Regards
Hi, and thank you for the help! I was trying the discovery-trigger: any option as a "throw it at the wall and see what sticks" debugging approach.
The issue that was proximally keeping this from working was the 0.0.0.0 IPs. I'd really like one side of this (call it the "server side") to use 0.0.0.0 or a similar wildcard IP so that it doesn't have to be aware of the ingress approach. In the final configuration it's available under several different ports, IPs, and domain names, and it would be nice not to have to nail those down to a finite list.
As far as I can tell, what happens right now is that two WAN participants, one of them listening on 0.0.0.0, start communicating via initial peers, but once discovery(?) information has been exchanged, each side switches to the domain or IP carried in the locator provided by the other side. In the case of a 0.0.0.0 IP, that defaults to the system's interface addresses, which really doesn't work in my setup. What I'd like is for the WAN participants to keep communicating over the connection (IP/domain and port) originally specified in the connect-or's configuration. That would let the "overall" server stay ignorant of how it is connected to (k8s ingress, direct pod-to-pod addressing, a proxy, etc.).
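To illustrate what I mean (all of the names below are made up for the example), the kind of deployment I'm aiming for looks roughly like this:

# Server side: bind everywhere, stay ignorant of how it is reached
  - name: ServerWAN
    kind: wan
    listening-addresses:
      - ip: 0.0.0.0
        port: 11166
        transport: tcp

# One client connects pod-to-pod inside the cluster...
  - name: InternalClientWAN
    kind: wan
    connection-addresses:
      - domain: dds-router.dds.svc.cluster.local # hypothetical in-cluster name
        port: 11166
        transport: tcp

# ...another connects from outside through the ingress/proxy
  - name: ExternalClientWAN
    kind: wan
    connection-addresses:
      - domain: dds.example.com # hypothetical external name
        port: 11166
        transport: tcp

with each client continuing to use the address it originally dialed, rather than whatever locator the server advertises back.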
I can make a more specific bug report or feature request along these lines, but I suspect that what I describe is sufficiently alien to the locator model that it's hard to realize.