bug: Regarding some tests on PathSelection::RelayOnly
This issue is discussed in detail here: https://github.com/n0-computer/iroh/discussions/3670
Objective: enable stable communication between Node A and Node B via a relay in symmetric networks, even when hole punching fails for whatever reason.
Regarding latency issues when using a relay to connect two nodes:
I am currently developing a product that uses iroh to provide network connectivity for games.
Test environment: Windows 11. Tool: the built-in ping command in CMD.
All latency figures I mention below are RTT (Round-Trip Time).
There are two questions below; I'm unsure if they are bugs or if my testing method is incorrect.
Question 1:
Node A: My physical machine. Latency with the relay: 70 milliseconds.
Node B: Virtual machine (VM). Latency with the relay: 70 milliseconds. Latency with Node A: 1 millisecond. They are on the same local area network.
My current test environment contains two nodes. Endpoint Builder Section:
The relay URL and pkarr URL point to services I configured myself; I am not using the official iroh relay or the official DNS pkarr service. I am confident they are working correctly.
let endpoint = Endpoint::empty_builder(RelayMode::Custom(relay_manager.get_relay_map()))
    .path_selection(PathSelection::RelayOnly)
    .secret_key(secret_key)
    .discovery(PkarrPublisher::builder(pkarr_manager.pkarr_url.clone()))
    .discovery(PkarrResolver::builder(pkarr_manager.pkarr_url.clone()))
    .discovery(DnsDiscovery::builder(config.iroh.dns_origin.clone()))
    .bind()
    .await
    .context(EndpointBuildFailedSnafu)?;
When I remove .path_selection(PathSelection::RelayOnly), the latency after a successful hole punch is around 3-8 milliseconds. The additional latency of up to about 5ms is likely the inherent overhead of iroh + tun itself, which is understandable.
The entire data flow is Ping -> Node A tun -> Node A iroh network -> Node B iroh network -> Node B tun -> Pong
However, when I simulate a scenario where one or both nodes are in a symmetric network, cannot establish a direct connection, and must communicate via a relay (.path_selection(PathSelection::RelayOnly)), the latency jumps to 270-400ms.
The test path is iroh + tun in both cases.
I know that a 70ms latency for the relay server is already relatively high, and we can upgrade to a lower latency of 30-40ms in the final production environment.
The problem is that I don't understand how this 400ms latency is generated. Is this a configuration issue?
My understanding: since the RTT to the relay is 70ms, the one-way delay should be 35ms.
NodeA -> Relay (Ping latency 35ms)
Relay -> NodeB (Ping latency 35ms)
NodeB -> Relay (Pong latency 35ms)
Relay -> NodeA (Pong latency 35ms)
35ms * 4 = 140ms
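The estimate above can be written as a small Rust sketch. It assumes symmetric links (one-way delay is half the RTT) and ignores any processing time in the relay, iroh, or the tun device. The function name `expected_relay_rtt` is made up for illustration and is not part of iroh.

```rust
use std::time::Duration;

/// Ideal RTT between two nodes whose traffic is forwarded by a single relay.
/// Each argument is that node's measured RTT to the relay.
/// Hypothetical helper for illustration; not an iroh API.
fn expected_relay_rtt(rtt_a_to_relay: Duration, rtt_b_to_relay: Duration) -> Duration {
    // Assumption: links are symmetric, so one-way delay is half the RTT.
    let one_way_a = rtt_a_to_relay / 2;
    let one_way_b = rtt_b_to_relay / 2;
    // Ping travels A -> relay -> B, pong travels B -> relay -> A.
    (one_way_a + one_way_b) * 2
}
```

With both relay RTTs at 70ms, `expected_relay_rtt(Duration::from_millis(70), Duration::from_millis(70))` gives 140ms, which is the figure I compare the measured RelayOnly latency against.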
This latency is too high for gaming. I don't know whether it is normal for iroh. Should I try changing the relay server configuration, or is there a problem with my settings? If the problem is on my side, how should I troubleshoot it? I have been searching for a long time without any results.
Here's an example:
Without iroh, Node A pings Node B:
From 10.0.0.10: Bytes=32 Time=1ms TTL=128
From 10.0.0.10: Bytes=32 Time=1ms TTL=128
From 10.0.0.10: Bytes=32 Time=1ms TTL=128
With iroh + tun after successful hole punching, Node A pings Node B:
From 10.0.0.10: Bytes=32 Time=5ms TTL=128
From 10.0.0.10: Bytes=32 Time=3ms TTL=128
From 10.0.0.10: Bytes=32 Time=5ms TTL=128
From 10.0.0.10: Bytes=32 Time=4ms TTL=128
From 10.0.0.10: Bytes=32 Time=4ms TTL=128
From 10.0.0.10: Bytes=32 Time=5ms TTL=128
With iroh + tun and PathSelection::RelayOnly, Node A pings Node B:
From 10.0.0.10: Bytes=32 Time=260ms TTL=128
From 10.0.0.10: Bytes=32 Time=337ms TTL=128
From 10.0.0.10: Bytes=32 Time=341ms TTL=128
From 10.0.0.10: Bytes=32 Time=250ms TTL=128
From 10.0.0.10: Bytes=32 Time=425ms TTL=128
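To compare the three runs above more directly, a minimal sketch like the following summarizes each set of samples. The helper name `summarize` is hypothetical; the sample values are the ping times listed above.

```rust
/// Minimum, mean, and maximum of a set of RTT samples in milliseconds.
/// Returns None for an empty slice. Hypothetical helper for illustration.
fn summarize(samples_ms: &[u32]) -> Option<(u32, f64, u32)> {
    let min = *samples_ms.iter().min()?;
    let max = *samples_ms.iter().max()?;
    let sum: u32 = samples_ms.iter().sum();
    let mean = f64::from(sum) / samples_ms.len() as f64;
    Some((min, mean, max))
}
```

For the RelayOnly samples `[260, 337, 341, 250, 425]` this gives a minimum of 250ms, a mean of 322.6ms, and a maximum of 425ms, so the mean is roughly 180ms above the 140ms estimate.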
Question 2:
I won't repeat the details from Question 1.
When RelayOnly is not enabled and the two nodes connect successfully via a relay, endpoint.latency(remote_endpoint_id) reports the latency correctly; it is almost identical to the latency I measure externally over iroh + tun.
However, when I enable RelayOnly, endpoint.latency(remote_endpoint_id) returns empty even though the connection succeeds. I don't understand this. Shouldn't the relay be transparent to the nodes? Why can't the latency be reported when two nodes connect via a relay?
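For reference, this is roughly how I print the value. The sketch assumes the latency arrives as an `Option<Duration>`, with `None` being the "empty" case described above; that type is an assumption on my part, and `describe_latency` is a hypothetical helper, not an iroh function.

```rust
use std::time::Duration;

/// Formats an optional RTT measurement for logging.
/// Assumption: "empty" latency is represented as None.
fn describe_latency(latency: Option<Duration>) -> String {
    match latency {
        Some(rtt) => format!("rtt = {} ms", rtt.as_millis()),
        None => "rtt unavailable".to_string(),
    }
}
```

With hole punching the output looks like `rtt = 4 ms`; with RelayOnly I only ever see `rtt unavailable`.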