iroh-net: When guessing at a direct address we do not always send to the same one

Open flub opened this issue 1 year ago • 3 comments

When we have a relay connection and some direct addresses, but do not yet know anything about those direct addresses (such as whether they work), we pick a random direct address from the list and send to both the relay and this guess at the same time.

The idea is that maybe we picked a usable address and it just works and we figure this out and start using this path for real.

However, we do not select the direct address path stably: each time we send a packet, a new direct address path is selected. That makes the guess rather useless, since no single address is tried for long enough to find out whether it works.

This is the bug: https://github.com/n0-computer/iroh/blob/4640f4d0049e582cc89d81e976b59961043efb3f/iroh-net/src/magicsock/node_map/node_state.rs#L295
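For illustration, one way to make the guess stable is to remember the chosen address and keep returning it until it is no longer a candidate. This is only a minimal sketch under assumptions: `DirectAddrGuess` and its methods are hypothetical names, not iroh-net's actual types, and picking the first candidate is an arbitrary choice made for the sketch.

```rust
use std::net::SocketAddr;

/// Hypothetical stand-in for the per-node state; NOT iroh-net's actual type.
struct DirectAddrGuess {
    /// Direct addresses we know about but have not yet validated.
    candidates: Vec<SocketAddr>,
    /// The address guessed on a previous send, if any.
    current: Option<SocketAddr>,
}

impl DirectAddrGuess {
    fn new(candidates: Vec<SocketAddr>) -> Self {
        Self { candidates, current: None }
    }

    /// Returns the same guess on every call for as long as that
    /// address is still in the candidate list.
    fn guess(&mut self) -> Option<SocketAddr> {
        match self.current {
            // Reuse the previous guess while it is still a candidate.
            Some(addr) if self.candidates.contains(&addr) => Some(addr),
            _ => {
                // Assumption: take the first candidate. Any selection
                // rule works here as long as the result is cached.
                self.current = self.candidates.first().copied();
                self.current
            }
        }
    }

    /// Drops an address, e.g. once it is known not to work.
    fn remove(&mut self, addr: &SocketAddr) {
        self.candidates.retain(|a| a != addr);
    }
}
```

With this shape, repeated calls to `guess()` send to one address until `remove()` drops it, at which point the next candidate becomes the guess.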

flub avatar Jul 08 '24 18:07 flub

This is on the hot path of .poll_send(); an RNG is too much overhead for this.
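As a rough sketch of what an RNG-free selection could look like, the send path can do a plain index lookup while a separate, less frequent step rotates to the next candidate. The type `RotatingGuess`, its methods, and the idea of calling `advance()` when a guess has gone unanswered are all assumptions for illustration, not how iroh-net does or will do it.

```rust
use std::net::SocketAddr;

/// Hypothetical helper; NOT part of iroh-net.
struct RotatingGuess {
    candidates: Vec<SocketAddr>,
    /// Position of the current guess; only changed by `advance`.
    index: usize,
}

impl RotatingGuess {
    fn new(candidates: Vec<SocketAddr>) -> Self {
        Self { candidates, index: 0 }
    }

    /// Cheap enough for a hot path: a bounds check and an index lookup,
    /// with no random number generation.
    fn current(&self) -> Option<SocketAddr> {
        if self.candidates.is_empty() {
            None
        } else {
            Some(self.candidates[self.index % self.candidates.len()])
        }
    }

    /// Moves on to the next candidate. Assumed to be called off the hot
    /// path, e.g. after the current guess has gone unanswered for a while.
    fn advance(&mut self) {
        self.index = self.index.wrapping_add(1);
    }
}
```

The send path would only ever call `current()`, so every packet goes to the same address until something else decides to call `advance()`.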

flub avatar Jul 08 '24 18:07 flub

This has led to another issue: my server has multiple IP addresses, most of them LAN addresses, and only one of them can successfully punch through (in my tests, it's the IPv6 address). Because direct connection addresses are currently selected at random, the probability of choosing the one address that can punch through is very low and depends entirely on luck. Sometimes it takes a long time before my IPv6 address is tried at all (I even began to wonder if the protocol dislikes IPv6, haha). I have attached a complete log. In my opinion, LAN IPs might deserve a higher priority for early punching, since LAN success rates are already high; but if a LAN connection does not succeed within a short time, public IPs should be given more opportunities to punch through. This is also related to https://github.com/n0-computer/iroh/issues/2317

logs_1720497577.log

zh522130 avatar Jul 09 '24 04:07 zh522130

I'm not sure if this is related to the issue, but my thought is that the address selection strategy might need to be optimized.

zh522130 avatar Jul 09 '24 04:07 zh522130

I think this is a misunderstanding: holepunching does not guess a random address, it uses all the addresses. It does take a long time to holepunch in this log, and I need to check it out more to fully understand why. But let's move this part back to #2317.

flub avatar Jul 10 '24 12:07 flub