fix(quic): fix address translation
Description
This fixes address translation for QUIC that was essentially non-existent before.
Notes & open questions
Test is analogous to corresponding test in TCP protocol implementation. I have added test for both async-std and tokio, even though compiling crate with just tokio feature enabled causes a lot of compilation warnings.
What I'm not sure about is whether old behavior is intentional, but to be it seemed like a bug that caused major issues.
Change checklist
- [x] I have performed a self-review of my own code
- [ ] I have made corresponding changes to the documentation
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] A changelog entry has been made in the appropriate crates
As discussed in the open maintainers call today, @nazar-pc can you research whether you are behind an endpoint indepedent or dependent NAT?
I get unique random outgoing port for every localip:localport:remoteip:remoteport combination, here are the states after doing running stun client test against stun.xten.com:
LAN udp 192.168.1.2:30535 -> 216.93.246.18:3478
WAN udp a.b.c.d:54496 (192.168.1.2:30535) -> 216.93.246.18:3478
LAN udp 192.168.1.2:30536 -> 216.93.246.18:3478
WAN udp a.b.c.d:24189 (192.168.1.2:30536)
LAN udp 192.168.1.2:30535 -> 216.93.246.15:3478
WAN udp a.b.c.d:10373 (192.168.1.2:30535) -> 216.93.246.15:3478
So even if my app is making request from port 30535, the actual request will go from a random port. If I make request from that port to a different host, the port will also be different.
The only reason I'm reachable is because I have port forwarding from a.b.c.d:30535 to 192.168.1.2:30535.
I see autonat doesn't actually care about the address, it will replace all received addresses with an observed one anyway and then will try to dial it. So response in autonat doesn't actually corresponds to request directly. What I don't like about this is that autonat server is essentially doing something similar to address translation, but it should still work.
The problem is that because autonat client concatenates observed addresses with listen addresses rather than the other way around, it will in many cases confirm ephemeral external address instead of actual address with QUIC on my machine. I think eventually it will be able to find public address, but certainly not immediately.
Another concern is that all local listen addresses are exposed when probing, which is somewhat revealing when running on the host and not in an isolated container.
So even if my app is making request from port 30535, the actual request will go from a random port. If I make request from that port to a different host, the port will also be different.
That sounds like a symmetric NAT to me. What is odd is that your NAT device doesn't seem to take into account the configured port-forwarding. Whilst not necessary for things to function, it would seem logical to apply the port-forwarding in a reverse fashion for outgoing connections to allow peers to use the observed address.
In this case, address translation would be helpful because you've done a direct mapping of your port-forwarding. But, we don't know that in the application so not doing address translation is as good of a guess as doing it ...
Once we ship https://github.com/libp2p/rust-libp2p/pull/4568, we should be able to improve things here. In that PR, we know whether we reused a port or not for a new connection. This means, libp2p-identify can stop emitting candidates for ephemeral ports altogether.
As a result, we can remove the current address_translation functionality entirely and instead build out some functionality in libp2p-swarm that generates more candidates based on the current listen addresses and candidates emitted by other protocols. Perhaps this functionality should also be in libp2p-identify as it is the one that knows that the candidate was generated via the remote observing our connection.
This pull request has merge conflicts. Could you please resolve them @nazar-pc? 🙏
So looks like you're saying that address translation will be moved to identify at some point in the future? I guess that makes sense, but I don't see why this PR shouldn't be merged in the meantime.
I think it is pretty clear that there is an issue with QUIC right now and the fact that it works with Autonat v1 is a coincidence because Autonat v1 essentially also does address translation on its end, replacing IP in listen address with public IP, resulting in reachable address overall.
On the surface it also looks like with Autonat v2 QUIC will break if no other changes are done because server-side address translation will not happen anymore (if my understanding is correct).
So looks like you're saying that address translation will be moved to identify at some point in the future?
Yes that is the plan. Identify is the protocol that gathers the observed address, meaning it is its job to filter out ephemeral ports. With the changes of #4568, we know whether an outgoing connection used a new port or the existing listen port. In the case of a new one, we can just discard the observed address because we know that it is garbage and will never be a valid candidate for an external address. Thus, there is no more need to do address translation.
I think it is pretty clear that there is an issue with QUIC right now
It is not really clear to me. Are the nodes that you are observing this on listening for incoming connections or are they only making outgoing connections? If you don't have a listening socket then we will make one if a random port. Perhaps that is the issue you are observing?