rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

fix(quic): improve listener socket selection when dialing

Open mxinden opened this issue 2 years ago • 8 comments

Problem

Say the local node listens on /ip4/0.0.0.0/udp/0/quic-v1. One would expect a consecutive dial to /ip4/127.0.0.1/udp/42/quic-v1 to use one of the listen addresses as the source address of the dial.

That is not the case today. Instead libp2p-quic will create a new quinn endpoint and thus a new UDP socket for the outgoing dial.

Why?

In TCP we track the specific listen addresses, even if the user calls listen_on with an unspecified address (e.g. 0.0.0.0). If the user calls listen_on with a specific IP address i.e. not a wildcard like 0.0.0.0 but e.g. 127.0.0.1 we track (here register) the specific address right away.

https://github.com/libp2p/rust-libp2p/blob/f10f1a274abe6e5193de653aa4dd951ae56c775e/transports/tcp/src/lib.rs#L394

In case where the user calls listen_on with an unspecified address (e.g. 0.0.0.0) we depend on if-watch, more specifically poll_if_watch to provide us with the concrete addresses:

https://github.com/libp2p/rust-libp2p/blob/f10f1a274abe6e5193de653aa4dd951ae56c775e/transports/tcp/src/lib.rs#L666-L682

Either way, we end up tracking specific listen addresses with port-reuse enabled and can thus choose the right listener in local_dial_addr:

https://github.com/libp2p/rust-libp2p/blob/f10f1a274abe6e5193de653aa4dd951ae56c775e/transports/tcp/src/lib.rs#L122-L151

In libp2p-quic we only track the initial address that the user provided on the listen_on call.

https://github.com/libp2p/rust-libp2p/blob/f10f1a274abe6e5193de653aa4dd951ae56c775e/transports/quic/src/endpoint.rs#L170-L178

This might be a wildcard address. Later on, when dialing a 127.0.0.1 address, the check listen_addr.ip().is_loopback() == socket_addr.ip().is_loopback() discards the listener with the unspecified listen address, thus a new dial-only socket is used, even though one of the existing listener sockets is available.

This is e.g. problematic when trying to dial a node on 127.0.0.1 with the expectation that the dial uses one of the listen addresses as a source address.

How does go-libp2p solve this:

go-libp2p asks the OS for the correct interface given a destination address. It uses Google's routing package. See https://github.com/libp2p/go-libp2p/blob/260b9695cafdd8e35ec65b30ef153f0c15549c72/p2p/transport/quicreuse/reuse.go#L209 for concrete usage and implementation.

How to move forward?

I am not sure simply removing the listen_addr.ip().is_loopback() == socket_addr.ip().is_loopback() check is the way to go. This check is still helpful. E.g. say we have two listeners, one on localhost, one not on localhost. Ideally we would use the former when dialing.

We could mirror what libp2p-tcp does. That is, track the specified listen addresses reported by if-watch. Later, when dialing, choose the listener with the right address.

Ideally I would like to have the operating system choose. It has the most information about all available routes to each interface. In other words ideally we would do what go-libp2p does.

For now, I suggest we go with the libp2p-tcp approach. A second iteration would mirror what go-libp2p does.

Originally posted by @mxinden in https://github.com/libp2p/rust-libp2p/pull/3454#discussion_r1276338263

Thank you @kpp for surfacing this bug.

Thank you @marten-seemann for the quick help, pointing me to how go-libp2p solves this issue.

mxinden avatar Jul 27 '23 15:07 mxinden

I would like to work on this but I have a question. If using the same approach as in libp2p-tcp, should we also have a configuration related to port_reuse: bool (false by default) as in libp2p::tcp::Config, or make it true by default so reuse an existing listener?

arsenron avatar Aug 07 '23 14:08 arsenron

Great to hear that you are interested.

https://github.com/libp2p/rust-libp2p/issues/4226

port_reuse in libp2p-quic will be introduced with https://github.com/libp2p/rust-libp2p/issues/4226. Let's not do it as part of this issue. In other words, let's always use a listener if available for now.

Let me know if you have any other questions @arsenron.

mxinden avatar Aug 07 '23 15:08 mxinden

@mxinden Could you please help me with a question?

Could we make it as simple as adding a separate field to a Listener something like is_loopback bool. So when we add a new listener we check either ip is loopback and set it to true, or later if IfWatcher returns us a new loopback address, we also set it to true. Then in case a remote address is also loopback, we just iterate listeners to find a listener which supports loopback, if any. Or am I missing something?

arsenron avatar Aug 08 '23 10:08 arsenron

Thank you @arsenron for the fix.

I will keep this issue open given that there is still the more sophisticated solution that go-libp2p follows.

How does go-libp2p solve this:

go-libp2p asks the OS for the correct interface given a destination address. It uses Google's routing package. See https://github.com/libp2p/go-libp2p/blob/260b9695cafdd8e35ec65b30ef153f0c15549c72/p2p/transport/quicreuse/reuse.go#L209 for concrete usage and implementation.

@arsenron let me know in case you want to contribute more to the project. The above more sophisticated solution might be an option. Or any of the other help-wanted issues.

https://github.com/libp2p/rust-libp2p/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22

mxinden avatar Aug 11 '23 12:08 mxinden

hey guys, would you mind if I try to take a look at this one now? 'Cause PortUse was added after this was made.

momoshell avatar Nov 03 '25 22:11 momoshell

Hi @momoshell are you still interested in this? If so feel free to take it

jxs avatar Nov 23 '25 10:11 jxs

@jxs yes, and thanks, will give it my best spin.

I now realise that this issue is also affected by the port-reuse policies. It might be a good opportunity to bring some attention to this one PR:6187.

I believe it would be beneficial to address both of these in parallel.

momoshell avatar Nov 24 '25 15:11 momoshell

@jxs #6224 is up, but we have those failing checks now! :(

I would also like to mention that I completely agree with @mxinden observation that the reading of OS routing tables is the proper way to go. Sadly, today net-route crate only supports Linux. We could introduce the next phase as a separate PR, where we optimize for Linux machines, and we fallback to the PortReuse registry, until net-route adds support for all major OSs.

If you guys agree, I would gladly implement that optimization

momoshell avatar Nov 30 '25 16:11 momoshell