rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

Add `dns` and `relay::v2::client::transport::ClientTransport` to relay_v2 example

Open demfabris opened this issue 3 years ago • 13 comments
trafficstars

Here I'm creating a reproducible example for #2503

demfabris avatar Feb 10 '22 02:02 demfabris

Could you test with the patch below?

diff --git a/protocols/relay/examples/relay_v2.rs b/protocols/relay/examples/relay_v2.rs
index 0e3fde7c..2794405b 100644
--- a/protocols/relay/examples/relay_v2.rs
+++ b/protocols/relay/examples/relay_v2.rs
@@ -57,8 +57,8 @@ fn main() -> Result<(), Box<dyn Error>> {
     let tcp = tcp::TcpConfig::new().nodelay(true).port_reuse(true);
     let dns_tcp = block_on(dns::DnsConfig::system(tcp))?;
 
-    let transport = dns_tcp
-        .or_transport(relay_transport)
+    let transport = relay_transport
+        .or_transport(dns_tcp)

mxinden avatar Feb 10 '22 10:02 mxinden

Could you test with the patch below?

diff --git a/protocols/relay/examples/relay_v2.rs b/protocols/relay/examples/relay_v2.rs
index 0e3fde7c..2794405b 100644
--- a/protocols/relay/examples/relay_v2.rs
+++ b/protocols/relay/examples/relay_v2.rs
@@ -57,8 +57,8 @@ fn main() -> Result<(), Box<dyn Error>> {
     let tcp = tcp::TcpConfig::new().nodelay(true).port_reuse(true);
     let dns_tcp = block_on(dns::DnsConfig::system(tcp))?;
 
-    let transport = dns_tcp
-        .or_transport(relay_transport)
+    let transport = relay_transport
+        .or_transport(dns_tcp)

That works @mxinden! Is this an intended behaviour?

demfabris avatar Feb 10 '22 16:02 demfabris

Thanks for debugging this @demfabris!

Just for the sake of completeness, this is what is happening:

  • With OrTransport(relay_transport, dns_tcp_transport) (The error case)
    1. OrTransport forwards the dial to dns_transport first
    2. dns_transport resolves the DNS address to an IP address and forwards the dial to tcp_transport
    3. tcp_transport sees the p2p-circuit and errors out with MultiaddrNotSupported
  • With OrTransport(dns_tcp_transport, relay_transport) (The solution)
    1. OrTransport forwards the dial to relay_transport first
    2. relay_transport detects the Multiaddr to contain a p2p-circuit
    3. ... everything goes as expected.

Is this an intended behaviour?

Let's say I would prefer to not have this footgun. A way I see to prevent users from running into this: Take the other Transport implementation (in our case dns_transport) as an argument to Client::new_transport_and_behaviour and combine the relay_transport and the dns_transport via OrTransport in the right order within the new_transport_and_behaviour method, returning the combined `OrTransport.

What do you think @demfabris?

mxinden avatar Feb 10 '22 19:02 mxinden

Why can’t the OrTransport try with the second option after having seen the error from the first?

rkuhn avatar Feb 10 '22 21:02 rkuhn

Hmm I see @mxinden. So you're suggesting we feed an optional transport arg to Client::new_transport_and_behaviour. I'm a bit worried about this function signature becoming confusing. What you have in mind?

Also, since we are proposing a fix specifically to relay module, how confident are you about other possible problematic transport combinations?

demfabris avatar Feb 11 '22 01:02 demfabris

Why can’t the OrTransport try with the second option after having seen the error from the first?

Each Transport::dial implementation can be thought of as two phases.

  1. Synchronous validation of the provided Multiaddr. E.g. in the case of libp2p-tcp it validates the Multiaddr via multiaddr_to_socketaddr. https://github.com/libp2p/rust-libp2p/blob/861e15dabbba28ce5aff3c6332ab10ce9c4c24b8/transports/tcp/src/lib.rs#L396-L403 Once phase one is successful, each Transport::dial implementation returns a Future, e.g. in the case of TCP it returns the do_dial future: https://github.com/libp2p/rust-libp2p/blob/861e15dabbba28ce5aff3c6332ab10ce9c4c24b8/transports/tcp/src/lib.rs#L350-L369

  2. Asynchronous polling of returned Future executes the actual dialing logic, e.g. in the case of libp2p-tcp it waits for the socket to become writable: https://github.com/libp2p/rust-libp2p/blob/861e15dabbba28ce5aff3c6332ab10ce9c4c24b8/transports/tcp/src/provider/async_io.rs#L51-L52

The fail-over mechanism of the OrTransport (currently) only happens on failure of phase 1, but not on failure of phase 2.

https://github.com/libp2p/rust-libp2p/blob/861e15dabbba28ce5aff3c6332ab10ce9c4c24b8/core/src/transport/choice.rs#L66-L84

In order to fail over after a failure of the first Transport in its phase 2, the Future returned by OrTransport::dial would need to keep a copy of the second Transport in order to be able to execute a second dial once the first failed. That would require the second Transport to be Clone.

@rkuhn does the above make sense?

I am not trying to say, that this should not be done, I am only laying out the status quo.

mxinden avatar Feb 11 '22 15:02 mxinden

Also, since we are proposing a fix specifically to relay module, how confident are you about other possible problematic transport combinations?

@demfabris I am not aware of any other problematic transport combinations. Am I missing one? In case there is none, the above suggestion of extending new_transport_and_behaviour would be a valid fix and not so much a hack.

I'm a bit worried about this function signature becoming confusing. What you have in mind?

What do you think of the diff below?

diff --git a/protocols/relay/src/v2/client.rs b/protocols/relay/src/v2/client.rs
index fa4447df..cd1f3aba 100644
--- a/protocols/relay/src/v2/client.rs
+++ b/protocols/relay/src/v2/client.rs
@@ -101,17 +101,18 @@ pub struct Client {
 }
 
 impl Client {
-    pub fn new_transport_and_behaviour(
+    pub fn new_transport_and_behaviour<T>(
         local_peer_id: PeerId,
+        transport: T,
     ) -> (transport::ClientTransport, Self) {
-        let (transport, from_transport) = transport::ClientTransport::new();
+        let (relay_transport, from_transport) = transport::ClientTransport::new();
         let behaviour = Client {
             local_peer_id,
             from_transport,
             directly_connected_peers: Default::default(),
             queued_actions: Default::default(),
         };
-        (transport, behaviour)
+        (OrTransport(relay_transport, transport), behaviour)
     }
 }
 

Due to another reason, putting the relay_transport first is documented below. Though I am sure most folks miss this small comment, thus whatever we do, I think we should enforce this at the type system level.

https://github.com/libp2p/rust-libp2p/blob/96dbfcd1ade6de5b4ae623f59fb0d67b2916576a/protocols/relay/src/v2/client/transport.rs#L110-L111

mxinden avatar Feb 11 '22 15:02 mxinden

Sounds alright to me @mxinden. Let's move forward with the fix 👍🏻

demfabris avatar Feb 11 '22 22:02 demfabris

Hey @mxinden I was about to push this changes but something got me thinking: Your suggested diff would cause it to be required to have an existing transport to construct a v2::Client, even tho I don't see the use case for a single v2::Client transport to be operating alone, do you think this is something we want?

What you think about we make two construct methods, one called from_transport... and the other remain as you suggested?

demfabris avatar Feb 23 '22 13:02 demfabris

Your suggested diff would cause it to be required to have an existing transport to construct a v2::Client, even tho I don't see the use case for a single v2::Client transport to be operating alone, do you think this is something we want?

I don't think one would ever construct a v2::Client without an additional Transport. After all, the relaying has to happen over some Transport. Let me know in case I am missing something.

mxinden avatar Feb 25 '22 16:02 mxinden

I totally agree, I'm asking this because one of the tests is implemented with v2::Client alone and it got me wondering

demfabris avatar Feb 25 '22 17:02 demfabris

I totally agree, I'm asking this because one of the tests is implemented with v2::Client alone and it got me wondering

Oh, mind pointing me to it @demfabris?

mxinden avatar Feb 25 '22 18:02 mxinden

I think this section confused me earlier https://github.com/libp2p/rust-libp2p/blob/master/protocols/relay/tests/v2.rs#L320 but now I see that's not the case 😆. Sorry I must've read this on a hurry

demfabris avatar Feb 25 '22 18:02 demfabris

This pull request has merge conflicts. Could you please resolve them @demfabris? 🙏

mergify[bot] avatar Mar 22 '23 15:03 mergify[bot]

Closing as stale.

thomaseizinger avatar Sep 20 '23 02:09 thomaseizinger