rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

Libp2p quic third attempt

Open kpp opened this issue 4 years ago • 36 comments

A continuation of https://github.com/libp2p/rust-libp2p/pull/2159.

Adds support for tls, removes the UnboundedSenders from the endpoint. The endpoint still has UnboundedReceivers, but that's not different than having a VecDeque. The transport dial api for example doesn't allow applying backpressure.

Closes #1334

Todo: transfer unresolved comments from the previous PR

kpp avatar Oct 14 '21 12:10 kpp

Thanks @kpp for driving this forward. I will take a look.

mxinden avatar Oct 14 '21 12:10 mxinden

So what's missing here? Is this one ready to merge?

dvc94ch avatar Dec 29 '21 18:12 dvc94ch

See https://github.com/paritytech/polkadot-sdk/issues/536

kpp avatar Dec 29 '21 20:12 kpp

barebones-x509 author here. What was the reason for switching to an alternate library? I wrote barebones-x509 specifically for use in libp2p, and I would like to know what made x509-parser a better choice.

DemiMarie avatar Jan 24 '22 00:01 DemiMarie

Sure. barebones-x509 was not tested against go implementation (and required several fixes as you remember). There could be other errors so I wrote my own and tested it.

kpp avatar Jan 26 '22 12:01 kpp

Sure. barebones-x509 was not tested against go implementation (and required several fixes as you remember). There could be other errors so I wrote my own and tested it.

If I recall correctly the flaws were in the code that called barebones-x509, not barebones-x509 itself. If you are aware of a flaw in barebones-x509, please report it as a bug. barebones-x509 uses the same ASN.1 parser (ring) as WebPKI does, and does zero memory allocations internally (except in ring’s RSA code), so I expect it to be quite a bit more robust than der-parser or x509-parser.

DemiMarie avatar Jan 26 '22 15:01 DemiMarie

while I didn't have a preference earlier, I've recently had to parse/generate some pkcs7 and dive deep into asn1/der/ber/pem formats, rasn seems to be quite neat. yasna and other crates have very incomplete support afaik.

dvc94ch avatar Jan 26 '22 23:01 dvc94ch

you should be able to parse the certificate with https://github.com/XAMPPRocky/rasn/tree/main/standards/pkix

dvc94ch avatar Jan 26 '22 23:01 dvc94ch

@kpp I think one of the remaining tasks would be to get rid of the UnboundedReceivers, for denial-of-service prevention.

DemiMarie avatar Jan 26 '22 23:01 DemiMarie

@kpp I prepared a merge of latest master and wrote a test for concurrent connections and streams. Do you want to merge it into this branch? https://github.com/kpp/rust-libp2p/pull/13

mxinden avatar Feb 13 '22 11:02 mxinden

Yeah, the more tests we get, the better it is. I am working on on another implementation which has a highly better performance. I will push it in several days.

kpp avatar Feb 15 '22 11:02 kpp

So there is another implementation: https://github.com/kpp/rust-libp2p/tree/libp2p-quic-reimplementation and https://github.com/kpp/libp2p-perf/tree/quic-reimpl. The code is a bit dirty, but the swarm can connect from rust <-> go and the perf is much better:

$ ./run.sh 
# Start Rust and Golang servers.
Local peer id: PeerId("Qmcqq9TFaYbb94uwdER1BXyGfCFY4Bb1gKozxNyVvLvTSw")
about to listen on "/ip4/127.0.0.1/udp/9992/quic"
Listening on "/ip4/127.0.0.1/udp/9992/quic".
Listening on "/ip4/192.168.1.173/udp/9992/quic".

# Rust -> Rust

## Transport security noise
Local peer id: PeerId("12D3KooWA8UTRtRQqNi8tMyyyXmMuXxe8umQXTs4LzsuqoiXsshR")
IncomingConnection { local_addr: "/ip4/0.0.0.0/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/40876/quic" }
ConnectionEstablished { peer_id: PeerId("12D3KooWA8UTRtRQqNi8tMyyyXmMuXxe8umQXTs4LzsuqoiXsshR"), endpoint: Listener { local_addr: "/ip4/0.0.0.0/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/40876/quic" }, num_established: 1, concurrent_dial_errors: None }
Interval	Transfer	Bandwidth
0 s - 10.00 s	2698 MBytes	2157.97 MBit/s

# Rust -> Golang

Local peer id: PeerId("12D3KooWDxhshLBPTJcECvJafrYuYt76fcbmD3m9xdBBV1SR7HEU")
  ERROR libp2p_quic::connection: stream stopped server bidirectional stream 0
    at /home/kpp/.cargo/git/checkouts/rust-libp2p-2465c479375e5ed6/f7e52d0/transports/quic/src/connection.rs:392

Interval	Transfer	Bandwidth
0 s - 10.00 s	1932 MBytes	1545.14 MBit/s


# Golang -> Rust

## Transport security noise
2022/02/21 15:44:09 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
IncomingConnection { local_addr: "/ip4/0.0.0.0/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/51067/quic" }
ConnectionEstablished { peer_id: PeerId("12D3KooWJ5kqPcjKKT9iQcQB4KVFg5xDZRM2BgSyheBqwNQRZWBq"), endpoint: Listener { local_addr: "/ip4/0.0.0.0/udp/9992/quic", send_back_addr: "/ip4/127.0.0.1/udp/51067/quic" }, num_established: 1, concurrent_dial_errors: None }
Interval 	Transfer	Bandwidth
0s - 10.00 s 	1670 MBytes	1335.96 MBit/s

# Golang -> Golang

2022/02/21 15:44:19 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). See https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-Buffer-Size for details.
Interval 	Transfer	Bandwidth
0s - 10.00 s 	1592 MBytes	1273.55 MBit/s

I am still playing with code, so please don't judge.

Unfortunately I am still unable to make test in https://github.com/kpp/rust-libp2p/pull/13 succeed either. I suspect there is an issue with the list of wakers in Muxer per every stream.

kpp avatar Feb 21 '22 13:02 kpp

can you elaborate on what the bottle neck was and what changes you made? 1545.14 MBit/s doesn't seem like it's much of an improvement over previous code. I believe that's roughly the perf I got too.

dvc94ch avatar Feb 21 '22 13:02 dvc94ch

You are looking into rust -> golang which didn't work at all. The previous result for rust -> rust was 288.72 MBit/s.

kpp avatar Feb 21 '22 13:02 kpp

So what was the problem exactly? I'm pretty sure I'd have noticed if the code was that slow, although haven't kept up to date with all the changes that were made.

dvc94ch avatar Feb 21 '22 14:02 dvc94ch

There seems to be something fishy with these numbers anyway ## Transport security noise is the base line? It doesn't seem to have anything to do with quic

dvc94ch avatar Feb 21 '22 14:02 dvc94ch

Don't pay attention to it: https://github.com/kpp/libp2p-perf/blob/6d6ceaead4edac8ababd3cfa5c97b96c2adf0c29/run.sh#L26.

kpp avatar Feb 21 '22 14:02 kpp

Well, 2.2GBit/s seems pretty good and should saturate most internet connections. Still curious as to how you achieved the x7.5 speed up.

dvc94ch avatar Feb 21 '22 14:02 dvc94ch

Played around with this pull request once more over the weekend.

Outcome is basically mxinden@1230178 which removes the unbounded channels.

Is it possible to avoid circumventing backpressure?

DemiMarie avatar Feb 21 '22 16:02 DemiMarie

Played around with this pull request once more over the weekend. Outcome is basically mxinden@1230178 which removes the unbounded channels.

Is it possible to avoid circumventing backpressure?

I am not sure I follow @DemiMarie. Are you asking whether it is possible for libp2p-quic to enforce backpressure? If I am not mistaken, given that we are using the quinn-proto state machine directly, yes, it seems to be possible.

mxinden avatar Feb 23 '22 14:02 mxinden

Played around with this pull request once more over the weekend. Outcome is basically mxinden@1230178 which removes the unbounded channels.

Is it possible to avoid circumventing backpressure?

I am not sure I follow @DemiMarie. Are you asking whether it is possible for libp2p-quic to enforce backpressure? If I am not mistaken, given that we are using the quinn-proto state machine directly, yes, it seems to be possible.

I am referring to the comment in the source code about cloning a Sender.

Also, have you considered switching back to https://github.com/barebones-x509/barebones-x509 for certificate parsing? That will be much lighter than x509-parser, and it uses the same ASN.1 parser that webpki does. The reason that it has not seen much development is simply that it is already sufficient for libp2p-quic and I have not seen much in the way of feature requests from third parties.

DemiMarie avatar Feb 23 '22 16:02 DemiMarie

what about https://docs.rs/x509 ? it should be even more lightweight, avoiding the webpki details (haven’t checked if it supports everything required though)

dignifiedquire avatar Feb 23 '22 18:02 dignifiedquire

what about https://docs.rs/x509 ? it should be even more lightweight, avoiding the webpki details (haven’t checked if it supports everything required though)

That only supports serialization, and the webpki stuff is just for error codes.

DemiMarie avatar Feb 23 '22 19:02 DemiMarie

Ah sorry, it seems it is not published yet, https://github.com/RustCrypto/formats/tree/master/x509 is what I was referring to, which does encoding as well, as far as I understand the tests.

dignifiedquire avatar Feb 23 '22 20:02 dignifiedquire

@dignifiedquire that makes sense.

DemiMarie avatar Feb 23 '22 21:02 DemiMarie

If I understand it correctly, the reason for using the quinn-proto state-machine instead of quinn was mostly because quinn only supports tokio as runtime [1]. There has recently been some work towards adding support to async-std for quinn [2]. Have you considered switching to use quinn directly? I think this could save us a lot of complexity.

Also related to this: @mxinden raised the idea in the past to remove the concept of listeners and instead implement Stream directly on the transport [3]. Doing this would in our case allow us to implement Transport directly on (a wrapper of) quinn::Endpoint with something like this for example:

struct Endpoint {
  inner: quinn::Endpoint
}

impl Transport for Endpoint {
   type Dial = quinn::Connecting;
   type Output = quinn::NewConnection;
   type ListenerUpgrade = quinn::Connecting;
   ...
}

impl Stream for Endpoint {
   type Item = ListenerEvent<_>
   ...
}

(Did not fully think this trough yet so don't focus on the specific types; it's more about the general idea). I have been experimenting a little bit with this to see if our TCP transport implementation could be refactored according to the proposed transport trait changes. From a first draft it seems to be a doable ([4]) by pushing a lot of the listeners logic out of swarm::connection::ListenersStream and into GenTcpConfig, though I did not test the integration in the swarm yet.

What do you think about this idea in general, do you think it is worth exploring? Happy to help and work on a PoC of how quinn::Endpoint could be integrated with this.

  • [1] https://github.com/quinn-rs/quinn/issues/502
  • [2] https://github.com/quinn-rs/quinn/pull/1346
  • [3] https://github.com/libp2p/rust-libp2p/discussions/2113#discussioncomment-944923, https://github.com/libp2p/rust-libp2p/pull/2529#issuecomment-1087384988
  • [4] https://github.com/elenaf9/rust-libp2p/commit/0c3c45efcb2caae1a673635c5124fd600e866fd2

elenaf9 avatar May 06 '22 18:05 elenaf9

@elenaf9 I am working on the final polishing of the code. Everything works but I need to refactor it a little bit.

Still the work done by @yu-re-ka is worth looking at. Maybe I can throw away some code in the future.

kpp avatar May 07 '22 15:05 kpp

I am still wondering whether it would make more sense to switch to using the quinn top level crate, which would integrate well with the new StreamMuxer trait once quinn-rs/quinn#1357 is merged. Also with quinn-rs/quinn#1219 the unbound channels will be removed as well as the spawning of a task for every connection, which afaik were so far the major reasons for not using quinn (together with the tokio runtime mentioned in my above comment). Apart from these points, what is the reasoning for favoring quinn_proto over quinn?

Yes, as far as I remember, and as far as I can tell (little involvement thus far) the following were the outstanding blockers:

  • Usage of unbounded channels https://github.com/quinn-rs/quinn/issues?q=is%3Aissue+is%3Aopen+unbounded
  • Spawning of tasks https://github.com/quinn-rs/quinn/issues/914
  • Tokio only support. Though at this point, I would be open for a Tokio only libp2p-quic with the outlook of eventually supporting async-std. https://github.com/quinn-rs/quinn/issues/502

@elenaf9 would going with quinn-proto first and eventually switching to quinn be an option and not require to many wasted engineering hours?

mxinden avatar Jun 01 '22 12:06 mxinden

Yes, as far as I remember, and as far as I can tell (little involvement thus far) the following were the outstanding blockers:

The first two points would be solved with https://github.com/quinn-rs/quinn/pull/1219. But to be fair I don't know how long it will take till that lands and a new quinn version is published.

@elenaf9 would going with quinn-proto first and eventually switching to quinn be an option and not require to many wasted engineering hours?

I am not sure. We could do that, yes. But after looking at the StreamMuxer changes from paritytech/substrate#2648 I think we have to adapt the current implementation of the quic muxer anyway (see https://github.com/libp2p/rust-libp2p/pull/2289#discussion_r885718446 ). From what I can tell the quinn API would already give us the necessary types without much work needed from our side. That being said, I am not completely opposed to sticking with quinn_proto for the first version. I am just wondering if finishing the current implementation is more work than switching would be. I'll leave it up to @kpp to decide 🙂

elenaf9 avatar Jun 01 '22 13:06 elenaf9

I don't think it's possible to switch from quinn_proto to quinn untill https://github.com/quinn-rs/quinn/pull/1219. Would you please tell me more about merging strategy? Should I merge https://github.com/libp2p/rust-libp2p/pull/2648 and https://github.com/libp2p/rust-libp2p/pull/2652 and fix Transport/StreamMuxer API or is there another way of merging this into master?

kpp avatar Jun 01 '22 14:06 kpp