rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

Reducing allocations for transfer-heavy protocols by using borrowed data in protobuf files

Open thomaseizinger opened this issue 2 years ago • 8 comments

Description

This is inspired by PRs from @joshuef (#4751, #4753, #4754).

The idea is to use Bytes to avoid potentially costly clones of large chunks of memory as we are processing the messages. For gossipsub, a single message might be sent to multiple clients. For kademlia, we might serve the same record multiple times from the record store. In both cases, there shouldn't be a need for any additional allocations.

The main problem though is that the generated protobuf structs use Vec<u8> and thus force an allocation onto us every time we want to send one of these. https://github.com/tafia/quick-protobuf does have an option to generate Cow instead of Vec<u8> and thus allow encoding of borrowed data. In the past, we weren't able to use this because asynchronous-codec couldn't encode borrowed data. But it can now! See https://github.com/mxinden/asynchronous-codec/pull/9.

As a result, I think it is worth experimenting whether or not we can now use the generated protobuf structs.

To actually make use of this, we need to:

  • Re-run pb-rs with the correct options to use Cow instead of Vec:
    pb-rs **/generated/*.proto`
    
  • Work out whether or not we can adapt quick-protobuf-codec to work with the new protobuf files
  • Update the CI check to not fail on the now differing files
  • Actually make use of this in (all?) protocols to avoid allocations where possible

Motivation

Less memory allocations.

Current Implementation

We allocate each message again just to serialize it to the wire.

Are you planning to do it yourself in a pull request ?

No

thomaseizinger avatar Nov 02 '23 00:11 thomaseizinger

Re-run pb-rs with the correct options to use Cow instead of Vec:

I started with this an it yields a lot of compile errors (understandably). Those will all need to be fixed. Perhaps in a first iteration by just using Cow::Owned everywhere as that is essentially what we are currently doing by using Vec<u8>.

The lifetime would have to be 'static in many places.

To actually make use of borrowed data in gossipsub for example, we need to defer encoding to the protobuf format for as long as possible. Essentially, we need to carry some data type that uses Bytes all the way down to the encoding step. Then we can, in a single function, construct e.g. a proto::RPC with a borrow using a local lifetime to the Bytes instance and pass it to the Encoder. For gossipsub, this would be e.g. here: https://github.com/libp2p/rust-libp2p/blob/96b4c4aa74085407d7844afe274a77c05dc766de/protocols/gossipsub/src/handler.rs#L332

@joshuef Are you interested in working on this?

thomaseizinger avatar Nov 02 '23 02:11 thomaseizinger

To actually make use of borrowed data in gossipsub for example, we need to defer encoding to the protobuf format for as long as possible.

To achieve this, we'd need to do something similar as @mxinden described here. I.e. we need to extend HandlerIn to send different types into the handler. However, contrary to @mxinden's example, these cannot be instances of proto::RPC as those will have a lifetime.

thomaseizinger avatar Nov 02 '23 02:11 thomaseizinger

@joshuef Are you interested in working on this?

Let me check out the scope of this and see how deep things go. If I have time to get in amongst it in the next week I'll definitely do that, or may be able to get someone from the team to have a look if no. I'll report back here when it's a bit clearer for me. 👍

joshuef avatar Nov 02 '23 08:11 joshuef

I did start looking at this, but won't have time to dig in until next week sometime, I think.

joshuef avatar Nov 02 '23 16:11 joshuef

No worries, it is an internal change so we can ship it at any point, doesn't need to go into the current release :)

thomaseizinger avatar Nov 02 '23 20:11 thomaseizinger

So I've been digging about further on this this morning, but I am blocked by the Keypairs essentially. Right now we zeroize the inbound bytes when we form the SKs. But with the Cow<u8> setup we don't necessarily have ownership of this to perform this...

Error[E0596]: cannot borrow data in dereference of `Cow<'_, [u8]>` as mutable --> identity/src/keypair.rs:294:59 | 294 | return rsa::Keypair::try_decode_pkcs1(&mut private_key.Data).map(|sk| { | ^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable | = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Cow<'_, [u8]>

I've been trying out some dereffing, borrows, passing vecs, but it all comes down to that same issue that I don't think we can guarantee that we properly `Zeroize` the bytes therin...

I'm not really sure if there's a way around this? Suggestions very welcome

(My WIP branch: https://github.com/joshuef/rust-libp2p/tree/GossipBytesPbCow)

joshuef avatar Nov 13 '23 11:11 joshuef

If the data is borrowed, we don't need to zeroize it, I'd say? Whoever owns the data should zeroize it when it gets dropped :)

Meaning, we can just match on the Cow and do nothing if it is Cow::Borrowed?

thomaseizinger avatar Nov 14 '23 02:11 thomaseizinger

Okay, I'll have a look at implementing in that fashion then, thanks @thomaseizinger ! 🙇

joshuef avatar Nov 14 '23 09:11 joshuef