rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

misc/multistream-select: Treat protocol as `String` and not `[u8]`

Open mxinden opened this issue 3 years ago • 16 comments

Description

Today we treat protocols negotiated via multistream-select as a [u8]. Instead we should treat it as a String.

Motivation

  • I am not aware of a project using non-string protocol identifiers.
  • The specification treats them as strings.
  • libp2p-identify treats protocols as strings. See spec.
  • The mismatch is unnecessary.
  • See this confusion in the wild https://github.com/libp2p/rust-libp2p/pull/2798#issuecomment-1209026177
  • The Go implementation treats protocols as strings.

Current Implementation

https://github.com/libp2p/rust-libp2p/blob/3d3666e1a69c82afa9c966d668c32aae36c87599/misc/multistream-select/src/protocol.rs#L69-L77

Are you planning to do it yourself in a pull request?

No

mxinden avatar Aug 20 '22 04:08 mxinden

Should we remove the ProtocolName abstraction as part of this or just change it to &str for now?

thomaseizinger avatar Aug 20 '22 07:08 thomaseizinger

For the sake of type safety, what do you think of a new type wrapping a String or &str?

mxinden avatar Aug 20 '22 09:08 mxinden

I'd like to work on this.

impl AsRef<str> for Protocol {
    fn as_ref(&self) -> &str {
	std::str::from_utf8(&self.0.as_ref()).unwrap()
    }
}

Are we thinking something along these line? I am using &str here, because, String requires new memory. With AsRef<str> in place, we can replace AsRef<[u8]> with AsRef<str> in dialer_select.rs and listener_select.rs, right?

But doing so, rippled its effect into places where dialers and listeners are used, as in core/src/upgrade where NameWrap, UpgradeInfo, ProtocolName want AsRef. When I made changes to accommodate this, more places depending on the mentioned three started emitting errors. This keeps going on...

efarg avatar Aug 28 '22 11:08 efarg

It is a fairly invasive change yes. Happy to mentor you along if you want to pick it up!

See some of the communication in https://github.com/libp2p/rust-libp2p/pull/2798.

I think we should store the protocol as String or &str everywhere to begin with and not convert it. Some places also use Cow so the Protocol newtype may need Cow inside!

thomaseizinger avatar Aug 28 '22 12:08 thomaseizinger

Many abstractions that require AsRef<[u8]> currently may need to be changed to just use the new newtype.

thomaseizinger avatar Aug 28 '22 12:08 thomaseizinger

@thomaseizinger yes, I would like to work on this.

I think we should store the protocol as String or &str everywhere to begin with and not convert it. Some places also use Cow so the Protocol newtype may need Cow inside!

So currently it is struct Protocol(Bytes) , we want to change it to something like struct Protocol(String) or struct Protocol(&str). How to decide between the two? If we want Cow, is &str better?

Some places also use Cow may need Cow inside!

I will try to look for places where Cow is used. But off the top of your head, lmk if you know places where this is the case.

Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?

efarg avatar Aug 28 '22 16:08 efarg

In order to not have to decide between the two, we can have Cow<'static, String> :)

We will either need to use Protocol throughout the entirety of rust-libp2p or make a separate newtype that lives in libp2p-core.

I think using Protocol everywhere is worth exploring. If you don't mind, you can open a possibly not-compiling draft PR where you make comments on specific lines that are causing trouble and we continue discussing there!

multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.

thomaseizinger avatar Aug 28 '22 16:08 thomaseizinger

Cow<'static, String> wow! nice.

@thomaseizinger yeah, not-compiling draft PRs exploring the usage of Protocol everywhere sounds good. I'll begin.

efarg avatar Aug 28 '22 17:08 efarg

Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?

[thomaseizinger]: multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.

@mxinden do you know?

efarg avatar Aug 28 '22 17:08 efarg

In order to not have to decide between the two, we can have Cow<'static, String> :)

:+1: another option which I have seen used more would be Cow<'static, str>. Not sure what is better for us or whether there is even a difference.

Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?

[thomaseizinger]: multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.

@mxinden do you know?

I don't unfortunately.

One day I would like to see rust-libp2p support both multistream-select and Protocol Select. With that in mind, I would suggest libp2p-core to expose its own Protocol type that would convert into multistream-select's Protocol type. Thus e.g. libp2p-kad would not depend on multistream-select but only on libp2p-core.

Thanks @efarg for the work and thanks @thomaseizinger for the guidance!

mxinden avatar Aug 30 '22 06:08 mxinden

Cow<'static, str> is the correct version! Using String is pointless because it is already an owned type, I've mixed it up 😅

thomaseizinger avatar Aug 30 '22 07:08 thomaseizinger

@efarg Are you still working on this? I'd otherwise pick it up soon :)

thomaseizinger avatar Sep 16 '22 07:09 thomaseizinger

Yes, I am very much working on it. But I am a bit lost.

So here is the diff of changes made for this issue: https://paste.debian.net/hidden/524606d1/

Upon compilation, E0759 will be emitted. And I don't know how to solve it. I am thinking, the way we use Protocol struct we can't use 'static lifetime in Cow. Is that it?

Some queries:

(1) Why do we even need String? Are we ever modifying the data in Protocol struct? Can't we just use '&str' instead of Cow?

Thus e.g. libp2p-kad would not depend on multistream-select but only on libp2p-core.

(2) This implies that right now, libp2p-kad depends on multistream-select. But there is no mention of multistream-select in libp2p-kad's toml. @mxinden

We will either need to use Protocol throughout the entirety of rust-libp2p or make a separate newtype that lives in libp2p-core. I think using Protocol everywhere is worth exploring.

(3) With @mxinden 's last comment, should the idea of using Protocol throughout rust-libp2p be dropped? Will there be a newtype in libp2p-core? Right now, I think, some sort of magic abstraction happens via trait ProtocolName and struct NameWrap. @thomaseizinger

efarg avatar Sep 16 '22 17:09 efarg

Thanks for sharing!

I'd tackle this from a different angle. Try and remove the ProtocolName trait first:

https://github.com/libp2p/rust-libp2p/blob/2025de3ef0898ffa99545bd50aef868516c7f226/core/src/upgrade.rs#L123-L129

And replace ProtocolName here with a newtype:

https://github.com/libp2p/rust-libp2p/blob/2025de3ef0898ffa99545bd50aef868516c7f226/core/src/upgrade.rs#L137-L147

The reason we need a newtype is because some protocols construct their identifiers dynamically and thus need to allocate whereas others provide just a &'static str. Using Cow allows us to handle both of these cases to their optimum. To pull all of that completely through, you will need to convert the string in multistream-select to bytes but that is easy as an interim step.

Once that phase is done, you can adopt multistream-select to accept a string instead of bytes as the final action.

Hope this helps! :)

thomaseizinger avatar Sep 19 '22 05:09 thomaseizinger

@thomaseizinger what is meant by replacing ProtocolName with newtype? newtype is a tuple struct, right? How will a struct replace ProtocolName, which is a trait?

I have been reading about newtype from ch. 19 in The Book, and can't find anything which would allow such replacement.

efarg avatar Sep 22 '22 12:09 efarg

You are right, it can't be replaced 1 to 1!

Essentially, removing ProtocolName from UpgradeInfo means that you can remove the Info associated type and just express InfoIter's bound as: IntoIterator<Item = ProtocolName> (where ProtocolName is now a struct).

Basically what we are doing is removing one layer of "genericness". Instead of allowing many different types that all have to implement ProtocolName, we are making one type that is used everywhere.

Does that make sense? :)

thomaseizinger avatar Sep 22 '22 14:09 thomaseizinger

@thomaseizinger sharing some progress. This patch gives 29 errors. I am thinking about how to solve them.

https://paste.debian.net/hidden/3c3659e9/

Major problem is with either.rs. Also, fn. signature changes like this one worry me:

-    fn upgrade_inbound(self, socket: C, info: Self::Info) -> Self::Future;
+    fn upgrade_inbound(self, socket: C, info: ProtocolName) -> Self::Future;

I am thinking instead of replacing Self::Info with ProtocolName, it should be replaced by something that would give us the type of value stored in the iterator InfoIter. But I don't know how to do that. I am thinking something like: Self::InfoIter::Item.

efarg avatar Oct 02 '22 17:10 efarg

@thomaseizinger sharing some progress. This patch gives 29 errors. I am thinking about how to solve them.

https://paste.debian.net/hidden/3c3659e9/

Can you open it as a pull-request please? I'd be able to make inline comments then :)

Major problem is with either.rs. Also, fn. signature changes like this one worry me:

-    fn upgrade_inbound(self, socket: C, info: Self::Info) -> Self::Future;
+    fn upgrade_inbound(self, socket: C, info: ProtocolName) -> Self::Future;

Yes, that is exactly the signature change we want to see!

I am thinking instead of replacing Self::Info with ProtocolName, it should be replaced by something that would give us the type of value stored in the iterator InfoIter. But I don't know how to do that. I am thinking something like: Self::InfoIter::Item.

I am not sure this makes sense? The point of this change is to remove a layer of abstraction. By virtue, this means we will get more concrete in some places, e.g. the type in the iterator is always fixed!

thomaseizinger avatar Oct 02 '22 23:10 thomaseizinger

I am not sure this makes sense? The point of this change is to remove a layer of abstraction. By virtue, this means we will get more concrete in some places, e.g. the type in the iterator is always fixed!

Great! I will open a pull request.

efarg avatar Oct 03 '22 05:10 efarg

I am still in favor of this change. @efarg let me know in case you are still interested in contributing this change. (Sorry in case I am missing a related discussion.)

mxinden avatar Mar 01 '23 16:03 mxinden

I am still in favor of this change. @efarg let me know in case you are still interested in contributing this change. (Sorry in case I am missing a related discussion.)

They are working on it! We do have a conversation going on Element about this :)

thomaseizinger avatar Mar 01 '23 21:03 thomaseizinger