misc/multistream-select: Treat protocol as `String` and not `[u8]`
Description
Today we treat protocols negotiated via multistream-select as a [u8]. Instead we should treat it as a String.
Motivation
- I am not aware of a project using non-string protocol identifiers.
- The specification treats them as strings.
libp2p-identifytreats protocols as strings. See spec.- The mismatch is unnecessary.
- See this confusion in the wild https://github.com/libp2p/rust-libp2p/pull/2798#issuecomment-1209026177
- The Go implementation treats protocols as strings.
Current Implementation
https://github.com/libp2p/rust-libp2p/blob/3d3666e1a69c82afa9c966d668c32aae36c87599/misc/multistream-select/src/protocol.rs#L69-L77
Are you planning to do it yourself in a pull request?
No
Should we remove the ProtocolName abstraction as part of this or just change it to &str for now?
For the sake of type safety, what do you think of a new type wrapping a String or &str?
I'd like to work on this.
impl AsRef<str> for Protocol {
fn as_ref(&self) -> &str {
std::str::from_utf8(&self.0.as_ref()).unwrap()
}
}
Are we thinking something along these line? I am using &str here, because, String requires new memory. With AsRef<str> in place, we can replace AsRef<[u8]> with AsRef<str> in dialer_select.rs and listener_select.rs, right?
But doing so, rippled its effect into places where dialers and listeners are used, as in core/src/upgrade where NameWrap, UpgradeInfo, ProtocolName want AsRef
It is a fairly invasive change yes. Happy to mentor you along if you want to pick it up!
See some of the communication in https://github.com/libp2p/rust-libp2p/pull/2798.
I think we should store the protocol as String or &str everywhere to begin with and not convert it. Some places also use Cow so the Protocol newtype may need Cow inside!
Many abstractions that require AsRef<[u8]> currently may need to be changed to just use the new newtype.
@thomaseizinger yes, I would like to work on this.
I think we should store the protocol as String or &str everywhere to begin with and not convert it. Some places also use Cow so the Protocol newtype may need Cow inside!
So currently it is struct Protocol(Bytes) , we want to change it to something like struct Protocol(String) or struct Protocol(&str). How to decide between the two? If we want Cow, is &str better?
Some places also use Cow may need Cow inside!
I will try to look for places where Cow is used. But off the top of your head, lmk if you know places where this is the case.
Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?
In order to not have to decide between the two, we can have Cow<'static, String> :)
We will either need to use Protocol throughout the entirety of rust-libp2p or make a separate newtype that lives in libp2p-core.
I think using Protocol everywhere is worth exploring. If you don't mind, you can open a possibly not-compiling draft PR where you make comments on specific lines that are causing trouble and we continue discussing there!
multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.
Cow<'static, String> wow! nice.
@thomaseizinger yeah, not-compiling draft PRs exploring the usage of Protocol everywhere sounds good. I'll begin.
Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?
[thomaseizinger]: multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.
@mxinden do you know?
In order to not have to decide between the two, we can have
Cow<'static, String>:)
:+1: another option which I have seen used more would be Cow<'static, str>. Not sure what is better for us or whether there is even a difference.
Just out of curiosity: was there any particular reason for choosing Bytes over String in the first place?
[thomaseizinger]: multistream-select predates my involvement in the project so I don't know why bytes were chosen originally.
@mxinden do you know?
I don't unfortunately.
One day I would like to see rust-libp2p support both multistream-select and Protocol Select. With that in mind, I would suggest libp2p-core to expose its own Protocol type that would convert into multistream-select's Protocol type. Thus e.g. libp2p-kad would not depend on multistream-select but only on libp2p-core.
Thanks @efarg for the work and thanks @thomaseizinger for the guidance!
Cow<'static, str> is the correct version! Using String is pointless because it is already an owned type, I've mixed it up 😅
@efarg Are you still working on this? I'd otherwise pick it up soon :)
Yes, I am very much working on it. But I am a bit lost.
So here is the diff of changes made for this issue: https://paste.debian.net/hidden/524606d1/
Upon compilation, E0759 will be emitted. And I don't know how to solve it. I am thinking, the way we use Protocol struct we can't use 'static lifetime in Cow. Is that it?
Some queries:
(1) Why do we even need String? Are we ever modifying the data in Protocol struct? Can't we just use '&str' instead of Cow?
Thus e.g. libp2p-kad would not depend on multistream-select but only on libp2p-core.
(2) This implies that right now, libp2p-kad depends on multistream-select. But there is no mention of multistream-select in libp2p-kad's toml. @mxinden
We will either need to use Protocol throughout the entirety of rust-libp2p or make a separate newtype that lives in libp2p-core. I think using Protocol everywhere is worth exploring.
(3) With @mxinden 's last comment, should the idea of using Protocol throughout rust-libp2p be dropped? Will there be a newtype in libp2p-core? Right now, I think, some sort of magic abstraction happens via trait ProtocolName and struct NameWrap. @thomaseizinger
Thanks for sharing!
I'd tackle this from a different angle. Try and remove the ProtocolName trait first:
https://github.com/libp2p/rust-libp2p/blob/2025de3ef0898ffa99545bd50aef868516c7f226/core/src/upgrade.rs#L123-L129
And replace ProtocolName here with a newtype:
https://github.com/libp2p/rust-libp2p/blob/2025de3ef0898ffa99545bd50aef868516c7f226/core/src/upgrade.rs#L137-L147
The reason we need a newtype is because some protocols construct their identifiers dynamically and thus need to allocate whereas others provide just a &'static str. Using Cow allows us to handle both of these cases to their optimum. To pull all of that completely through, you will need to convert the string in multistream-select to bytes but that is easy as an interim step.
Once that phase is done, you can adopt multistream-select to accept a string instead of bytes as the final action.
Hope this helps! :)
@thomaseizinger what is meant by replacing ProtocolName with newtype? newtype is a tuple struct, right? How will a struct replace ProtocolName, which is a trait?
I have been reading about newtype from ch. 19 in The Book, and can't find anything which would allow such replacement.
You are right, it can't be replaced 1 to 1!
Essentially, removing ProtocolName from UpgradeInfo means that you can remove the Info associated type and just express InfoIter's bound as: IntoIterator<Item = ProtocolName> (where ProtocolName is now a struct).
Basically what we are doing is removing one layer of "genericness". Instead of allowing many different types that all have to implement ProtocolName, we are making one type that is used everywhere.
Does that make sense? :)
@thomaseizinger sharing some progress. This patch gives 29 errors. I am thinking about how to solve them.
https://paste.debian.net/hidden/3c3659e9/
Major problem is with either.rs. Also, fn. signature changes like this one worry me:
- fn upgrade_inbound(self, socket: C, info: Self::Info) -> Self::Future;
+ fn upgrade_inbound(self, socket: C, info: ProtocolName) -> Self::Future;
I am thinking instead of replacing Self::Info with ProtocolName, it should be replaced by something that would give us the type of value stored in the iterator InfoIter. But I don't know how to do that. I am thinking something like: Self::InfoIter::Item.
@thomaseizinger sharing some progress. This patch gives 29 errors. I am thinking about how to solve them.
https://paste.debian.net/hidden/3c3659e9/
Can you open it as a pull-request please? I'd be able to make inline comments then :)
Major problem is with
either.rs. Also, fn. signature changes like this one worry me:- fn upgrade_inbound(self, socket: C, info: Self::Info) -> Self::Future; + fn upgrade_inbound(self, socket: C, info: ProtocolName) -> Self::Future;
Yes, that is exactly the signature change we want to see!
I am thinking instead of replacing
Self::InfowithProtocolName, it should be replaced by something that would give us the type of value stored in the iteratorInfoIter. But I don't know how to do that. I am thinking something like:Self::InfoIter::Item.
I am not sure this makes sense? The point of this change is to remove a layer of abstraction. By virtue, this means we will get more concrete in some places, e.g. the type in the iterator is always fixed!
I am not sure this makes sense? The point of this change is to remove a layer of abstraction. By virtue, this means we will get more concrete in some places, e.g. the type in the iterator is always fixed!
Great! I will open a pull request.
I am still in favor of this change. @efarg let me know in case you are still interested in contributing this change. (Sorry in case I am missing a related discussion.)
I am still in favor of this change. @efarg let me know in case you are still interested in contributing this change. (Sorry in case I am missing a related discussion.)
They are working on it! We do have a conversation going on Element about this :)