core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001)
rust-libp2p should implement RFC 0001.
Tracking issue: https://github.com/libp2p/specs/issues/216
Also tackle:
https://github.com/libp2p/rust-libp2p/blob/9f331e517f524dd112bf9e12b26d9dfae9162022/protocols/kad/src/protocol.rs#L102-L104
@mxinden, I have spent a few hours reading over the several documents related to this.
Would it be okay for me to work on this issue with some guidance?
Hi @JerryHue,
Would it be okay for me to work on this issue
Yes for sure. Your help would be very much appreciated.
with some guidance?
:+1: I don't know how familiar you are with the stack. Feel free to post questions here or via mail. Note that there is a CID implementation in Rust.
Hello again @mxinden!
Despite my absence, I haven't forgotten about this issue!
I believe I have implemented the changes as per the RFC stated, but I would like to make sure that I have not forgotten anything before I make my PR.
I would like to ask certain questions that have passed through my mind while working on this.
Previous public methods
Encoding to the string representation
As per the RFC requirements, the PeerId should still support the base58btc representation when printing it. This means that the Debug and Display traits use the base58 representation.
To encode the PeerId to the base32 representation, one may use to_base32.
Decoding from the string representation
This one was tougher to separate, since we can only implement the FromStr once. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:
bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe,QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N,12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA.
I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.
to_bytes and from_bytes
I am not sure how to handle the to_bytes and the from_bytes functions. Right now, there are two versions of each: one version to handle the multihash representation (the original to_bytes and from_bytes), and another version to handle the cid representation.
I believe that the from_bytes function should behave very similarly to the implementation of the FromStr trait, but I am not sure if I should leave the to_bytes as two separate functions (to_bytes and to_bytes_cid), or having a single one that represents the raw byte representation of the CID. If I leave them separate, we would need to choose one of them as the default representation for the trait From<PeerId> for Vec<u8> to work.
This question is also related to the following snippet:
https://github.com/libp2p/rust-libp2p/blob/9f331e517f524dd112bf9e12b26d9dfae9162022/protocols/kad/src/protocol.rs#L102-L104
The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the PeerId a CID? If so, why handle it as a special case?
Tests
Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.
Sorry for the late reply here. I was sick for the last week.
Encoding to the string representation
As per the RFC requirements, the
PeerIdshould still support thebase58btcrepresentation when printing it. This means that theDebugandDisplaytraits use thebase58representation.To encode the
PeerIdto thebase32representation, one may useto_base32.
:+1: sounds good to me. In the future we would switch the Debug and Display implementations to use to_base_32, though that would happen in a coordinated fashion across the implementations.
Decoding from the string representation
This one was tougher to separate, since we can only implement the
FromStronce. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:* `bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe`, * `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`, * `12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA`.I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.
In my eyes a single function that can parse both is the way to go.
to_bytesandfrom_bytesI am not sure how to handle the
to_bytesand thefrom_bytesfunctions. Right now, there are two versions of each: one version to handle themultihashrepresentation (the originalto_bytesandfrom_bytes), and another version to handle thecidrepresentation.I believe that the
from_bytesfunction should behave very similarly to the implementation of theFromStrtrait, but I am not sure if I should leave theto_bytesas two separate functions (to_bytesandto_bytes_cid), or having a single one that represents the raw byte representation of theCID. If I leave them separate, we would need to choose one of them as the default representation for the traitFrom<PeerId> for Vec<u8>to work.
I would suggest a single function for from_bytes, in line with your above suggestion for FromStr and two functionf for to_bytes (to_bytes and to_bytes_cid) with From<PeerId> for Vec<u8> using the old behaviour for now until we do the above mentioned coordinated upgrade.
This question is also related to the following snippet:
https://github.com/libp2p/rust-libp2p/blob/9f331e517f524dd112bf9e12b26d9dfae9162022/protocols/kad/src/protocol.rs#L102-L104
The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the
PeerIdaCID? If so, why handle it as a special case?
I think the comment suggests to either support PeerID as CIDs, as you are implementing today, or add a hack in libp2p-kad that would support PeerIDs as CIDs in this special case only. With your work everything can happen within PeerId::from_bytes.
Tests
Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.
It would be great to have some quickcheck tests (see e.g. ed25519.rs tests as an example) that test through the various from_XXX to_XXX methods.
While still minimal, we should as make sure we can parse the examples listed at the bottom of https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md.
Thank you for the very thorough work thus far @JerryHue!
Hello again, Max!
Thank you for your follow up! I believe I have finished my changes, so I will follow up with my PR shortly.
Thank you for the help and support!
I find this RFC 1 (that I discover for the first time) very questionable. It's extremely over-engineered.
But beyond that, I don't understand the motivation for storing a Cid instead of Multihash?
Modifying the internals of the PeerId struct just adds a couple of bytes to it, and these bytes are always the same. It's as if you were adding another field called prefix to the PeerId struct and that this field always has the same constant value.
Why not build the bytes on the fly when to_cidv1_bytes() or to_base32() are called? It seems that me that we're building the bytes on the fly anyway, and so storing a Cid instead of a Multihash doesn't bring anything to the table? Or am I mistaken?