rust-libp2p core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001)

rust-libp2p should implement RFC 0001.

Tracking issue: https://github.com/libp2p/specs/issues/216

Also tackle:

https://github.com/libp2p/rust-libp2p/blob/9f331e517f524dd112bf9e12b26d9dfae9162022/protocols/kad/src/protocol.rs#L102-L104

Oct 01 '21 19:10 mxinden

@mxinden, I have spent a few hours reading over the several documents related to this.

Would it be okay for me to work on this issue with some guidance?

Oct 05 '21 23:10 JerryHue

Hi @JerryHue,

Would it be okay for me to work on this issue

Yes for sure. Your help would be very much appreciated.

with some guidance?

:+1: I don't know how familiar you are with the stack. Feel free to post questions here or via mail. Note that there is a CID implementation in Rust.

Oct 06 '21 16:10 mxinden

Hello again @mxinden!

Despite my absence, I haven't forgotten about this issue!

I believe I have implemented the changes as per the RFC stated, but I would like to make sure that I have not forgotten anything before I make my PR.

I would like to ask certain questions that have passed through my mind while working on this.

Previous public methods

Encoding to the string representation

As per the RFC requirements, the PeerId should still support the base58btc representation when printing it. This means that the Debug and Display traits use the base58 representation.

To encode the PeerId to the base32 representation, one may use to_base32.

Decoding from the string representation

This one was tougher to separate, since we can only implement the FromStr once. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:

bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe,
QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N,
12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA.

I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.

`to_bytes` and `from_bytes`

I am not sure how to handle the to_bytes and the from_bytes functions. Right now, there are two versions of each: one version to handle the multihash representation (the original to_bytes and from_bytes), and another version to handle the cid representation.

I believe that the from_bytes function should behave very similarly to the implementation of the FromStr trait, but I am not sure if I should leave the to_bytes as two separate functions (to_bytes and to_bytes_cid), or having a single one that represents the raw byte representation of the CID. If I leave them separate, we would need to choose one of them as the default representation for the trait From<PeerId> for Vec<u8> to work.

This question is also related to the following snippet: https://github.com/libp2p/rust-libp2p/blob/9f331e517f524dd112bf9e12b26d9dfae9162022/protocols/kad/src/protocol.rs#L102-L104 The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the PeerId a CID? If so, why handle it as a special case?

Tests

Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.

Oct 20 '21 02:10 JerryHue

Sorry for the late reply here. I was sick for the last week.

Encoding to the string representation

As per the RFC requirements, the PeerId should still support the base58btc representation when printing it. This means that the Debug and Display traits use the base58 representation.

To encode the PeerId to the base32 representation, one may use to_base32.

:+1: sounds good to me. In the future we would switch the Debug and Display implementations to use to_base_32, though that would happen in a coordinated fashion across the implementations.

Decoding from the string representation

This one was tougher to separate, since we can only implement the FromStr once. At the end, I just had them in the same function. So, that means we can use the same function to parse the following strings:
* `bafzbeie5745rpv2m6tjyuugywy4d5ewrqgqqhfnf445he3omzpjbx5xqxe`,

* `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N`,

* `12D3KooWD3eckifWpRn9wQpMG9R9hX3sD158z7EqHWmweQAJU5SA`.
I believe this fits the criteria of the RFC the best, and causes the least friction when upgrading.

In my eyes a single function that can parse both is the way to go.

to_bytes and from_bytes

I am not sure how to handle the to_bytes and the from_bytes functions. Right now, there are two versions of each: one version to handle the multihash representation (the original to_bytes and from_bytes), and another version to handle the cid representation.

I believe that the from_bytes function should behave very similarly to the implementation of the FromStr trait, but I am not sure if I should leave the to_bytes as two separate functions (to_bytes and to_bytes_cid), or having a single one that represents the raw byte representation of the CID. If I leave them separate, we would need to choose one of them as the default representation for the trait From<PeerId> for Vec<u8> to work.

I would suggest a single function for from_bytes, in line with your above suggestion for FromStr and two functionf for to_bytes (to_bytes and to_bytes_cid) with From<PeerId> for Vec<u8> using the old behaviour for now until we do the above mentioned coordinated upgrade.

This question is also related to the following snippet:

https://github.com/libp2p/rust-libp2p/blob/9f331e517f524dd112bf9e12b26d9dfae9162022/protocols/kad/src/protocol.rs#L102-L104

The comment associated is confusing me a little. What does it mean by "handling as a special case"? Isn't the new representation of the PeerId a CID? If so, why handle it as a special case?

I think the comment suggests to either support PeerID as CIDs, as you are implementing today, or add a hack in libp2p-kad that would support PeerIDs as CIDs in this special case only. With your work everything can happen within PeerId::from_bytes.

Tests

Since this is a substantial PR, I would need to write some tests. Could you recommend me some behaviour to test whether it is working correctly? I am aware I need to implement the tests that confirm the behaviour stated in the RFC, but I am not sure if I should test other external places.

It would be great to have some quickcheck tests (see e.g. ed25519.rs tests as an example) that test through the various from_XXX to_XXX methods.

While still minimal, we should as make sure we can parse the examples listed at the bottom of https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md.

Thank you for the very thorough work thus far @JerryHue!

Oct 26 '21 19:10 mxinden

Hello again, Max!

Thank you for your follow up! I believe I have finished my changes, so I will follow up with my PR shortly.

Thank you for the help and support!

Nov 01 '21 07:11 JerryHue

I find this RFC 1 (that I discover for the first time) very questionable. It's extremely over-engineered.

But beyond that, I don't understand the motivation for storing a Cid instead of Multihash? Modifying the internals of the PeerId struct just adds a couple of bytes to it, and these bytes are always the same. It's as if you were adding another field called prefix to the PeerId struct and that this field always has the same constant value.

Why not build the bytes on the fly when to_cidv1_bytes() or to_base32() are called? It seems that me that we're building the bytes on the fly anyway, and so storing a Cid instead of a Multihash doesn't bring anything to the table? Or am I mistaken?

Mar 02 '22 15:03 tomaka

core/src/peer_id: Support and output peer IDs as CIDs (RFC 0001)

Previous public methods

Encoding to the string representation

Decoding from the string representation

to_bytes and from_bytes

Tests

Encoding to the string representation

Decoding from the string representation

to_bytes and from_bytes

Tests

`to_bytes` and `from_bytes`

`to_bytes` and `from_bytes`