rust-libp2p icon indicating copy to clipboard operation
rust-libp2p copied to clipboard

Support customizing function that transforms `PeerId` into `Key<PeerId>` in Kademlia

Open nazar-pc opened this issue 3 years ago • 12 comments

Description

This is a minimal set of changes to Kademlia that allows both raw keys and customization of how PeerId is transformed into Key<PeerId>.

Raw Key and KeyBytes can be created from any bytes using ::unchecked_from() methods.

Links to any relevant issues

https://github.com/libp2p/rust-libp2p/issues/2690 https://github.com/libp2p/specs/issues/421

Open Questions

How to make these changes to not regress in the future given current function is just fn(PeerId) -> Key<PeerId> and mixing different mechanisms (when user provided one, but some Kademlia changes hardcoded Key::from) is possible without compilation errors, but on logical level things will break.

Solution to that is to have a separate trait and use Box<dyn (PeerId) -> Key<PeerId>> instead, but even then nothing prevents something from creating two Key<PeerId> using different methods that are logically incompatible.

Change checklist

  • [x] I have performed a self-review of my own code
  • [x] I have made corresponding changes to the documentation
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [x] A changelog entry has been made in the appropriate crates

nazar-pc avatar Jun 15 '22 09:06 nazar-pc

How to make these changes to not regress in the future given current function is just fn(PeerId) -> Key<PeerId> and mixing different mechanisms (when user provided one, but some Kademlia changes hardcoded Key::from) is possible without compilation errors, but on logical level things will break.

Solution to that is to have a separate trait and use Box<dyn (PeerId) -> Key<PeerId>> instead, but even then nothing prevents something from creating two Key<PeerId> using different methods that are logically incompatible.

The entire behaviour (and all other types) could be parameterized with a type that needs to implement a certain trait which provides this functionality. That would enforce that the same functionality is used everywhere because the type parameter can be passed on. To avoid breaking changes, we could default it to the current strategy of hashing the key.

thomaseizinger avatar Jun 15 '22 17:06 thomaseizinger

Yes, I thought about that, but it is quite infectious and makes changes more invasive. I'll hack on this in a local branch and see where it goes, I really like compile-time guarantees for these kinds of things.

nazar-pc avatar Jun 15 '22 18:06 nazar-pc

Pushed invasive changes that make Key and KeyBytes more generic and allow proper overrides for the way various data structures are converted into them.

I'm not very happy with generic name C, but it made sense to me. If someone has a better name I can go through the changes and rename it, though it is a very tedious job.

nazar-pc avatar Jun 16 '22 20:06 nazar-pc

I think this is ready for review

nazar-pc avatar Jun 17 '22 14:06 nazar-pc

I don't understand how the current approach prevents misuse at the type level? Would you mind paraphrasing this once more?

It physically doesn't compile. Second argument in Key<P, C> should match everywhere up to Behaviour. If it doesn't, it doesn't compile, so you can't accidentally specify incorrect key anymore.

On the downside, I find the current approach to complicate the already somewhat complicated key handling.

Agree completely. I tried to simplify it a bit by making Key more explicit in some places and removing From/Into transformations, such that Key::new() is the only entrypoint that needs to be implemented in Kademlia, but overall it got significantly complicated and I don't currently see a way to make it less complicated while keeping the same feature set.

Would the following approach be an option as well?

It is not just about PeerId, it is about anything that can be turned into Key. Depending on the protocol it might be desirable to hash or not has identity multihash for instance, or have custom behavior for multihashes from custom multicodec namespace of the protocol, like we have. Ability to fully define those transformations is thus beneficial (in our protocol we'll probably not want to hash the whole digest in most cases, rather we'll inspect and hash internal payload).

Just so that there is no confusion, you are not planning to have a network with both nodes that hash and nodes that don't hash, correct?

Certainly. Though it is possible to have 2 Kademlia instances that behave differently at the same time if there is a need for that.

nazar-pc avatar Jun 22 '22 11:06 nazar-pc

Would the following approach be an option as well?

It is not just about PeerId, it is about anything that can be turned into Key.

Would the approach I suggest above not cater for everything that can be turned into a Key? You could do get_closest_peers with a PeerId, but also with whatever type you have, as long as that type knows how to transform itself into a Key.

mxinden avatar Jun 27 '22 06:06 mxinden

Key created not just from PeerId, for instance there is conversion from Vec<u8> in https://github.com/nazar-pc/rust-libp2p/blob/d3d93d0c849c841423a942ff0057615480bb13ba/protocols/kad/src/behaviour.rs#L2005 and similarly there are conversions form record::Key in some places. There should be a way to intercept those and see if it is a multihash, if so then which one and do hashing or no hashing if it is hashed already according to user-defined rules.

Having this generic throughout the whole thing is inconvenient, but it is the only way I see to ensure all conversions go through the same mechanism consistently and we don't have Key<PeerId, Sha256Hash> mixed with Key<Vec<u8>, BlakeTwo256> for instance since those are, obviously, very different and if Vec<u8> happens to be PeerId's digest, Kademlia will not be able to find correct closest peers.

nazar-pc avatar Jun 27 '22 18:06 nazar-pc

Does latest explanation make sense? I'd really like for this to not stall for months or longer.

nazar-pc avatar Jul 10 '22 11:07 nazar-pc

Key created not just from PeerId, for instance there is conversion from Vec<u8> in https://github.com/nazar-pc/rust-libp2p/blob/d3d93d0c849c841423a942ff0057615480bb13ba/protocols/kad/src/behaviour.rs#L2005 and similarly there are conversions form record::Key in some places. There should be a way to intercept those and see if it is a multihash, if so then which one and do hashing or no hashing if it is hashed already according to user-defined rules.

Thanks for the explanations. This makes sense to me now.

Having this generic throughout the whole thing is inconvenient,

At this point I don't think supporting this feature, i.e. key type without additional hashing, is worth the complexity this pull request adds to the code base. I vote against merging here unless we find an alternative solution. That could e.g. be a simpler way to implement it (in case that exists), or a way to expose functionality of libp2p-kad in a way that you can reuse most of it but provide your own glue-code.

mxinden avatar Jul 11 '22 03:07 mxinden

At this point I don't think supporting this feature, i.e. key type without additional hashing

It is not really about "without additional hashing", it is about defining how to hash: which hash function to use, whether to hash the whole digest or just the payload, etc. We will use hashing in our protocol, but the kind of hashing we're doing shouldn't be dictated by libp2p, it is too opinionated.

nazar-pc avatar Jul 11 '22 11:07 nazar-pc

At this point I don't think supporting this feature, i.e. key type without additional hashing

It is not really about "without additional hashing", it is about defining how to hash: which hash function to use, whether to hash the whole digest or just the payload, etc. We will use hashing in our protocol, but the kind of hashing we're doing shouldn't be dictated by libp2p, it is too opinionated.

Thanks for the clarification. I am still arriving at the same conclusion, namely that unless we find a simpler way or a way to expose subsets of libp2p-kad, I am voting against merging here.

mxinden avatar Jul 14 '22 04:07 mxinden

At this point I don't think supporting this feature, i.e. key type without additional hashing

It is not really about "without additional hashing", it is about defining how to hash: which hash function to use, whether to hash the whole digest or just the payload, etc. We will use hashing in our protocol, but the kind of hashing we're doing shouldn't be dictated by libp2p, it is too opinionated.

Thanks for the clarification. I am still arriving at the same conclusion, namely that unless we find a simpler way or a way to expose subsets of libp2p-kad, I am voting against merging here.

Perhaps we can do something similar as was done for yamux: Extract a library that is independent of libp2p so people can create their own NetworkBehaviour.

I am not too familiar with Kademlia but in general, rust-libp2p has an opinionated interface and some usecases are not worth supporting by extending that interface. We can lower the cost of writing your own NetworkBehaviour by extracting a libp2p-independent kademlia library. The only question is, whether it will be feasible for such a library to support changing the hash function.

thomaseizinger avatar Jul 21 '22 06:07 thomaseizinger

Setting this to Draft until we make a decision.

thomaseizinger avatar Nov 02 '22 04:11 thomaseizinger

I think this is quite unlikely to land, I'll close it for now

nazar-pc avatar Nov 02 '22 04:11 nazar-pc