rust-multihash Remove `Code::Identity` support

The feature is technically opt in, but features are unified so the entire program will be "opted in" to supporting identity hashes if any component needs to support them.
It's ridiculously easy to end up panicing based on user inputs.

Possible solutions:

Make Code::digest return a result. This is likely a non-starter due to mass breakage and the number of unwraps it would introduce.
Make Cid dynamically allocated. Also a massive refactor, and a non-starter.
Remove it from the default Code.

Honestly, I think the best bet is to remove the feature, and maybe export an IDENTITY_CODE constant, or even a identity_hash free-function that returns a result. My reasoning is that the current identity "code" is pretty much unusable without special casing anyways, and is a massive foot-gun.

Mar 10 '22 23:03 Stebalien

cc @dignifiedquire @vmx

Mar 10 '22 23:03 Stebalien

The with removing the identity feature is that libp2p (https://github.com/libp2p/rust-libp2p/blob/5291011c73f8eaf44b3ba4ada873d888ff26e703/core/Cargo.toml#L26) and Forest (https://github.com/ChainSafe/forest/blob/e708777ff6d28aa08ff47560cd62212dfe2fbbb7/node/forest_libp2p/Cargo.toml#L24) are using it atm.

Mar 11 '22 09:03 vmx

My expectations for "power users" like the FVM were, that you wouldn't rely on the default code table, but you would create your own one, specific to the project. For example the default table is using 64 bytes allocations, I guess for the FVM 32 bytes would be enough.

Mar 11 '22 09:03 vmx

The with removing the identity feature is that libp2p (https://github.com/libp2p/rust-libp2p/blob/5291011c73f8eaf44b3ba4ada873d888ff26e703/core/Cargo.toml#L26) and Forest (https://github.com/ChainSafe/forest/blob/e708777ff6d28aa08ff47560cd62212dfe2fbbb7/node/forest_libp2p/Cargo.toml#L24) are using it atm.

My point is that the current implementation is horribly broken and nobody should use it. Basically, you either:

Special case it everywhere and check to make sure you're not feeding more than 64 bytes to the identity hash function. If you're doing this, there's little point in even providing a Code::Identity (rather than a simple make_identity_hash helper function that's actually safe).
Don't special case it. In this case, you're probably just going to crash.

My expectations for "power users" like the FVM were, that you wouldn't rely on the default code table, but you would create your own one, specific to the project. For example the default table is using 64 bytes allocations, I guess for the FVM 32 bytes would be enough.

Currently we just check "is this cbor" when storing blocks. But yes, eventually we're going to want a custom code table.

Mar 12 '22 00:03 Stebalien

I think it mostly comes down to how its expected to behave, if its supposed full to copy everything into itself, then thats not something that you are able to implement without allocating. (this does not fit in well the design of this crate)

if the expectation is that it copies as much as it can, we can truncate anything that doesnt fit inside of the the const sized array. its a simple fix if you just want to truncate

fn update(&mut self, input: &[u8]) {
    let src_end = (self.i + input.len()).min(self.bytes.len());
    let input_end = input.len().min(self.bytes.len() - self.i);
    self.bytes[self.i..src_end].copy_from_slice(&input[..input_end]);
    self.i = src_end;
}

assuming you want to have everything and you know the size is under S (but bigger than the usually allocated), you could use

let mh = Multihash::<128>::wrap(0x00, &src).unwrap();

though that could end in weird trait interop with the normally generated multihashes

Mar 12 '22 08:03 mriise

Special case it everywhere and check to make sure you're not feeding more than 64 bytes to the identity hash function. If you're doing this, there's little point in even providing a Code::Identity (rather than a simple make_identity_hash helper function that's actually safe).

To me the usual use case, that also libp2p is using is "is the data smaller than a certain size, use the data directly, else hash it". In that case you always check the size and can be sure it fits into your Cid.

The identity hash is something that needs to be used with care, hence it's behind a feature flag.

Mar 14 '22 09:03 vmx

To me the usual use case, that also libp2p is using is "is the data smaller than a certain size, use the data directly, else hash it". In that case you always check the size and can be sure it fits into your Cid.

Sure, but in that case, one could just call Multihash::wrap(0, ...).

The identity hash is something that needs to be used with care, hence it's behind a feature flag.

Unfortunately, feature flags are additive. That's what makes this so dangerous.

Mar 14 '22 15:03 Stebalien