Remove `Code::Identity` support
- The feature is technically opt in, but features are unified so the entire program will be "opted in" to supporting identity hashes if any component needs to support them.
- It's ridiculously easy to end up panicing based on user inputs.
Possible solutions:
- Make
Code::digestreturn a result. This is likely a non-starter due to mass breakage and the number ofunwrapsit would introduce. - Make
Ciddynamically allocated. Also a massive refactor, and a non-starter. - Remove it from the default
Code.
Honestly, I think the best bet is to remove the feature, and maybe export an IDENTITY_CODE constant, or even a identity_hash free-function that returns a result. My reasoning is that the current identity "code" is pretty much unusable without special casing anyways, and is a massive foot-gun.
cc @dignifiedquire @vmx
The with removing the identity feature is that libp2p (https://github.com/libp2p/rust-libp2p/blob/5291011c73f8eaf44b3ba4ada873d888ff26e703/core/Cargo.toml#L26) and Forest (https://github.com/ChainSafe/forest/blob/e708777ff6d28aa08ff47560cd62212dfe2fbbb7/node/forest_libp2p/Cargo.toml#L24) are using it atm.
My expectations for "power users" like the FVM were, that you wouldn't rely on the default code table, but you would create your own one, specific to the project. For example the default table is using 64 bytes allocations, I guess for the FVM 32 bytes would be enough.
The with removing the identity feature is that libp2p (https://github.com/libp2p/rust-libp2p/blob/5291011c73f8eaf44b3ba4ada873d888ff26e703/core/Cargo.toml#L26) and Forest (https://github.com/ChainSafe/forest/blob/e708777ff6d28aa08ff47560cd62212dfe2fbbb7/node/forest_libp2p/Cargo.toml#L24) are using it atm.
My point is that the current implementation is horribly broken and nobody should use it. Basically, you either:
- Special case it everywhere and check to make sure you're not feeding more than 64 bytes to the identity hash function. If you're doing this, there's little point in even providing a
Code::Identity(rather than a simplemake_identity_hashhelper function that's actually safe). - Don't special case it. In this case, you're probably just going to crash.
My expectations for "power users" like the FVM were, that you wouldn't rely on the default code table, but you would create your own one, specific to the project. For example the default table is using 64 bytes allocations, I guess for the FVM 32 bytes would be enough.
Currently we just check "is this cbor" when storing blocks. But yes, eventually we're going to want a custom code table.
I think it mostly comes down to how its expected to behave, if its supposed full to copy everything into itself, then thats not something that you are able to implement without allocating. (this does not fit in well the design of this crate)
if the expectation is that it copies as much as it can, we can truncate anything that doesnt fit inside of the the const sized array. its a simple fix if you just want to truncate
fn update(&mut self, input: &[u8]) {
let src_end = (self.i + input.len()).min(self.bytes.len());
let input_end = input.len().min(self.bytes.len() - self.i);
self.bytes[self.i..src_end].copy_from_slice(&input[..input_end]);
self.i = src_end;
}
assuming you want to have everything and you know the size is under S (but bigger than the usually allocated), you could use
let mh = Multihash::<128>::wrap(0x00, &src).unwrap();
though that could end in weird trait interop with the normally generated multihashes
- Special case it everywhere and check to make sure you're not feeding more than 64 bytes to the identity hash function. If you're doing this, there's little point in even providing a
Code::Identity(rather than a simplemake_identity_hashhelper function that's actually safe).
To me the usual use case, that also libp2p is using is "is the data smaller than a certain size, use the data directly, else hash it". In that case you always check the size and can be sure it fits into your Cid.
The identity hash is something that needs to be used with care, hence it's behind a feature flag.
To me the usual use case, that also libp2p is using is "is the data smaller than a certain size, use the data directly, else hash it". In that case you always check the size and can be sure it fits into your Cid.
Sure, but in that case, one could just call Multihash::wrap(0, ...).
The identity hash is something that needs to be used with care, hence it's behind a feature flag.
Unfortunately, feature flags are additive. That's what makes this so dangerous.