
Capture payload size in the multihash

Open Gozala opened this issue 10 months ago • 4 comments

In almost every instance where we use a raw multihash, we find ourselves capturing the payload size on the side. It is also worth calling out that the fr32-sha2-256-trunc254-padded-binary-tree multihash was recently defined to capture the payload size in order to address potential vulnerabilities.

Given how common it is to want to capture the payload size, I would like to propose a "multihash multihash" format: a multihash variant that uses the multihash code 0x31 and encodes both the payload size and the digest. Here is the exact format I'd like to propose:

Format

<0x31><varint payload size in bytes><varint hash function code><varint digest size in bytes><hash function output>
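A minimal Python sketch of the layout above, assuming the unsigned LEB128 varints used elsewhere in multiformats. The function names are hypothetical, sha2-256 (code 0x12) is only an example inner hash function, and the decoder does not enforce the unsigned-varint spec's 9-byte limit or minimal-encoding rule.

```python
import hashlib
from typing import Tuple

SIZED_MULTIHASH_CODE = 0x31  # multihash code proposed in this issue
SHA2_256_CODE = 0x12  # multicodec code for sha2-256 (example only)


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next offset)."""
    value, shift = 0, 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_sized_multihash(payload_size: int, hash_code: int, digest: bytes) -> bytes:
    """Build <0x31><payload size><hash function code><digest size><hash function output>."""
    return (
        encode_varint(SIZED_MULTIHASH_CODE)
        + encode_varint(payload_size)
        + encode_varint(hash_code)
        + encode_varint(len(digest))
        + digest
    )


def decode_sized_multihash(buf: bytes) -> Tuple[int, int, bytes]:
    """Return (payload size, hash function code, digest)."""
    code, i = decode_varint(buf)
    if code != SIZED_MULTIHASH_CODE:
        raise ValueError(f"expected code 0x31, got {code:#x}")
    payload_size, i = decode_varint(buf, i)
    hash_code, i = decode_varint(buf, i)
    digest_size, i = decode_varint(buf, i)
    digest = buf[i:]
    if len(digest) != digest_size:
        raise ValueError("digest length does not match declared digest size")
    return payload_size, hash_code, digest


payload = b"hello world"
mh = encode_sized_multihash(len(payload), SHA2_256_CODE, hashlib.sha256(payload).digest())
print(mh[:4].hex())  # 310b1220
print(decode_sized_multihash(mh)[:2])  # (11, 18)
```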

FAQ

  • Should CIDs adopt this multihash format instead of what they use now?

    I don't have a use case for that, so unless someone already has one, I'd say let's not until we do. Also, adopting it in CIDs would make their size arbitrary, which can introduce various problems.

  • Should it be possible for CIDs to use this multihash format?

    I don't see why not. CIDs can use whatever hashing algorithm they want, so it makes sense to allow the same here.

  • Should blockstore keys use this format, or should they unwrap it and use the inner multihash?

    I think blockstores do not need to capture the size in the key, which probably means they should not use this format, to avoid duplication.
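Because the trailing `<varint hash function code><varint digest size><hash function output>` bytes of the proposed format already form a regular multihash, unwrapping for a blockstore key could be a matter of skipping the first two varints. A sketch under that reading, with a hypothetical `inner_multihash` helper and a placeholder all-zero digest:

```python
from typing import Tuple

SIZED_MULTIHASH_CODE = 0x31  # code proposed in this issue


def read_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next offset)."""
    value, shift = 0, 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def inner_multihash(buf: bytes) -> bytes:
    """Hypothetical helper: return the plain multihash wrapped by a 0x31 multihash.

    Input that does not start with 0x31 is assumed to already be a plain
    multihash and is returned unchanged.
    """
    code, i = read_varint(buf)
    if code != SIZED_MULTIHASH_CODE:
        return buf
    _payload_size, i = read_varint(buf, i)  # dropped: the key does not need it
    return buf[i:]


# Example: sha2-256 (0x12), 32-byte digest (0x20), payload size 1024.
digest = bytes(32)  # placeholder digest, not a real hash
plain = bytes([0x12, 0x20]) + digest
sized = bytes([0x31, 0x80, 0x08]) + plain  # 1024 encodes as 0x80 0x08
print(inner_multihash(sized) == plain)  # True
```

The same example hints at why CID sizes would vary: the payload-size varint is one byte for payloads under 128 bytes and grows by one byte for every additional 7 bits of size.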

Gozala, Apr 05 '24 17:04

I should note that it was suggested to me to create a PR for this repo and perhaps call this multihash v2. However, as per the FAQ, I don't think using it everywhere we use multihash would be better, not to mention the pain of the upgrade it would introduce. That said, I think it is a good idea to have a format for this fairly common (at least in my experience) use case that can be recommended in place of a sidecar size field.

If there is both support and a desire to make this into a real thing, I can write something more formal, but even then I could use some feedback on where the document describing it should live and what format it should have.

Gozala, Apr 05 '24 17:04

Can't the digest size be deduced? That would clean up all the space taken up by multi-output hashes like blake2s and skein, which currently have a separate code for each digest size.

BHare1985, Aug 20 '24 18:08

The digest size currently specifies truncation. For some hash functions (e.g., blake3), smaller digests are prefixes of larger digests, so we only need one code. However, for hash functions like blake2b, different sizes produce entirely different digests.
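The difference can be checked with Python's standard `hashlib`. Since blake3 is not in the standard library, this sketch uses SHAKE-256, an extendable-output function with the same prefix property, as a stand-in for it:

```python
import hashlib

data = b"example payload"

# BLAKE2b: the digest size is a parameter of the hash itself, so a 32-byte
# BLAKE2b digest is NOT the first 32 bytes of the 64-byte digest.
b2_32 = hashlib.blake2b(data, digest_size=32).digest()
b2_64 = hashlib.blake2b(data, digest_size=64).digest()
print(b2_32 == b2_64[:32])  # False

# SHAKE-256 (stand-in for blake3): shorter outputs are prefixes of longer
# ones, so a single code plus a truncation length covers every size.
s16 = hashlib.shake_256(data).digest(16)
s32 = hashlib.shake_256(data).digest(32)
print(s16 == s32[:16])  # True
```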

Stebalien, Aug 27 '24 15:08


I understand, and that information is redundant if there is a payload size because you can deduce the hash size from the payload size.

BHare1985, Aug 27 '24 15:08