js-multiformats

Proposal: Reconsider .code / .name fields of BlockEncoder / BlockDecoder

Open • Gozala opened this issue 2 years ago • 0 comments

I find the .code field, and to a lesser degree the .name field, on the following interfaces to be troublesome:

https://github.com/multiformats/js-multiformats/blob/9bcd7fef62888d7cefe8e4f5e929d4e3c9dadda9/src/codecs/interface.ts#L4-L15

The problem is that these fields prevent one from defining a codec composition without introducing a subtle footgun. For example, dag-ucan could in theory be a composition of the dag-cbor and raw codecs, meaning it could decode a block in either CBOR or raw encoding, and similarly encode a node in either a CBOR or a raw representation (depending on UCAN-specific nuances).
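To make the footgun concrete, here is a minimal sketch of such a composed encoder. It uses a simplified local copy of the BlockEncoder shape (ByteView reduced to Uint8Array) instead of importing js-multiformats, and the Payload type, the composed codec, and the use of JSON as a stand-in for dag-cbor are all hypothetical:

```typescript
// Simplified local copy of the BlockEncoder shape (ByteView reduced to Uint8Array).
interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode(data: T): Uint8Array
}

const DAG_CBOR = 0x71
const RAW = 0x55

// Hypothetical node type: either structured data or an opaque byte payload.
type Payload =
  | { kind: 'structured'; value: unknown }
  | { kind: 'raw'; bytes: Uint8Array }

// Hypothetical composed codec that may emit either representation.
const composed: BlockEncoder<typeof DAG_CBOR | typeof RAW, Payload> = {
  name: 'composed',
  // The footgun: one static field has to claim a single code,
  // even though encode() below can produce bytes for either codec.
  code: DAG_CBOR,
  encode(payload) {
    if (payload.kind === 'raw') {
      return payload.bytes // raw representation: bytes pass through untouched
    }
    // JSON stands in for dag-cbor so the sketch has no dependencies.
    return new TextEncoder().encode(JSON.stringify(payload.value))
  }
}

const rawBytes = composed.encode({ kind: 'raw', bytes: new Uint8Array([1, 2, 3]) })
// rawBytes are raw-encoded, yet composed.code still reports 0x71 (dag-cbor).
```

Any caller that builds a CID from composed.code after encoding a raw payload would label those bytes as dag-cbor.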

This dual representation is currently an implementation detail hidden behind the new 0x78c0 multicodec code (https://github.com/multiformats/multicodec/pull/264).

Given the arguments in that thread, I have considered dropping the new code and instead writing a UCAN-specialized BlockCodec<0x71|0x55> implementation. However, there are some interesting challenges:

  1. .code could be either 0x71 or 0x55. The type checker would accept either value, but each is misleading, because it is common to use that code field when creating CIDs, e.g.: https://github.com/multiformats/js-multiformats/blob/9bcd7fef62888d7cefe8e4f5e929d4e3c9dadda9/src/block.js#L148-L150
  2. I think this is a symptom of a broader problem I've run into in different contexts: the result of encode carries no information about the codec. That is probably why I find myself resorting to { code, bytes } whenever I want to defer async CID creation.
    • In retrospect, it seems silly that we identified the need for this in MultihashDigest but not here: https://github.com/multiformats/js-multiformats/blob/9bcd7fef62888d7cefe8e4f5e929d4e3c9dadda9/src/hashes/interface.ts#L12-L32
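A minimal sketch of the { code, bytes } workaround promoted to the encoder's return type, mirroring how MultihashDigest carries its code alongside the digest. EncodedBlock, TaggedBlockEncoder, describeForLink and the composed codec are hypothetical names, and JSON again stands in for dag-cbor:

```typescript
const DAG_CBOR = 0x71
const RAW = 0x55

// Hypothetical: an encode result that carries its own codec code,
// the way MultihashDigest carries `code` next to `digest`.
interface EncodedBlock<Code extends number> {
  code: Code
  bytes: Uint8Array
}

// Hypothetical encoder shape: no static `code` field; each result is tagged.
interface TaggedBlockEncoder<Code extends number, T> {
  name: string
  encode(data: T): EncodedBlock<Code>
}

type Payload =
  | { kind: 'structured'; value: unknown }
  | { kind: 'raw'; bytes: Uint8Array }

const composed: TaggedBlockEncoder<typeof DAG_CBOR | typeof RAW, Payload> = {
  name: 'composed',
  encode(payload) {
    if (payload.kind === 'raw') {
      return { code: RAW, bytes: payload.bytes }
    }
    // JSON stands in for dag-cbor to keep the sketch dependency-free.
    const bytes = new TextEncoder().encode(JSON.stringify(payload.value))
    return { code: DAG_CBOR, bytes }
  }
}

// Deferred (async) CID creation only needs the tagged result, not the encoder.
// A real implementation would hash `bytes` and build a CID; this placeholder
// just returns the values such a step would consume.
function describeForLink(block: EncodedBlock<number>): { code: number; size: number } {
  return { code: block.code, size: block.bytes.length }
}
```

With this shape the code can no longer drift from the bytes, because both come out of the same encode call.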

Unfortunately, I see no way to address this in a backwards-compatible manner. Maybe we could introduce a MultiblockEncoder alongside BlockEncoder, similar to how we have MultibaseEncoder, which produces prefixed values, and BaseEncoder, which produces values without a prefix:

https://github.com/multiformats/js-multiformats/blob/9bcd7fef62888d7cefe8e4f5e929d4e3c9dadda9/src/bases/interface.ts#L4-L14

https://github.com/multiformats/js-multiformats/blob/9bcd7fef62888d7cefe8e4f5e929d4e3c9dadda9/src/bases/interface.ts#L48-L62
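One way such a MultiblockEncoder could look, sketched under the assumption that it sits next to an unchanged BlockEncoder (again a simplified local copy). Multiblock, this particular MultiblockEncoder shape and the toMultiblock adapter are hypothetical:

```typescript
// Simplified local copy of the existing BlockEncoder shape (ByteView reduced to Uint8Array).
interface BlockEncoder<Code extends number, T> {
  name: string
  code: Code
  encode(data: T): Uint8Array
}

// Hypothetical "prefixed" result, analogous to a MultibaseEncoder's output
// carrying its base prefix while a BaseEncoder's output does not.
interface Multiblock<Code extends number> {
  code: Code
  bytes: Uint8Array
}

// Hypothetical MultiblockEncoder living alongside BlockEncoder.
interface MultiblockEncoder<Code extends number, T> {
  name: string
  encode(data: T): Multiblock<Code>
}

// Any existing single-codec BlockEncoder can be lifted without modifying it.
function toMultiblock<Code extends number, T>(
  encoder: BlockEncoder<Code, T>
): MultiblockEncoder<Code, T> {
  return {
    name: encoder.name,
    encode: (data) => ({ code: encoder.code, bytes: encoder.encode(data) })
  }
}

// A raw-style codec: bytes in, same bytes out.
const raw: BlockEncoder<0x55, Uint8Array> = {
  name: 'raw',
  code: 0x55,
  encode: (data) => data
}

const rawMulti = toMultiblock(raw)
```

Because toMultiblock only reads the existing fields, current BlockEncoder implementations would keep working, while composed codecs could implement MultiblockEncoder directly and tag each result with the code they actually used.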


Maybe this is an even broader issue of having multicodecs in the address as opposed to in the data itself. E.g. if we tagged the encoded bytes themselves with a multihash, all the IR representations would naturally be represented, although that ship has probably sailed long ago.
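One reading of this idea is to make block bytes self-describing by prefixing them with their codec code, using the unsigned-varint encoding that multiformats already uses for codes inside CIDs. The sketch below is illustrative only; varintEncode, varintDecode, tag and untag are hypothetical helpers, not part of js-multiformats:

```typescript
// Encode a non-negative integer as an unsigned LEB128-style varint.
function varintEncode(n: number): number[] {
  const out: number[] = []
  while (n >= 0x80) {
    out.push((n % 128) | 0x80) // low 7 bits, continuation bit set
    n = Math.floor(n / 128)
  }
  out.push(n) // final byte, continuation bit clear
  return out
}

// Decode a varint starting at `offset`; returns the value and bytes consumed.
function varintDecode(bytes: Uint8Array, offset = 0): [value: number, length: number] {
  let value = 0
  let shift = 0
  let i = offset
  while (true) {
    if (i >= bytes.length) throw new RangeError('truncated varint')
    const b = bytes[i++]
    value += (b & 0x7f) * 2 ** shift
    if ((b & 0x80) === 0) return [value, i - offset]
    shift += 7
  }
}

// Hypothetical: prepend the codec code so the bytes describe themselves.
function tag(code: number, bytes: Uint8Array): Uint8Array {
  const prefix = varintEncode(code)
  const out = new Uint8Array(prefix.length + bytes.length)
  out.set(prefix, 0)
  out.set(bytes, prefix.length)
  return out
}

// Hypothetical inverse: recover { code, bytes } without any external address.
function untag(tagged: Uint8Array): { code: number; bytes: Uint8Array } {
  const [code, length] = varintDecode(tagged)
  return { code, bytes: tagged.subarray(length) }
}
```

With tagged bytes, a composed codec would not need a static code at all: the decoder reads the prefix to learn which representation follows.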

Gozala, Apr 20 '22 01:04