multibase Base 2, base 8, and base 10

The odd-balls in the current multibase spec are:

Base 2
Base 8
Base 10

That is, these are generally considered less useful than the other bases. The current situation is:

Base 2 is useful for bitfields.
One of base 8 or base 10 may be useful when only digits (0-9) are allowed.
- Base 10 has a spec.
- Base 10 is more compact.
- Base 8 may be simpler to decode/encode.

The question is: which of these should we keep, if any? This is relevant to https://github.com/multiformats/go-multibase/pull/26 as, if we keep base 8, we need to define and implement it.

Aug 01 '19 17:08 Stebalien

I'm in favour of reducing the burden on implementers. If it turns out that we need a base encoding that isn't part of the spec yet, we can add it later on. I'm for starting with a small, valuable set of things and expanding if needed (which might never happen).

Aug 02 '19 10:08 vmx

@vmx, as a C# implementor, I unilaterally decided not to implement these bases. See https://github.com/richardschneider/net-ipfs-core/issues/54

Aug 05 '19 06:08 richardschneider

Base8 is cheaper to encode/decode (computationally efficient for large data). Base10 uses less space but is more expensive to encode/decode (space efficient).

I would say both should be kept and a Base8 spec added.
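To put rough numbers on the space side of this trade-off, here is a small sketch. It assumes, for illustration, that base8 uses rfc4648-style bit grouping (3 bits per digit, final group zero-padded) and that base10 converts the whole input as one big integer. Under the same assumptions the cost side follows: bit grouping is a single linear pass over the input, while converting a big integer to decimal by schoolbook repeated division grows roughly quadratically with input length.

```python
def base8_len(nbytes: int) -> int:
    """Octal digits under rfc4648-style bit grouping: 3 bits per digit,
    final partial group zero-padded ('=' padding not counted)."""
    return (8 * nbytes + 2) // 3


def base10_max_len(nbytes: int) -> int:
    """Decimal digits for big-integer conversion of the largest n-byte value.
    Leading zero bytes, which would add extra '0' characters, are ignored."""
    if nbytes == 0:
        return 0
    return len(str(2 ** (8 * nbytes) - 1))


# Compare a few sizes: one byte, 32-bit and 64-bit integers, a 32-byte digest.
for n in (1, 4, 8, 32):
    print(f"{n:>2} bytes: base8 {base8_len(n):>2} chars, base10 <= {base10_max_len(n):>2} chars")
```

For a 32-byte digest this gives 86 octal digits versus at most 78 decimal digits, so base10 is about 9% shorter.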

Oct 02 '19 03:10 fabianhjr

Note for those following along: while go-multibase never gained a base8 implementation, js-multibase is about to get fully-baked support for it. Notably, from https://github.com/multiformats/js-multibase/pull/55#issue-427355352

Note: base8 deviates from the spec tests outputs but aligns with multiformats/multibase#60

We should really make a decision here, and at least fix the shared-test-vectors to include only parts we expect implementations to support.

For easy-to-eyeball reference the current path taken by js-ipfs is: https://github.com/multiformats/js-multibase/blob/c8f762996e47403c0c41c4f16c35c7b252c4f31e/src/constants.js#L14-L39

refs:

Stalled Base-8 spec: https://github.com/multiformats/multibase/pull/60
Stalled go-multibase Base-8 implementation: https://github.com/multiformats/go-multibase/pull/26
C# decision: https://github.com/richardschneider/net-ipfs-core/issues/54
Rust: ???

/cc @vmx @rvagg @lidel @hugomrdias @creationix

Jun 03 '20 18:06 ribasushi

Are there actually use cases where only decimal digits but a large or arbitrary number of digits is allowed?

I know of lots of places that store integers, but those typically have size limits of 32 or 64 bits, which is far too small for hashes.

Jun 03 '20 18:06 creationix

Same question as @creationix -- I find these bases interesting only for academic purposes and would love to know what real-world use cases there might be. Or are we just doing completeness for completeness' sake?

Regarding the specific question, +1 on adopting what JS is doing now. The approach to base8 is consistent with the other bases so I think the change is correct and the test fixtures should change.

Jun 04 '20 04:06 rvagg

I think I have some time to move #60 along. sorry for stalling Q_Q

Jun 04 '20 21:06 fabianhjr

I think I have some time to move #60 along. sorry for stalling Q_Q

No worries @fabianhjr! May I suggest pivoting a bit and reframing the spec into a generic "rfc4648-derived" spec covering base8, base16, base32 and base64? This way you can abstractly define both padded and non-padded variants, and we can still get away with defining just the types we want implementations to support.

As a logical step 2, the base36 spec could be reworked into a "base-X" spec defining base10, base36 and base58.

This code-block puts in perspective what I mean by "let's just have 2 generic specs": https://github.com/multiformats/js-multibase/blob/c8f762996e47403c0c41c4f16c35c7b252c4f31e/src/constants.js#L14-L39
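A minimal sketch of the rfc4648-derived half of that proposal: one bit-grouping routine, parameterised only by alphabet, covers base8, base16, base32 and base64, with '=' padding as an optional flag. The function names, the `pad` flag and the `ALPHABETS` table are hypothetical illustrations (lowercase base16/base32 alphabets chosen for the example), not the multibase spec or the js-multibase API. Rejecting non-zero leftover bits on decode is one of the options rfc4648 Section 3.5 permits.

```python
from math import gcd

# Hypothetical alphabet table; lowercase base16/base32 chosen for illustration.
ALPHABETS = {
    "base8": "01234567",
    "base16": "0123456789abcdef",
    "base32": "abcdefghijklmnopqrstuvwxyz234567",
    "base64": "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
}


def _bits_per_char(alphabet: str) -> int:
    bits = len(alphabet).bit_length() - 1
    if 1 << bits != len(alphabet):
        raise ValueError("alphabet size must be a power of two")
    return bits


def rfc4648_encode(data: bytes, alphabet: str, pad: bool = False) -> str:
    bits = _bits_per_char(alphabet)
    mask = (1 << bits) - 1
    out, acc, nacc = [], 0, 0  # acc holds nacc not-yet-emitted bits
    for byte in data:
        acc, nacc = (acc << 8) | byte, nacc + 8
        while nacc >= bits:
            nacc -= bits
            out.append(alphabet[(acc >> nacc) & mask])
        acc &= (1 << nacc) - 1
    if nacc:  # final partial group: zero-pad on the right
        out.append(alphabet[(acc << (bits - nacc)) & mask])
    if pad:  # '=' up to a whole quantum of lcm(8, bits) bits
        quantum = 8 // gcd(8, bits)  # characters per quantum
        out.append("=" * (-len(out) % quantum))
    return "".join(out)


def rfc4648_decode(text: str, alphabet: str) -> bytes:
    bits = _bits_per_char(alphabet)
    out, acc, nacc = bytearray(), 0, 0
    for ch in text.rstrip("="):
        acc, nacc = (acc << bits) | alphabet.index(ch), nacc + bits
        if nacc >= 8:
            nacc -= 8
            out.append((acc >> nacc) & 0xFF)
            acc &= (1 << nacc) - 1
    if acc:  # leftover bits must be zero padding
        raise ValueError("non-zero padding bits")
    return bytes(out)
```

With this scheme base16 never needs padding, since each byte maps to exactly two characters, while base8 pads to a quantum of 8 characters (24 bits).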

Jun 04 '20 21:06 ribasushi

@ribasushi, pushed some changes to leave the simple mapping and mention it as RFC4648 derived.

Jun 04 '20 21:06 fabianhjr

Given that base2, base8, base10 and base16 are all common bases for number literals, it would be good to have common behaviour when decoding non-canonical strings. As far as I understand, things currently stand as follows:

base2, base10 explicitly preserve leading zeros and encode/decode the trailing data;
base8 drops the last incomplete word;
base16 is somewhat ambiguous, because rfc4648 Section 3.5 does not mandate a specific behaviour for decoders.

In my opinion, the expected behaviour for these encodings should be the same: preserve leading zeros, then encode/decode using the given base (and choice of alphabet for digits). This is effectively the same as zero-padding bits to the left, and is the same behaviour as base36 and base58.
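The "preserve leading zeros, then convert" behaviour can be sketched base-x style. This assumes the base58btc convention (one zero character per leading zero byte, the rest converted as a single big-endian integer) carries over to any alphabet; the helper names are hypothetical, not from a multibase library.

```python
def basex_encode(data: bytes, alphabet: str) -> str:
    base = len(alphabet)
    zeros = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(alphabet[r])
    # one zero character per leading zero byte, then most significant digit first
    return alphabet[0] * zeros + "".join(reversed(digits))


def basex_decode(text: str, alphabet: str) -> bytes:
    base = len(alphabet)
    zeros = len(text) - len(text.lstrip(alphabet[0]))  # leading zero characters
    n = 0
    for ch in text:
        n = n * base + alphabet.index(ch)
    # minimal big-endian bytes: the value is right-aligned, i.e. zero-padded on the left
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * zeros + body
```

With the decimal alphabet, the bytes 00 01 00 encode to "0256", and "0256" decodes back to the same three bytes. Because the integer conversion never splits the input into fixed bit groups, there is no trailing partial group to pad.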

However, base16 is described by rfc4648 Section 8 as being analogous to base32 and base64, and the latter both mandate zero-padding of bits to the right when necessary to complete a bit group in encoding. Furthermore, rfc4648 Section 3.5 mentions that encodings with non-zero bit padding MAY be rejected by decoders, from which one might deduce that the intended behaviour for decoders was also to consider zero-padding of bits to the right. However, this is not explicitly mandated, as far as I understand.

The above suggests three possible choices when decoding odd-length strings in base16:

reject them, as done by the base64 module of Python;
zero-pad them to the right, as rfc4648 might seem to indicate;
zero-pad them to the left, analogously to base2 and base10 (which should also be what base8 does, IMHO).
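The three options (and the matching base8 question) can be compared side by side with a small decoder that takes the policy as a parameter. It is a hypothetical helper with invented mode names, not code from any multibase implementation. The "right" mode rejects non-zero padding bits, which rfc4648 Section 3.5 permits but does not require.

```python
def decode_with_policy(text: str, alphabet: str, mode: str = "reject") -> bytes:
    """Decode a power-of-two base whose total bit count may not be a multiple of 8.

    mode (hypothetical names):
      "reject": refuse input that does not fill whole bytes exactly
      "right":  bits fill bytes from the left; the trailing partial byte
                (which must be zero) is discarded, rfc4648-style
      "left":   treat the string as a number and zero-pad on the left
    """
    bits = len(alphabet).bit_length() - 1
    total = bits * len(text)
    extra = total % 8  # bits beyond the last whole byte
    n = 0
    for ch in text:
        n = (n << bits) | alphabet.index(ch)
    if mode == "reject":
        if extra:
            raise ValueError("input does not fill whole bytes")
        return n.to_bytes(total // 8, "big")
    if mode == "right":
        if n & ((1 << extra) - 1):
            raise ValueError("non-zero padding bits")
        return (n >> extra).to_bytes(total // 8, "big")
    if mode == "left":
        return n.to_bytes((total + 7) // 8, "big")
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, the hex string "abc" decodes to bytes 0a bc under "left", and is rejected under "reject" and, because its trailing four bits are non-zero, also under "right". The base8 string "776" decodes to ff under "right" but to 01 fe under "left", which is exactly the divergence described above.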

Oct 15 '21 11:10 sg495
