design Non-canonical LEB128

We've discussed this before, but the spec doesn't mention it.

Dec 08 '16 22:12 jfbastien

Oh, I just noticed this actually is mentioned, including the maximum length:

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.

Dec 09 '16 08:12 binji

Yes, as @binji points out, this is already stated. No need to add anything.

Dec 09 '16 08:12 rossberg

That's half of the non-canonical stuff. It's missing the other half. And the maximum length. I can put everything in one section if that's clearer.

Dec 09 '16 17:12 jfbastien

What other half? And it has the maximum length part too (ceil(N/7) bytes). The only thing I see that's missing is that the padding will actually be 0 or 0x7f for the last byte.

Dec 09 '16 18:12 binji

"non-relevant LEB128 bits (bits past the size) are ignored"

Dec 09 '16 18:12 jfbastien

Ah, but I think that's incorrect. If we want to extend a varint (from varint32 to a varint64, say) in the future, then it's important that the bits past the size are a zero-extension (for LEB) or sign-extension (for SLEB) of the most significant bit.

Dec 09 '16 18:12 binji

Is it incorrect? That's the main reason I opened this :) The wikipedia entry were refer to so authoritatively leads me to believe past-the-end bits can be anything!

Dec 09 '16 18:12 jfbastien

Hm, I don't read it that way:

To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend the number up to a multiple of 7 bits...

A signed number is represented similarly, except that the two's complement number is sign extended up to a multiple of 7 bits...

Since it's extended up to a multiple of 7 bits, it should only have zeroes or ones past the size.

Anyway, I think it's pretty important that the bits are not ignored. For example, in https://github.com/WebAssembly/design/pull/895 you changed the resizable_limits flags from a varuint32 to a varuint1. If we ignore the bits past the size, then we can't safely extend the value.

Dec 09 '16 18:12 binji

@jfbastien If I'm reading the varuintN and varintN sections correctly, they are already defining their respective types as a restriction of LEB128 and no bytes are ignored.

Dec 09 '16 18:12 lukewagner

@binji @lukewagner are you saying that non-canonical LEB128 can only be extra long (either zero or sign extended, up to max size), and that insignificant bits are and should be disallowed?

Dec 15 '16 23:12 jfbastien

@jfbastien, that was the intention of the text, and it still reads that way to me.

Dec 16 '16 07:12 rossberg

Agreed, assuming "insignificant" means "bits that otherwise wouldn't fit in the target uintN".

Dec 16 '16 22:12 lukewagner

@jfbastien So I think we can close this out now? Or perhaps you'd like to create PR that clarifies the wording?

Dec 20 '16 23:12 lukewagner

@lukewagner yeah I'll update the PR to clarify wording. Soon :)

Dec 20 '16 23:12 jfbastien

As discussed above, this is already documented, so there's no need to add anything.

Feb 22 '24 21:02 sunfishcode

design design copied to clipboard

Non-canonical LEB128

design
design copied to clipboard