design icon indicating copy to clipboard operation
design copied to clipboard

Non-canonical LEB128

Open jfbastien opened this issue 8 years ago • 14 comments

We've discussed this before, but the spec doesn't mention it.

jfbastien avatar Dec 08 '16 22:12 jfbastien

Oh, I just noticed this actually is mentioned, including the maximum length:

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.

binji avatar Dec 09 '16 08:12 binji

Yes, as @binji points out, this is already stated. No need to add anything.

rossberg avatar Dec 09 '16 08:12 rossberg

That's half of the non-canonical stuff. It's missing the other half. And the maximum length. I can put everything in one section if that's clearer.

jfbastien avatar Dec 09 '16 17:12 jfbastien

What other half? And it has the maximum length part too (ceil(N/7) bytes). The only thing I see that's missing is that the padding will actually be 0 or 0x7f for the last byte.

binji avatar Dec 09 '16 18:12 binji

"non-relevant LEB128 bits (bits past the size) are ignored"

jfbastien avatar Dec 09 '16 18:12 jfbastien

Ah, but I think that's incorrect. If we want to extend a varint (from varint32 to a varint64, say) in the future, then it's important that the bits past the size are a zero-extension (for LEB) or sign-extension (for SLEB) of the most significant bit.

binji avatar Dec 09 '16 18:12 binji

Is it incorrect? That's the main reason I opened this :) The wikipedia entry were refer to so authoritatively leads me to believe past-the-end bits can be anything!

jfbastien avatar Dec 09 '16 18:12 jfbastien

Hm, I don't read it that way:

To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend the number up to a multiple of 7 bits...

A signed number is represented similarly, except that the two's complement number is sign extended up to a multiple of 7 bits...

Since it's extended up to a multiple of 7 bits, it should only have zeroes or ones past the size.

Anyway, I think it's pretty important that the bits are not ignored. For example, in https://github.com/WebAssembly/design/pull/895 you changed the resizable_limits flags from a varuint32 to a varuint1. If we ignore the bits past the size, then we can't safely extend the value.

binji avatar Dec 09 '16 18:12 binji

@jfbastien If I'm reading the varuintN and varintN sections correctly, they are already defining their respective types as a restriction of LEB128 and no bytes are ignored.

lukewagner avatar Dec 09 '16 18:12 lukewagner

@binji @lukewagner are you saying that non-canonical LEB128 can only be extra long (either zero or sign extended, up to max size), and that insignificant bits are and should be disallowed?

jfbastien avatar Dec 15 '16 23:12 jfbastien

@jfbastien, that was the intention of the text, and it still reads that way to me.

rossberg avatar Dec 16 '16 07:12 rossberg

Agreed, assuming "insignificant" means "bits that otherwise wouldn't fit in the target uintN".

lukewagner avatar Dec 16 '16 22:12 lukewagner

@jfbastien So I think we can close this out now? Or perhaps you'd like to create PR that clarifies the wording?

lukewagner avatar Dec 20 '16 23:12 lukewagner

@lukewagner yeah I'll update the PR to clarify wording. Soon :)

jfbastien avatar Dec 20 '16 23:12 jfbastien

As discussed above, this is already documented, so there's no need to add anything.

sunfishcode avatar Feb 22 '24 21:02 sunfishcode