design
design copied to clipboard
Non-canonical LEB128
We've discussed this before, but the spec doesn't mention it.
Oh, I just noticed this actually is mentioned, including the maximum length:
A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.
A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.
Yes, as @binji points out, this is already stated. No need to add anything.
That's half of the non-canonical stuff. It's missing the other half. And the maximum length. I can put everything in one section if that's clearer.
What other half? And it has the maximum length part too (ceil(N/7)
bytes). The only thing I see that's missing is that the padding will actually be 0 or 0x7f for the last byte.
"non-relevant LEB128 bits (bits past the size) are ignored"
Ah, but I think that's incorrect. If we want to extend a varint (from varint32 to a varint64, say) in the future, then it's important that the bits past the size are a zero-extension (for LEB) or sign-extension (for SLEB) of the most significant bit.
Is it incorrect? That's the main reason I opened this :) The wikipedia entry were refer to so authoritatively leads me to believe past-the-end bits can be anything!
Hm, I don't read it that way:
To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend the number up to a multiple of 7 bits...
A signed number is represented similarly, except that the two's complement number is sign extended up to a multiple of 7 bits...
Since it's extended up to a multiple of 7 bits, it should only have zeroes or ones past the size.
Anyway, I think it's pretty important that the bits are not ignored. For example, in https://github.com/WebAssembly/design/pull/895 you changed the resizable_limits flags from a varuint32 to a varuint1. If we ignore the bits past the size, then we can't safely extend the value.
@jfbastien If I'm reading the varuintN
and varintN
sections correctly, they are already defining their respective types as a restriction of LEB128 and no bytes are ignored.
@binji @lukewagner are you saying that non-canonical LEB128 can only be extra long (either zero or sign extended, up to max size), and that insignificant bits are and should be disallowed?
@jfbastien, that was the intention of the text, and it still reads that way to me.
Agreed, assuming "insignificant" means "bits that otherwise wouldn't fit in the target uintN
".
@jfbastien So I think we can close this out now? Or perhaps you'd like to create PR that clarifies the wording?
@lukewagner yeah I'll update the PR to clarify wording. Soon :)
As discussed above, this is already documented, so there's no need to add anything.