Reserve a zero byte in all declarations and accesses
We've learned the lesson from both positive and negative examples that future proposals often need to add things to existing constructs. Where we've had the foresight to do so, a 0 byte has been extremely helpful.
I'd like to propose that all declarations in this proposal (i.e. arrays and structs) have an additional 0 byte in the binary encoding for future extensibility. We may also consider the question of whether instructions that access them should ever require extensibility, and thus also need a 0 byte.
The alternative would be to plan on using different type codes or opcodes for future extended versions. As far as I can see, the main drawback is that, under our current conventions, we would have to come up with new names for the extended versions, whereas a reserved zero byte would not require new names. But that consideration is mostly cosmetic. Does reserving zero bytes have other advantages over extension via new opcodes?
What effect would this have on code size?
Reserving zero bytes would be a strict increase in code size, of course. However, I think it would be small, on the order of a couple of percent for a module that uses GC, even heavily. E.g. a struct declaration has at minimum an LEB field count followed by N (type, mutability) pairs. For, say, a struct with 4 fields, it would add 1 byte to a 9+ byte encoding. But empirically modules are about 90% code, so the instruction side matters more: a reserved byte would increase the size of a struct.get instruction (minimum 4 bytes) by 1 byte. We'd have to measure for GC-heavy programs what proportion of instructions those represent.
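To make the arithmetic above concrete, here is a minimal sketch of the per-construct overhead. It assumes the smallest possible encodings (a one-byte LEB field count, a one-byte value type, and a one-byte mutability flag per field); the function and constant names are made up for illustration and are not part of the proposal.

```python
def struct_decl_min_size(num_fields: int, reserved_byte: bool = False) -> int:
    """Minimum size in bytes of a struct declaration body.

    Assumes a one-byte LEB field count and two bytes per field
    (one-byte type + one-byte mutability flag). Larger type indices
    or field counts would need more bytes.
    """
    size = 1 + 2 * num_fields
    if reserved_byte:
        size += 1  # the proposed reserved zero byte
    return size


# Minimum size of a struct.get instruction, as stated in the comment above.
STRUCT_GET_MIN_SIZE = 4


def relative_overhead(base_size: int) -> float:
    """Fractional growth caused by adding one byte to a construct of base_size bytes."""
    return 1 / base_size


if __name__ == "__main__":
    base = struct_decl_min_size(4)
    print("4-field struct:", base, "->", struct_decl_min_size(4, reserved_byte=True))
    print("declaration overhead:", relative_overhead(base))
    print("struct.get overhead:", relative_overhead(STRUCT_GET_MIN_SIZE))
```

These are worst-case per-construct figures. The effect on a whole module depends on how much of the module such declarations and accesses make up, which is what has to be measured.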
I think zero bytes have the advantage of keeping the opcode space and the declaration encoding space better organized. In decoders, the effect depends on exactly how the code is organized. For (in-place) interpreters, it is a tradeoff between having more entries / handler versions in a dispatch table and having a branch or skipping the byte.
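As a rough illustration of the decoder side, the sketch below decodes the immediates of a struct.get-like instruction with and without a trailing reserved byte. The layout (type index, field index, then the reserved byte) and the function names are assumptions made for this example, not the proposal's actual encoding.

```python
from typing import Tuple


def read_u32_leb(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 integer at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_struct_get(buf: bytes, pos: int, has_reserved_byte: bool) -> Tuple[int, int, int]:
    """Decode hypothetical struct.get immediates starting at pos.

    Returns (type_index, field_index, next position). When
    has_reserved_byte is set, one extra byte is consumed and must be
    zero; a nonzero value is where a future extension would hook in.
    """
    type_index, pos = read_u32_leb(buf, pos)
    field_index, pos = read_u32_leb(buf, pos)
    if has_reserved_byte:
        if buf[pos] != 0:
            raise ValueError("reserved byte must be zero")
        pos += 1
    return type_index, field_index, pos


if __name__ == "__main__":
    print(decode_struct_get(bytes([0x05, 0x02]), 0, has_reserved_byte=False))
    print(decode_struct_get(bytes([0x05, 0x02, 0x00]), 0, has_reserved_byte=True))
```

The check on the reserved byte is the branch mentioned above. A validating decoder pays for it once, while an in-place interpreter that has already validated the module can simply skip the byte. With new opcodes there is no extra byte, but each extended form needs its own dispatch entry.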
I measured the code size impact of adding a zero byte to every type declaration, adding a zero byte to every {struct,array}.{get*,set}, and adding both:
| | uncompressed | uncompressed ratio | gzip | gzip ratio | brotli | brotli ratio |
|---|---|---|---|---|---|---|
| no reserved bytes | 7523115 | 1 | 1935289 | 1 | 1291117 | 1 |
| type declaration byte | 7546501 | 1.003108553 | 1936486 | 1.000618512 | 1291685 | 1.000439929 |
| accessor byte | 7661635 | 1.018412586 | 1944626 | 1.004824602 | 1294665 | 1.002748008 |
| declaration + accessor bytes | 7685021 | 1.021521139 | 1945847 | 1.005455516 | 1296033 | 1.003807556 |
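The ratio columns can be re-derived from the raw sizes with a short script. This is only a sketch: the sizes are copied from the table (presumably in bytes), and the dictionary keys and function names are made up for the example.

```python
# Sizes copied from the table above (presumably bytes).
SIZES = {
    "no reserved bytes": {"uncompressed": 7523115, "gzip": 1935289, "brotli": 1291117},
    "type declaration byte": {"uncompressed": 7546501, "gzip": 1936486, "brotli": 1291685},
    "accessor byte": {"uncompressed": 7661635, "gzip": 1944626, "brotli": 1294665},
    "declaration + accessor bytes": {"uncompressed": 7685021, "gzip": 1945847, "brotli": 1296033},
}
BASELINE = "no reserved bytes"


def ratios(sizes, baseline=BASELINE):
    """Size of each configuration relative to the baseline, per format."""
    base = sizes[baseline]
    return {
        name: {fmt: row[fmt] / base[fmt] for fmt in row}
        for name, row in sizes.items()
    }


def added_bytes(sizes, fmt="uncompressed", baseline=BASELINE):
    """Absolute growth over the baseline for one format."""
    base = sizes[baseline][fmt]
    return {name: row[fmt] - base for name, row in sizes.items()}


if __name__ == "__main__":
    for name, row in ratios(SIZES).items():
        print(name, {fmt: round(r, 6) for fmt, r in row.items()})
    print(added_bytes(SIZES))
```

If each reserved byte adds exactly one byte per site, the uncompressed deltas also give the number of affected sites in the measured module. The two individual deltas sum to the combined one, which is consistent with that reading.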
As expected, the code size impact is small. That being said, I would prefer to introduce new opcodes for future types and instructions rather than using reserved bytes. There's also the problem of maintaining backward compatibility for function type declarations, which do not have reserved bytes.
Reflecting on this, a zero byte might be a bit clunky, and we are generally inconsistent about using them these days. I'm OK with closing this issue.
Thanks to you both, @tlively, @titzer, I'm closing.