
Reserve 0 byte in all declarations and accesses

Open titzer opened this issue 4 years ago • 3 comments

We've learned the lesson from both positive and negative examples that future proposals often need to add things to existing constructs. Where we've had the foresight to do so, a 0 byte has been extremely helpful.

I'd like to propose that all declarations in this proposal (i.e. arrays and structs) have an additional 0 byte in the binary encoding for future extensibility. We may also consider the question of whether instructions that access them should ever require extensibility, and thus also need a 0 byte.
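
As a rough illustration of the proposal, here is a minimal decoder sketch in Python. The layout is an assumption made for the example, not the proposal's actual encoding: a form byte (`STRUCT_FORM`, hypothetical value), then the reserved byte, then an LEB128 field count, then one storage-type byte and one mutability byte per field. The names `read_u32_leb` and `decode_struct_decl` are likewise hypothetical.

```python
STRUCT_FORM = 0x5F  # assumption: form byte that introduces a struct declaration


def read_u32_leb(buf, pos):
    """Decode an unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def decode_struct_decl(buf, pos=0):
    """Decode one struct declaration; return (fields, next_pos)."""
    if buf[pos] != STRUCT_FORM:
        raise ValueError("not a struct declaration")
    pos += 1

    # The reserved byte: must be 0 today. A future proposal could assign
    # meaning to non-zero values without needing a new form byte.
    reserved = buf[pos]
    pos += 1
    if reserved != 0:
        raise ValueError(f"unsupported struct extension byte {reserved:#x}")

    count, pos = read_u32_leb(buf, pos)
    fields = []
    for _ in range(count):
        storage_type = buf[pos]   # assumption: single-byte storage type
        mutability = buf[pos + 1]
        if mutability not in (0, 1):
            raise ValueError("malformed mutability flag")
        fields.append((storage_type, bool(mutability)))
        pos += 2
    return fields, pos
```

A decoder written this way rejects any non-zero reserved byte, which is what leaves room for later extensions to reuse the same declaration form.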

titzer avatar Jun 02 '21 15:06 titzer

The alternative would be to plan to use different type codes or operation codes for future extended versions. As far as I can see, the main drawback of this alternative is that according to our current conventions, that would mean we would have to come up with new names for the extended versions, while using a reserved zero byte would not require new names. But that consideration is mostly cosmetic. Does reserving zero bytes have other advantages over extension via new opcodes?

tlively avatar Jun 02 '21 17:06 tlively

What effect would this have on code size?

fgmccabe avatar Jun 02 '21 17:06 fgmccabe

Reserving zero bytes would be a strict increase in code size, of course. However, I think it would be small, on the order of a couple of percent for a module that uses GC, even heavily. E.g. a struct declaration has at minimum an LEB field count followed by N (type, mutability) field declarations. For, say, a struct with 4 fields, it would add 1 byte to a 9+ byte encoding. But empirically, modules are about 90% code. It would also increase the size of a struct.get instruction (minimum 4 bytes) by 1 byte. We'd have to measure what proportion of instructions those represent in GC-heavy programs.
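
The arithmetic in this estimate can be sketched as follows. This is a back-of-the-envelope model only: it assumes a 1-byte LEB field count (fewer than 128 fields) and exactly 2 bytes per field, which is the minimum case described above, and it takes the 4-byte minimum for `struct.get` from the comment. The function names are hypothetical.

```python
def min_struct_decl_size(num_fields, reserved_bytes=0):
    """Minimum encoded size: 1-byte field count + 2 bytes per field (+ reserved)."""
    return 1 + 2 * num_fields + reserved_bytes


def relative_overhead(base_size, extra_bytes=1):
    """Fraction by which `extra_bytes` grows an encoding of `base_size` bytes."""
    return extra_bytes / base_size


decl_base = min_struct_decl_size(4)          # 9 bytes for a 4-field struct
decl_with_reserved = min_struct_decl_size(4, reserved_bytes=1)  # 10 bytes
decl_overhead = relative_overhead(decl_base)  # 1/9 of the declaration

STRUCT_GET_MIN_SIZE = 4                       # minimum size quoted in the comment
get_overhead = relative_overhead(STRUCT_GET_MIN_SIZE)  # 1/4 of the instruction
```

Per construct the growth looks large (1 byte on 9, 1 byte on 4), but the module-level impact depends on how much of the module those constructs make up, which is why measurement is needed.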

I think zero bytes have the advantage that they keep the opcode space and declaration encoding space better organized. In decoders, the effect depends on exactly how the code is organized. For (in-place) interpreters, it is a tradeoff between having more entries / handler versions in a dispatch table and having a branch on, or a skip over, the extra byte.
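
The interpreter tradeoff can be made concrete with a toy bytecode. Everything here is hypothetical (the opcodes `OP_GET` and `OP_GET_EXT`, the instruction layouts, and the meaning of the extended variant); it is only a sketch of the two designs, not of any real engine.

```python
OP_GET = 0x01      # hypothetical opcode: read a field
OP_GET_EXT = 0x02  # hypothetical opcode for a future extended variant


def run_with_reserved_byte(code, fields):
    """Design A: [OP_GET, reserved, field_index]; one handler, one extra branch."""
    pc, out = 0, []
    while pc < len(code):
        if code[pc] != OP_GET:
            raise ValueError(f"unknown opcode {code[pc]:#x}")
        if code[pc + 1] != 0:  # branch on (or skip) the reserved byte
            raise ValueError("unsupported extension")
        out.append(fields[code[pc + 2]])
        pc += 3
    return out


def _get(code, pc, fields):
    # Layout: [OP_GET, field_index]
    return fields[code[pc + 1]], pc + 2


def _get_ext(code, pc, fields):
    # Layout: [OP_GET_EXT, extra_immediate, field_index].
    # The extra immediate is a made-up placeholder and is ignored here.
    return fields[code[pc + 2]], pc + 3


# Design B: no reserved byte; each variant gets its own dispatch-table entry.
HANDLERS = {OP_GET: _get, OP_GET_EXT: _get_ext}


def run_with_new_opcodes(code, fields):
    pc, out = 0, []
    while pc < len(code):
        handler = HANDLERS.get(code[pc])
        if handler is None:
            raise ValueError(f"unknown opcode {code[pc]:#x}")
        value, pc = handler(code, pc, fields)
        out.append(value)
    return out
```

With the reserved byte, two field reads take 6 bytes and every read pays for the check; with separate opcodes the same two reads take 4 bytes, at the cost of a larger dispatch table once extended variants exist.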

titzer avatar Jun 02 '21 22:06 titzer

I measured the code size impact of adding a zero byte to every type declaration, adding a zero byte to every {struct,array}.{get*,set}, and adding both:

| configuration | uncompressed | uncompressed ratio | gzip | gzip ratio | brotli | brotli ratio |
| --- | --- | --- | --- | --- | --- | --- |
| no reserved bytes | 7523115 | 1 | 1935289 | 1 | 1291117 | 1 |
| type declaration byte | 7546501 | 1.003108553 | 1936486 | 1.000618512 | 1291685 | 1.000439929 |
| accessor byte | 7661635 | 1.018412586 | 1944626 | 1.004824602 | 1294665 | 1.002748008 |
| declaration + accessor bytes | 7685021 | 1.021521139 | 1945847 | 1.005455516 | 1296033 | 1.003807556 |
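
For reference, the ratio columns follow directly from the byte counts: each size is divided by the "no reserved bytes" size for the same compression setting. A small Python sketch (the dictionary layout and the `ratios` helper are just illustrative names) that recomputes them from the numbers above:

```python
# Sizes in bytes, copied from the measurements above:
# (uncompressed, gzip, brotli)
SIZES = {
    "no reserved bytes": (7523115, 1935289, 1291117),
    "type declaration byte": (7546501, 1936486, 1291685),
    "accessor byte": (7661635, 1944626, 1294665),
    "declaration + accessor bytes": (7685021, 1945847, 1296033),
}


def ratios(sizes, baseline="no reserved bytes"):
    """Divide every size by the baseline size in the same column."""
    base = sizes[baseline]
    return {
        name: tuple(value / b for value, b in zip(values, base))
        for name, values in sizes.items()
    }


RATIOS = ratios(SIZES)

for name, (raw, gz, br) in RATIOS.items():
    print(f"{name}: uncompressed x{raw:.6f}, gzip x{gz:.6f}, brotli x{br:.6f}")
```

Read as percentages, adding both kinds of reserved byte costs roughly 2.2% uncompressed, 0.55% under gzip, and 0.38% under brotli.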

As expected, the code size impact is small. That being said, I would prefer to introduce new opcodes for future types and instructions rather than using reserved bytes. There's also the problem of maintaining backward compatibility for function type declarations, which do not have reserved bytes.

tlively avatar Mar 03 '23 19:03 tlively

Reflecting on this, a zero byte might be a bit clunky, and we are generally inconsistent on using them these days. I'm OK closing this issue.

titzer avatar Mar 04 '23 02:03 titzer

Thanks to you both, @tlively, @titzer, I'm closing.

rossberg avatar Mar 04 '23 10:03 rossberg