go-algorand
go-algorand copied to clipboard
Add Switch Opcode
Problem
One of the goals of the AVM is to allow efficient encoding of Smart Contracts - the programs themselves should take little space on the blockchain. The design of the opcode set should, when possible, encode common usage patterns into few opcodes.
Currently, there is no native computed jump or switch statement functionality in the AVM opcodes. It is common for users to want to jump to a part of the code based on a value, but currently expensive (in terms of program length) to write.
Solution
(design in-progress)
Introduce a new opcode: switch1 label0 label1 ... labelN
This opcode will pop the top stack value X and jump to position labelX.
If X > N, the program exits with an error.
out-of-scope: a follow-up story will consider switch2, the more complex case where the value to switch on can be defined by the user.
This opcode will be encoding several offsets, and those offsets are likely to be quite small. We should consider encoding them as varints, so they consume only one byte when they represent a short jump.
This is a fairly common optimization in the AVM bytecode, however we have not done it for branch targets yet. In all other similar cases, we encode a varuint, but here we need varint, because the jump may go backwards.
Separately, I had previously felt the labels should be encoded as absolute jumps. I no longer think that's a good idea, since a) it would limit this optimization, and b) we would then lose a convenient property we currently have for AVM bytecode - it can be shifted with breaking it, since all jumps are relative.
Finally, these offsets should probably be offsets from the start of the switch opcode if they go backwards, and from the end if they go forward (see next paragraph for why). This is in contrast to the b* opcodes which use an offset from the next instruction.
Since assembly is two-phase, and we need to patch in offsets in phase-two, it will be a little annoying if the patch can be one or two bytes in length. If we did a "normal" offset from a specific point, the offsets would actually change based on how much space they consume. And, in fact, all of the offsets would change when any encoding changes size, which could mean they change size, which might cause more changes. But, if negative values count back from the start of the instruction, and positive values count forward from the end, the offset values do not change based on the size of the instruction (and the offset encodings).
It's up for debate whether the opcode should be encoded as switch <# offsets> label1 label2... or switch label1 label2 ... 0
That is, should we encode the number of cases, or use a 0 to terminate them. In similar instructions, like intcblock, we encode the length, so that's probably easiest. Use a varuint, not a byte, just as those opcodes do, so we can support more than 256 labels.
It might be more flexible/useful to have a "default" clause, so if the input value falls out of the expected range then it branches to some offset, where the program can decide if to fault or not. eg switch1 label0 label1 ... labelN default labelD
It might be more flexible/useful to have a "default" clause, so if the input value falls out of the expected range then it branches to some offset, where the program can decide if to fault or not. eg
switch1 label0 label1 ... labelN default labelD
I suppose we would get that automatically if the execution simply continued on to the next PC (rather than panic) if there were no matches. Those who prefer to panic can insert an err opcode after the switch. Those who prefer a default case can either b, or just insert their code right there.
An extra implementation note. Having thought about the idea of using varint encoding of the labels a bit more, I'll say:
a) Maybe it's too much trouble, since offsets will have to adjust for the encoding sizes, which can't be determined until "phase 2".
b) If we do it, we should look to the optimization of int and byte references for inspiration. They are optimized to use the shortest encoding in optimizeConstants and when that is done, various references will have to be adjusted. Encoding small jumps would need to do something very similar.
Does anyone actually care about the case of more than 256 labels? I'm considering that maybe the label count should just be encoded as an unsigned byte rather than varuint.
Fine with me. I can add a limit in the compiler easily.
Also, any suggestion for a good name for the two different opcodes? (though this issue is only about one, it'd be nice to think about the future for naming). To recap: this opcode requires the switched value to be a small integer, and switches among a few labels provided as immediate arguments. A followup one might take values and labels as immediates, and would switch to the label associated with the right value. I think think of this one as switching among "dense" values, and the other as among "sparse", but I'm not in love with incorporating those words into the opcode names.
Maybe switch for the implicit version and case for the explicit