solidity
solidity copied to clipboard
Use jump table for some switch statements
@philbawa hi, how do I transfer BNB
This is looking really good already! We have to be careful to find a good balance between this being generic enough and not too complicated, but I would suggest the following items (that could maybe go into future pull requests):
- implement this in optimized evm transform
- use gas measurements and the "runs" parameter to decide whether or not to do the optimization
- allow cases that are not exactly "enum-like" - for example it might not be such a big deal if some of the values are missing - we just use the same tag as the default case
- think about how we could use this in the function dispatch - see https://github.com/ethereum/solidity/issues/12650
Item 3 is now completed. Any range of cases spanning no more than 16 values is now supported by subtracting the minimum case if necessary (underflows desired), and empty cases are allowed to be stored in the pushed value, which jump to the default case.
I'm not sure if that was already your reason for the relative offsets, but if we use relative offsets and can ensure them to be at most 256, we can actually fit 32 cases in a single push constant instead of just 16.
Also, @ekpyron suggested that as an alternative, we could also use a fully fixed-layout jump table instead, which would allow us to have no restriction on the size. The downside is that it needs another jump for each case and 4 additional bytes per case
let x :=<switch value>
if ge(x, <max value>) { /* something else */ }
jump(add(<start>, mul(x, <byte size of case>)))
// for each case:
jumpdest
pushtag <actual tag>
jump
The primary reason I included relative offsets was to eliminate a conditional jump for gas savings, but you're right, we could fit up to 32 cases in a single push, especially when the cases are sparse. The reason I haven't done it that way yet is for architecture reasons: the jump table code itself is currently built in EVMCodeTransform, and I wanted to make the fewest assumptions about the addresses where the cases begin. For instance, if we were to handle 32 cases and a case's address is >= 256 from the default case, the compilation would fail when deciding the jump addresses because the relative offset would not fit in a single byte.
But the compilation could currently fail anyway with backward jumps because the offsets are relative. Those aren't possible with the unoptimized version since all non-default cases come after the default case, but maybe with the optimizer that will become an issue.
Is there a way I can shift the assembly code generation of the jump table to be done after the case addresses are known? That way we could then decide if we could fit 32 cases and also whether we need to handle backward jumps. The code size would vary, so I'm not sure how I would handle that properly, maybe by taking up the maximum possible size and using some instruction for no-ops if the actual size is less.
fully fixed-layout jump table
Ha! That was my first idea. It is still limited by the range of the cases (we can't fit cases spanning a range of 20,000) and uses more gas (59 rather than 35), but if you have a larger range of cases it could be useful.
For debugging the optimized evm code generation path, it may be good to know that you can visualize the control flow graphs and stack layouts by running our tests:
You need to have something installed that can consume graphviz dot files on stdin - I usually use xdot
for that.
Then you can run (assuming a linux system, otherwise however you can set environment variables):
export ISOLTEST_DISPLAY_GRAPHS_ON_SUCCESS_COMMAND="xdot -"
export ISOLTEST_DISPLAY_GRAPHS_ON_FAILURE_COMMAND="xdot -"
build/test/tools/isoltest -t yulControlFlowGraph*
to get visualization of all control flow graph tests and
export ISOLTEST_DISPLAY_GRAPHS_SUCCESS="xdot -"
export ISOLTEST_DISPLAY_GRAPHS_FAILURE="xdot -"
build/test/tools/isoltest -t yulStackLayout*
to get visualization of all stack layout tests. (It should actually be the same environment variables in both cases, we should probably align them :-))
The problem there, of course, is that those test runs won't understand the new Switch
exit, so you will only be able to use this on develop
, until you extend test/libyul/ControlFlowGraphTest.cpp
and test/libyul/StackLayoutGeneratorTest.cpp
to handle the new switch exit... but it may still be helpful to look at the existing graphs and stack layouts. And adjusting those tests may help in properly building the graphs and stack layouts by allowing to visualize them.
The switch block type is now in ControlFlowGraphTest and StackLayoutGeneratorTest. The new graphs seem correct to me, but I haven't yet committed updates to the test cases.
I've, out of curiosity, just built the latest PR version with additionally my last three comments addressed - and that version in fact passes all of our semantics tests, so that's great!
However, it has some adverse effects on the gas costs in quite a few cases, so it would be nice to look into some cases in more detail, compare the generated assembly and see if we can figure out where the increased cost is coming from and if we can do something about it. My guess would be that either the order of visitation of the blocks in the stack layout generator changed, resulting in less optimal layouts, or that it has something to do with the precise layout of code in the switch visit of the optimized code transform. But it's hard to tell without an in-depth look. It should be possible to get back similar gas results with only a few minor tweaks, though, I hope.
The external tests actually all passed - here's the relevant part of the benchmarks from using scripts/externalTests/benchmark_diff.py
between the summarized-benchmarks-develop-b80f4baa.json
and summarized-benchmarks.json
artifacts of the c_ext_benchmarks
CI run:
ir-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | +0.47% ❌ |
0% |
+0.03% ❌ |
brink | +0.88% ❌ |
||
chainlink | +0.66% ❌ |
+0.64% ❌ |
+0.02% ❌ |
colony | 0% |
||
elementfi | +1.01% ❌ |
||
ens | 0% |
+0.71% ❌ |
+0.01% ❌ |
euler | !V |
!V |
!V |
gnosis | +0.62% ❌ |
+0.61% ❌ |
+0.02% ❌ |
gp2 | +0.87% ❌ |
+0.78% ❌ |
+0.03% ❌ |
perpetual-pools | +0.95% ❌ |
+0.93% ❌ |
+0.13% ❌ |
pool-together | +0.77% ❌ |
+0.75% ❌ |
+0% |
prb-math | +1.32% ❌ |
+1.32% ❌ |
0% |
trident | +0.7% ❌ |
+0.73% ❌ |
+0.53% ❌ |
uniswap | +1.34% ❌ |
+1.24% ❌ |
+1.18% ❌ |
yield_liquidator | +0.68% ❌ |
+0.71% ❌ |
-0% |
zeppelin | !V |
!V |
!V |
So there is some work to be done - we'll probably need to look at some assembly diffs to figure out why the new version with the Switch
exit so far generates slightly worse code, but this strongly suggests that the current version already generates correct code again!
Current issues:
- The switch CFG type as I implemented it has increased bytecode size and same or slightly worse execution cost for regular switches. I think this is because the Yul insertion technique with ghost variables takes advantage of the optimizer in ways that hard-coded assembly cannot. For example, it does not rely on hard-coded dup3 instructions and can have a slightly more flexible stack layout.
- If we do use the switch CFG type, the stack layouts for the case blocks will be different between regular switches and jump table switches, since after the jump, regular switches will have the switch expression left on the stack while a jump table will not. One way to fix this (as discussed with @ekpyron) is to keep the case block entry stack layouts without the switch expression and simply add a pop instruction before each case block when creating the EVM instructions for the regular switch type. The problem is this will produce suboptimal stack layouts when the case blocks could have been produced without the pop instruction.
I am tempted to put the decision to use a jump table into the CFG building step. That way, the Yul insertion with ghost variables method can still be used to make all non-jump tables and produce the best code for them. The CFG switch type would only be used for jump tables then. Of course, this is much less clean than having all switch types use the same CFG block and producing the code in the code transform step.
For reference the gas differences of the latest version:
(The perpetual pools regression is probably merely due to non-determinism)
ir-no-optimize
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | |||
chainlink | |||
colony | 0% |
||
elementfi | |||
ens | |||
euler | |||
gnosis | |||
gp2 | |||
perpetual-pools | |||
pool-together | |||
prb-math | |||
trident | |||
uniswap | |||
yield_liquidator | |||
zeppelin |
ir-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | 0% |
0% |
0% |
brink | 0% |
||
chainlink | 0% |
+0% |
-0% |
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
+0% |
0% |
euler | -0.01% ✅ |
-0.01% ✅ |
-0.09% ✅ |
gnosis | 0% |
-0% |
-0% |
gp2 | 0% |
+0% |
+0% |
perpetual-pools | 0% |
+0% |
+0.02% ❌ |
pool-together | 0% |
-0% |
-0% |
prb-math | 0% |
+0% |
0% |
trident | 0% |
-0% |
-0% |
uniswap | 0% |
+0% |
+0% |
yield_liquidator | 0% |
-0% |
0% |
zeppelin | 0% |
-0% |
+0% |
ir-optimize-evm-only
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | |||
chainlink | |||
colony | 0% |
||
elementfi | |||
ens | |||
euler | |||
gnosis | |||
gp2 | |||
perpetual-pools | |||
pool-together | |||
prb-math | |||
trident | |||
uniswap | |||
yield_liquidator | |||
zeppelin |
legacy-no-optimize
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | 0% |
||
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
||
euler | -0.02% ✅ |
-0.02% ✅ |
-0.19% ✅ |
gnosis | 0% |
-0% |
+0% |
gp2 | 0% |
||
perpetual-pools | 0% |
-0% |
-0.01% ✅ |
pool-together | 0% |
-0% |
+0% |
prb-math | 0% |
0% |
0% |
trident | 0% |
0% |
+0% |
uniswap | 0% |
+0% |
-0% |
yield_liquidator | 0% |
+0% |
0% |
zeppelin | 0% |
+0% |
-0% |
legacy-optimize-evm+yul
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | 0% |
0% |
0% |
brink | 0% |
||
chainlink | 0% |
+0% |
+0% |
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
0% |
0% |
euler | -0.03% ✅ |
-0.03% ✅ |
-0.19% ✅ |
gnosis | 0% |
+0% |
-0% |
gp2 | 0% |
+0% |
+0% |
perpetual-pools | 0% |
+0% |
-0.02% ✅ |
pool-together | 0% |
+0% |
+0% |
prb-math | 0% |
0% |
0% |
trident | 0% |
-0% |
-0% |
uniswap | 0% |
+0% |
+0% |
yield_liquidator | 0% |
-0% |
-0% |
zeppelin | 0% |
-0% |
+0% |
legacy-optimize-evm-only
project | bytecode_size | deployment_gas | method_gas |
---|---|---|---|
bleeps | |||
brink | 0% |
||
chainlink | 0% |
+0% |
-0% |
colony | 0% |
||
elementfi | 0% |
||
ens | 0% |
+0% |
0% |
euler | -0.02% ✅ |
-0.02% ✅ |
-0.13% ✅ |
gnosis | 0% |
-0% |
-0% |
gp2 | 0% |
0% |
+0% |
perpetual-pools | 0% |
0% |
+0% |
pool-together | 0% |
0% |
+0% |
prb-math | 0% |
0% |
0% |
trident | 0% |
0% |
-0% |
uniswap | 0% |
0% |
+0% |
yield_liquidator | 0% |
+0% |
0% |
zeppelin | 0% |
-0% |
+0% |
!V
= version mismatch
!B
= no value in the "before" version
!A
= no value in the "after" version
!T
= one or both values were not numeric and could not be compared
-0
= very small negative value rounded to zero
+0
= very small positive value rounded to zero
So it already kicks in without us e.g. changing the external dispatch to something that allows jump tables.
I hope I'll have time for a closer look at the implementation soon.