vm: replace jump table with switch

Open serprex opened this issue 1 year ago • 1 comments

A couple of years ago the Go compiler finally implemented jump tables for switch statements: https://go-review.googlesource.com/c/go/+/357330

To avoid the function call overhead of evalInstruction, mainLoop and mainLoopWithContext are combined into a single loop so that evalInstruction can be inlined.
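For readers unfamiliar with the two dispatch styles, here is a minimal sketch of the idea, not the actual gopher-lua code. The toy opcodes, the `encode` helper, and the accumulator VM are all hypothetical; the only thing borrowed from the real VM is the assumption that the opcode lives in the top 6 bits of a 32-bit instruction. Note that the Go compiler only considers a jump table for larger switches, so a three-case toy like this one will likely compile to plain comparisons; the real opcode switch is much bigger.

```go
package main

import "fmt"

// Hypothetical opcodes for a toy accumulator VM. These are illustrative
// only and are not gopher-lua's instruction set.
const (
	opAdd uint32 = iota
	opMul
	opHalt
)

// encode packs an opcode into the top 6 bits and an operand into the low 26.
func encode(op, arg uint32) uint32 { return op<<26 | arg&0x3ffffff }

// Function-table dispatch: every instruction costs one indirect call,
// which the compiler cannot inline.
var table = [...]func(acc *int, arg int) bool{
	opAdd:  func(acc *int, arg int) bool { *acc += arg; return false },
	opMul:  func(acc *int, arg int) bool { *acc *= arg; return false },
	opHalt: func(acc *int, arg int) bool { return true },
}

func runTable(code []uint32) int {
	acc := 0
	for pc := 0; pc < len(code); pc++ {
		inst := code[pc]
		if table[inst>>26](&acc, int(inst&0x3ffffff)) {
			break // handler asked the loop to stop
		}
	}
	return acc
}

// Switch dispatch: the handler bodies live inside the loop, so there is no
// call at all, and a large enough switch can be lowered to a jump table.
func runSwitch(code []uint32) int {
	acc := 0
	for pc := 0; pc < len(code); pc++ {
		inst := code[pc]
		arg := int(inst & 0x3ffffff)
		switch inst >> 26 {
		case opAdd:
			acc += arg
		case opMul:
			acc *= arg
		case opHalt:
			return acc
		}
	}
	return acc
}

func main() {
	code := []uint32{encode(opAdd, 3), encode(opMul, 4), encode(opAdd, 1), encode(opHalt, 0)}
	fmt.Println(runTable(code), runSwitch(code)) // both interpreters agree
}
```

Both loops compute the same result; the difference is only in how each instruction reaches its handler.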

Benchmarks done with: AMD Ryzen 7 7840U w/ Radeon 780M Graphics

before:

BenchmarkCallFrameStackPushPopAutoGrow-16                 549974              2160 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   2073770               575.6 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        24444289                50.84 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           56415529                21.05 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2096766               568.6 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  746608              2117 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2377479               498.8 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2349202               498.5 ns/op
BenchmarkRegistryPushPopAutoGrow-16                        10000            110720 ns/op
BenchmarkRegistryPushPopFixed-16                           10887            110825 ns/op
BenchmarkRegistrySetTop-16                                297716              4107 ns/op
PASS
ok      github.com/yuin/gopher-lua      24.827s

with only switch:

BenchmarkCallFrameStackPushPopAutoGrow-16                 575610              2178 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   2014279               602.0 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        23916462                51.39 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           58842040                21.84 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2033984               571.5 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  741211              1685 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2379696               503.4 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2392480               500.6 ns/op
BenchmarkRegistryPushPopAutoGrow-16                         9115            112333 ns/op
BenchmarkRegistryPushPopFixed-16                           10574            111390 ns/op
BenchmarkRegistrySetTop-16                                294739              4028 ns/op
PASS
ok      github.com/yuin/gopher-lua      24.605s

with combined main loops:

BenchmarkCallFrameStackPushPopAutoGrow-16                 555296              2249 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   1945405               588.3 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        23535645                51.98 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           60960530                21.30 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2071460               590.7 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  705776              1691 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2370794               508.9 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2382297               513.4 ns/op
BenchmarkRegistryPushPopAutoGrow-16                        10000            110263 ns/op
BenchmarkRegistryPushPopFixed-16                           10845            110933 ns/op
BenchmarkRegistrySetTop-16                                295574              4091 ns/op
PASS
ok      github.com/yuin/gopher-lua      24.584s

with evalInstruction inlined and the reg assignment lifted out of the loop:

BenchmarkCallFrameStackPushPopAutoGrow-16                 573594              2150 ns/op
BenchmarkCallFrameStackPushPopFixed-16                   1806942               674.8 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16        24414471                51.14 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16           59304620                18.99 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16        2063114               596.8 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16                  650049              1675 ns/op
BenchmarkCallFrameStackUnwindFixed-16                    2300353               511.5 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16         2361276               513.6 ns/op
BenchmarkRegistryPushPopAutoGrow-16                         9720            114780 ns/op
BenchmarkRegistryPushPopFixed-16                            9316            114070 ns/op
BenchmarkRegistrySetTop-16                                289208              4033 ns/op
PASS
ok      github.com/yuin/gopher-lua      23.390s

Mar 12 '24 21:03 serprex

A bit off topic, but the benchmarks in the wiki are quite outdated. Go has improved since 1.7 (glua is now only about 4-5x slower than upstream Lua), but python3's performance has also improved greatly since then. I reran the fib35 benchmark for lua/luajit/glua/python3:

> time lua _glua-tests/fib35.lua
real    0m0.358s
user    0m0.354s
sys     0m0.004s

> time luajit _glua-tests/fib35.lua
real    0m0.052s
user    0m0.040s
sys     0m0.004s

> time python _glua-tests/fib35.py
real    0m0.813s
user    0m0.804s
sys     0m0.004s

> time ./glua _glua-tests/fib35.lua # this PR
real    0m1.698s
user    0m1.692s
sys     0m0.007s

> time ./glua _glua-tests/fib35.lua # master
real    0m1.732s
user    0m1.706s
sys     0m0.004s

Granted, it's a rather synthetic benchmark.
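To put the timings above on one scale, the following sketch divides each wall-clock ("real") time by the upstream lua time. The numbers are copied from the single runs above, so treat the ratios as rough; the `run` type and `ratio` helper are just illustrative names.

```go
package main

import "fmt"

// run holds one measurement from the `time` output above.
type run struct {
	name string
	real float64 // wall-clock seconds
}

// ratio reports how many times slower (or faster, if < 1) a is than b.
func ratio(a, b run) float64 { return a.real / b.real }

func main() {
	lua := run{"lua", 0.358}
	runs := []run{
		{"luajit", 0.052},
		{"python", 0.813},
		{"glua (this PR)", 1.698},
		{"glua (master)", 1.732},
	}
	for _, r := range runs {
		fmt.Printf("%-16s %.2fx lua\n", r.name, ratio(r, lua))
	}
	// Relative change of this PR against master on the same script.
	fmt.Printf("PR vs master:    %.3f\n", ratio(runs[2], runs[3]))
}
```

On these figures glua with this PR lands a little under 5x the upstream lua time, and roughly 2% below master.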

Mar 12 '24 22:03 serprex