vm: replace jump table with switch
A couple of years ago the Go compiler finally implemented jump tables for switch statements: https://go-review.googlesource.com/c/go/+/357330
To avoid the function call overhead of evalInstruction, mainLoop and mainLoopWithContext are combined into a single loop so that evalInstruction can be inlined into it.
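For readers unfamiliar with the two dispatch styles, here is a minimal sketch on a toy accumulator machine. The opcodes, the `machine` type, and the `runTable`/`runSwitch` names are hypothetical and are not gopher-lua's actual code. A switch this small may well be compiled to plain comparisons rather than a jump table; the point is only the shape of the change, with an indirect call per instruction replaced by code that lives inside the loop.

```go
package main

import "fmt"

// Hypothetical opcodes for a toy machine (not gopher-lua's instruction set).
const (
	opInc    = iota // acc++
	opDouble        // acc *= 2
	opHalt          // stop execution
)

type machine struct{ acc int }

// Function-table dispatch: every instruction goes through an indirect call.
// Each handler returns true when execution should stop.
var handlers = [...]func(m *machine) bool{
	opInc:    func(m *machine) bool { m.acc++; return false },
	opDouble: func(m *machine) bool { m.acc *= 2; return false },
	opHalt:   func(m *machine) bool { return true },
}

func runTable(code []int) int {
	m := &machine{}
	for _, op := range code {
		if handlers[op](m) {
			break
		}
	}
	return m.acc
}

// Switch dispatch: the instruction bodies sit directly in the loop, and the
// compiler is free to lower the switch to a jump table when it sees fit.
func runSwitch(code []int) int {
	m := &machine{}
	for _, op := range code {
		switch op {
		case opInc:
			m.acc++
		case opDouble:
			m.acc *= 2
		case opHalt:
			return m.acc
		}
	}
	return m.acc
}

func main() {
	code := []int{opInc, opInc, opDouble, opHalt}
	fmt.Println(runTable(code), runSwitch(code)) // both variants agree
}
```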
Benchmarks done with: AMD Ryzen 7 7840U w/ Radeon 780M Graphics
before:
BenchmarkCallFrameStackPushPopAutoGrow-16 549974 2160 ns/op
BenchmarkCallFrameStackPushPopFixed-16 2073770 575.6 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16 24444289 50.84 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16 56415529 21.05 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16 2096766 568.6 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16 746608 2117 ns/op
BenchmarkCallFrameStackUnwindFixed-16 2377479 498.8 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16 2349202 498.5 ns/op
BenchmarkRegistryPushPopAutoGrow-16 10000 110720 ns/op
BenchmarkRegistryPushPopFixed-16 10887 110825 ns/op
BenchmarkRegistrySetTop-16 297716 4107 ns/op
PASS
ok github.com/yuin/gopher-lua 24.827s
with only switch:
BenchmarkCallFrameStackPushPopAutoGrow-16 575610 2178 ns/op
BenchmarkCallFrameStackPushPopFixed-16 2014279 602.0 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16 23916462 51.39 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16 58842040 21.84 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16 2033984 571.5 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16 741211 1685 ns/op
BenchmarkCallFrameStackUnwindFixed-16 2379696 503.4 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16 2392480 500.6 ns/op
BenchmarkRegistryPushPopAutoGrow-16 9115 112333 ns/op
BenchmarkRegistryPushPopFixed-16 10574 111390 ns/op
BenchmarkRegistrySetTop-16 294739 4028 ns/op
PASS
ok github.com/yuin/gopher-lua 24.605s
with combined main loops:
BenchmarkCallFrameStackPushPopAutoGrow-16 555296 2249 ns/op
BenchmarkCallFrameStackPushPopFixed-16 1945405 588.3 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16 23535645 51.98 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16 60960530 21.30 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16 2071460 590.7 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16 705776 1691 ns/op
BenchmarkCallFrameStackUnwindFixed-16 2370794 508.9 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16 2382297 513.4 ns/op
BenchmarkRegistryPushPopAutoGrow-16 10000 110263 ns/op
BenchmarkRegistryPushPopFixed-16 10845 110933 ns/op
BenchmarkRegistrySetTop-16 295574 4091 ns/op
PASS
ok github.com/yuin/gopher-lua 24.584s
with evalInstruction inlined and the reg assignment lifted out of the loop:
BenchmarkCallFrameStackPushPopAutoGrow-16 573594 2150 ns/op
BenchmarkCallFrameStackPushPopFixed-16 1806942 674.8 ns/op
BenchmarkCallFrameStackPushPopShallowAutoGrow-16 24414471 51.14 ns/op
BenchmarkCallFrameStackPushPopShallowFixed-16 59304620 18.99 ns/op
BenchmarkCallFrameStackPushPopFixedNoInterface-16 2063114 596.8 ns/op
BenchmarkCallFrameStackUnwindAutoGrow-16 650049 1675 ns/op
BenchmarkCallFrameStackUnwindFixed-16 2300353 511.5 ns/op
BenchmarkCallFrameStackUnwindFixedNoInterface-16 2361276 513.6 ns/op
BenchmarkRegistryPushPopAutoGrow-16 9720 114780 ns/op
BenchmarkRegistryPushPopFixed-16 9316 114070 ns/op
BenchmarkRegistrySetTop-16 289208 4033 ns/op
PASS
ok github.com/yuin/gopher-lua 23.390s
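As a rough illustration of what lifting an assignment out of a loop means in general, the sketch below loads a pointer field once before the loop instead of on every iteration. The `state` and `registry` types and both function names are hypothetical stand-ins, not the actual gopher-lua code, and the transformation is only valid while the field is not reassigned inside the loop.

```go
package main

import "fmt"

// Hypothetical stand-ins for an interpreter state and its register file.
type registry struct{ slots []int }
type state struct{ reg *registry }

// sumReload reads s.reg again on every iteration.
func sumReload(s *state, idx []int) int {
	total := 0
	for _, i := range idx {
		reg := s.reg
		total += reg.slots[i]
	}
	return total
}

// sumHoisted reads s.reg once, before the loop. This is only equivalent
// as long as nothing reassigns s.reg while the loop is running.
func sumHoisted(s *state, idx []int) int {
	reg := s.reg
	total := 0
	for _, i := range idx {
		total += reg.slots[i]
	}
	return total
}

func main() {
	s := &state{reg: &registry{slots: []int{10, 20, 30}}}
	idx := []int{0, 2, 2, 1}
	fmt.Println(sumReload(s, idx), sumHoisted(s, idx)) // both variants agree
}
```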
A bit off topic, but the benchmarks in the wiki are quite outdated. Go has improved since 1.7 (gopher-lua is now only ~4x slower than upstream Lua), but python3's performance has also improved greatly since then. I reran the benchmark for lua/luajit/glua/python3:
> time lua _glua-tests/fib35.lua
real 0m0.358s
user 0m0.354s
sys 0m0.004s
> time luajit _glua-tests/fib35.lua
real 0m0.052s
user 0m0.040s
sys 0m0.004s
> time python _glua-tests/fib35.py
real 0m0.813s
user 0m0.804s
sys 0m0.004s
> time ./glua _glua-tests/fib35.lua # this PR
real 0m1.698s
user 0m1.692s
sys 0m0.007s
> time ./glua _glua-tests/fib35.lua # master
real 0m1.732s
user 0m1.706s
sys 0m0.004s
Granted, it's a rather synthetic benchmark.