The reference interpreter became up to 20 times slower after a87fffc
While experimenting with the reference interpreter on binaries produced by Kotlin, I noticed a huge performance degradation. It appears to be caused by changes introduced in a87fffc.
To reproduce it, please run the attached wasm file (index.wasm.zip) in the reference interpreter with the following command:
'wasm' index.wasm -t -e '(module instance) (invoke "_initialize") (invoke "runBoxTest")'
Here are some runs with different versions:
$ time '_build/a8/wasm.exe' index.wasm -t -e '(module instance) (invoke "_initialize") (invoke "runBoxTest")'
-- Running ("(input \"index.wasm\")")...
-- Parsing...
-- Running...
-- Loading (index.wasm)...
-- Decoding...
-- Running...
-- Decoding...
-- Checking...
-- Running ("(module instance) (invoke \"_initialize\") (invoke \"runBoxTest\")")...
-- Parsing...
-- Running...
-- Initializing...
-- Invoking function "_initialize"...
-- Invoking function "runBoxTest"...
1 : [i32]
'a8/wasm.exe' -t -e 12.65s user 1.27s system 97% cpu 14.212 total
$ time '_build/d0/wasm.exe' index.wasm -t -e '(module instance) (invoke "_initialize") (invoke "runBoxTest")'
-- Running ("(input \"index.wasm\")")...
-- Parsing...
-- Running...
-- Loading (index.wasm)...
-- Decoding...
-- Running...
-- Decoding...
-- Checking...
-- Running ("(module instance) (invoke \"_initialize\") (invoke \"runBoxTest\")")...
-- Parsing...
-- Running...
-- Initializing...
-- Invoking function "_initialize"...
-- Invoking function "runBoxTest"...
1 : [i32]
'_build/d0/wasm.exe' -t -e 0.65s user 0.02s system 73% cpu 0.919 total
$ time '_build/d7/wasm.exe' index.wasm -t -e '(module instance) (invoke "_initialize") (invoke "runBoxTest")'
-- Running ("(input \"index.wasm\")")...
-- Parsing...
-- Running...
-- Loading (index.wasm)...
-- Decoding...
-- Running...
-- Decoding...
-- Checking...
-- Running ("(module instance) (invoke \"_initialize\") (invoke \"runBoxTest\")")...
-- Parsing...
-- Running...
-- Initializing...
-- Invoking function "_initialize"...
-- Invoking function "runBoxTest"...
1 : [i32]
'_build/d7/wasm.exe' -t -e 0.65s user 0.02s system 78% cpu 0.860 total
Thanks for the report. I was able to narrow the regression down to a bugfix in the substitution function. Previously, it cut off at deftypes, assuming they were closed, but that wasn't a sound assumption. Now it naively applies the substitution transitively. That didn't make a notable difference on the test suite, but it shows up for modules with long chains of rectypes.
I'll try to find some optimisation that doesn't interfere too much with the executable-spec style of the interpreter, but I don't know if that's gonna be easy. Simple memoisation didn't help.
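To illustrate the trade-off described above, here is a toy model in Python. It is not the interpreter's OCaml code: `Prim`, `Var`, `Def`, `subst`, `chain` and `count_visits` are hypothetical names, and the doubling chain is an assumed worst case rather than the shape of the Kotlin module. It only sketches why stopping at defined types is cheap but can leave a variable unsubstituted, while transitive application is complete but can revisit shared structure many times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prim:
    """A closed primitive type (hypothetical stand-in, e.g. i32)."""
    name: str


@dataclass(frozen=True)
class Var:
    """A type variable that the substitution should replace."""
    index: int


@dataclass(frozen=True)
class Def:
    """A defined type whose fields may mention variables or other Defs."""
    name: str
    fields: tuple


def subst(t, env, stats, cut_off):
    """Apply env to t, counting how many nodes are visited.

    cut_off=True  models the old behaviour: stop at any Def, assuming it is closed.
    cut_off=False models the naive transitive behaviour: descend into every Def.
    """
    stats["visits"] += 1
    if isinstance(t, Var):
        return env.get(t.index, t)
    if isinstance(t, Prim) or cut_off:
        return t
    return Def(t.name, tuple(subst(f, env, stats, cut_off) for f in t.fields))


def chain(n):
    """Build a chain t0..tn where each type mentions its predecessor twice."""
    t = Def("t0", (Var(0),))
    for i in range(1, n + 1):
        t = Def(f"t{i}", (t, t))
    return t


def count_visits(n, cut_off):
    """Return (number of visited nodes, substituted type) for a chain of length n."""
    stats = {"visits": 0}
    result = subst(chain(n), {0: Prim("i32")}, stats, cut_off)
    return stats["visits"], result


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, count_visits(n, True)[0], count_visits(n, False)[0])
```

In this toy, the cut-off variant visits a single node regardless of `n` but leaves `Var(0)` in place, which mirrors the unsoundness, whereas the transitive variant replaces it at the cost of a visit count that doubles with each link in the chain. Whether the real slowdown has exactly this shape is an assumption.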