wasmtime-py
wasmtime-py copied to clipboard
performance: calling python from wasm is too slow
as mentioned in parent performance ticket https://github.com/bytecodealliance/wasmtime-py/issues/96#issuecomment-1497336349
we have a bottleneck when python is called from inside wasm
here is a minimum reproducible example
unlike #96 which is to be solved by #139 the loop here lives inside wasm and the functions are in python, C (python's math) and wasm this issue is not solved by #139
experiment setup
(module
(func $log_i (import "env" "log_i") (param i32))
(func $math_gcd (import "env" "math_gcd") (param i32 i32) (result i32))
(func $python_gcd (import "env" "python_gcd") (param i32 i32) (result i32))
(func $gcd (export "gcd") (param i32 i32) (result i32)
;; ...
)
(func $wasm_gcd_loop (export "wasm_gcd_loop") (param $n i32) (param $a i32) (param $b i32) (result i32)
;; call wasm_gcd(a,b) n times
)
(func $math_gcd_loop (export "math_gcd_loop") (param $n i32) (param $a i32) (param $b i32) (result i32)
;; call math_gcd(a,b) n times
)
(func $python_gcd_loop (export "python_gcd_loop") (param $n i32) (param $a i32) (param $b i32) (result i32)
;; call python_gcd(a,b) n times
)
)
I've logged i and logged gcd to make sure loop is working fine using log_i()
I've removed the logging
I've imported C gcd from python's math package
I've imported python gcd
and did a loop inside wasm calling the 3 versions
print("*** wasm loop benchmark: ")
for name in "wasm_gcd_loop", "math_gcd_loop", "python_gcd_loop":
gcdf = by_name[name]
start_time = time.perf_counter()
g = gcdf(N, a, b)
total_time = time.perf_counter() - start_time
print(total_time, "\t\t", name)
results
9.899199358187616e-05 wasm_gcd_loop
0.01052837198949419 math_gcd_loop
0.012672090000705793 python_gcd_loop
conclusion
when wasm loop call's wasm function it was too fast, it was 100x faster than calling C's gcd since C's gcd is faster than wasm gcd as shown in other benchmark, then this bottleneck is wrapping python part.
profiling
here is how we profile calling C from wasm
pr=Profile()
pr.enable()
print(math_gcd_loop(N, a, b))
pr.disable()
pr.print_stats()
pr.print_stats('cumulative')
pr.print_stats('calls')
using the production 7.0.0, the obvious bootleneck is in:
-
_func.py:172(trampoline)
-
_value.py:129(_convert)
Ordered by: call count
ncalls tottime percall cumtime percall filename:lineno(function)
7024 0.001 0.000 0.001 0.000 {built-in method builtins.isinstance}
3019 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
2006 0.001 0.000 0.001 0.000 _bindings.py:178(wasm_valtype_kind)
2005 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
2001 0.001 0.000 0.001 0.000 _value.py:167(_value)
1009 0.000 0.000 0.000 0.000 {built-in method _ctypes.POINTER}
1008 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x7fc0d01430a0}
1007 0.001 0.000 0.001 0.000 _types.py:45(_from_ptr)
1007 0.001 0.000 0.002 0.000 _types.py:86(__del__)
1004 0.000 0.000 0.000 0.000 _value.py:109(__init__)
1004 0.000 0.000 0.000 0.000 _value.py:117(__del__)
1004 0.000 0.000 0.000 0.000 _value.py:161(_unwrap_raw)
1003 0.001 0.000 0.001 0.000 _bindings.py:172(wasm_valtype_new)
1003 0.002 0.000 0.002 0.000 _value.py:39(i32)
1003 0.002 0.000 0.012 0.000 _value.py:129(_convert)
1003 0.001 0.000 0.003 0.000 _types.py:12(i32)
1003 0.001 0.000 0.002 0.000 _types.py:54(__eq__)
1003 0.001 0.000 0.001 0.000 _bindings.py:120(wasm_valtype_delete)
1000 0.000 0.000 0.000 0.000 _func.py:121(__init__)
1000 0.005 0.000 0.020 0.000 _func.py:172(trampoline)
1000 0.000 0.000 0.001 0.000 _func.py:245(get)
1000 0.000 0.000 0.001 0.000 _value.py:156(_into_raw)
1000 0.000 0.000 0.000 0.000 typing.py:1737(cast)
1000 0.000 0.000 0.000 0.000 {built-in method math.gcd}
1000 0.000 0.000 0.000 0.000 {built-in method builtins.delattr}
by time
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.022 0.022 _func.py:59(__call__)
1 0.002 0.002 0.022 0.022 _bindings.py:2491(wasmtime_func_call)
1000 0.005 0.000 0.020 0.000 _func.py:172(trampoline)
1003 0.002 0.000 0.012 0.000 _value.py:129(_convert)
1003 0.001 0.000 0.003 0.000 _types.py:12(i32)
1003 0.001 0.000 0.002 0.000 _types.py:54(__eq__)
1003 0.002 0.000 0.002 0.000 _value.py:39(i32)
1007 0.001 0.000 0.002 0.000 _types.py:86(__del__)
2006 0.001 0.000 0.001 0.000 _bindings.py:178(wasm_valtype_kind)
1007 0.001 0.000 0.001 0.000 _types.py:45(_from_ptr)
2001 0.001 0.000 0.001 0.000 _value.py:167(_value)
1000 0.000 0.000 0.001 0.000 _value.py:156(_into_raw)
1003 0.001 0.000 0.001 0.000 _bindings.py:120(wasm_valtype_delete)
1003 0.001 0.000 0.001 0.000 _bindings.py:172(wasm_valtype_new)
7024 0.001 0.000 0.001 0.000 {built-in method builtins.isinstance}
1000 0.000 0.000 0.001 0.000 _func.py:245(get)