msgpackffi is 3x slower than Lua C
Here is the bench I used:
```lua
local clock = require('clock').monotonic64
local msgpackffi = require('msgpackffi')
local msgpack = require('msgpack')

local mp_encode_ffi = msgpackffi.encode
local mp_encode = msgpack.encode

local run_count = 100
local iter_count = 10000
local tuple_data = {1, 2, 3}
local jit_is_on = true

if not jit_is_on then
    jit.off()
end

local function bench_mp_encode_ffi()
    for i = 1, iter_count do
        mp_encode_ffi(tuple_data)
    end
end

local function bench_mp_encode()
    for i = 1, iter_count do
        mp_encode(tuple_data)
    end
end

local times = {}
for i = 1, run_count do
    collectgarbage('collect')
    local t1 = clock()
    ------------------------- PUT THE BENCH CALLS HERE -------------------------
    bench_mp_encode_ffi()
    ----------------------------------------------------------------------------
    local t2 = clock()
    local duration = t2 - t1
    table.insert(times, duration)
end

table.sort(times)
print(string.format('Median value per iteration: %s ns',
                    tonumber(times[#times / 2]) / iter_count))
```
The ffi version takes ~950 ns per iteration, the Lua C version ~280 ns. It seems msgpackffi does not serve its purpose: to be faster than Lua C by staying inside the Lua VM and benefiting from JIT compilation. Either the JIT does not kick in here, or the reason is something else entirely, I have no idea yet.
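As a first sanity check it is worth confirming that JIT compilation is enabled at all in the environment running the bench. LuaJIT's standard `jit` library exposes this directly (the printed flag list varies by build, so no particular output is implied here):

```lua
-- Quick check that the JIT compiler is actually on.
-- jit.status() returns a boolean followed by the enabled CPU/optimization
-- flags; jit.version identifies the LuaJIT build inside Tarantool.
print(jit.status())
print(jit.version)
```

If `jit.status()` reports `false`, the whole comparison measures the interpreter, not compiled traces.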
Both implementations use the global Lua ibuf (IBUF_SHARED in Lua, tarantool_lua_ibuf in C) and free it after each encode. I tried keeping the memory reused, but it didn't change much, so the reason must lie elsewhere.
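For reference, the memory-reuse experiment can be done without touching the shared ibuf at all: `msgpack.encode` accepts a caller-supplied ibuf as a second argument, so one buffer can be kept alive across all iterations. A minimal sketch (the iteration count and data are just the benchmark's values):

```lua
local buffer = require('buffer')
local msgpack = require('msgpack')

-- One dedicated ibuf reused across all encodes instead of IBUF_SHARED.
local ibuf = buffer.ibuf()
for i = 1, 10000 do
    msgpack.encode({1, 2, 3}, ibuf)
    ibuf:reset()   -- drop the encoded bytes, keep the allocation
end
ibuf:recycle()     -- finally return the memory to the allocator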
Just a guess: trace stitching changes the ratio.
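Trace stitching (LuaJIT 2.1) turns a call the compiler cannot trace through, such as the Lua/C call behind msgpack.encode, from a full trace abort into two traces joined at the call, which could make the Lua C path cheaper than it used to be. Whether stitching actually happens in this bench can be observed with LuaJIT's bundled jit.dump module; the option string and output file name below are arbitrary choices:

```lua
-- Sketch: record trace events while the Lua C variant of the bench runs.
-- Option 't' prints a line for each started, ended or aborted trace.
local dump = require('jit.dump')
dump.on('t', 'trace-events.txt')
bench_mp_encode()   -- the function from the benchmark above
dump.off()
-- Then grep trace-events.txt for "stitch" in the trace start reasons.
```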
One of the ideas discussed recently: deprecate msgpackffi so we don't have to maintain 2 modules with the same API, especially since the supposedly fast version turns out not to be that fast.
NB: still reproduced on 2.11.0-entrypoint-169-gd1d3d93ae (right after the #5885 and #4630 fixes). We don't encode any cdata structures here, but I re-ran the benchmark just in case.