msgpackffi is 3x slower than Lua C
Here is the bench I used:
```lua
local clock = require('clock').monotonic64
local msgpackffi = require('msgpackffi')
local msgpack = require('msgpack')

local mp_encode_ffi = msgpackffi.encode
local mp_encode = msgpack.encode

local run_count = 100
local iter_count = 10000
local tuple_data = {1, 2, 3}
local jit_is_on = true

if not jit_is_on then
    jit.off()
end

local function bench_mp_encode_ffi()
    for i = 1, iter_count do
        mp_encode_ffi(tuple_data)
    end
end

local function bench_mp_encode()
    for i = 1, iter_count do
        mp_encode(tuple_data)
    end
end

local times = {}
for i = 1, run_count do
    collectgarbage('collect')
    local t1 = clock()
    ------------------------- PUT THE BENCH CALLS HERE -------------------------
    bench_mp_encode_ffi()
    ----------------------------------------------------------------------------
    local t2 = clock()
    local duration = t2 - t1
    table.insert(times, duration)
end

table.sort(times)
print(string.format('Median value per iteration: %s ns',
                    tonumber(times[#times / 2]) / iter_count))
```
The ffi version takes ~950 ns per iteration, the Lua C version ~280 ns. It seems msgpackffi does not serve its purpose: to be faster than Lua C by staying inside the Lua VM and benefiting from JIT compilation. Either the JIT does not kick in here, or the reason is something else entirely, I have no idea yet.
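As a first sanity check it is worth confirming that JIT compilation is enabled at all in the environment running the bench. LuaJIT's standard `jit` library exposes this directly (the printed flag list varies by build, so no particular output is implied here):

```lua
-- Quick check that the JIT compiler is actually on.
-- jit.status() returns a boolean followed by the enabled CPU/optimization
-- flags; jit.version identifies the LuaJIT build inside Tarantool.
print(jit.status())
print(jit.version)
```

If `jit.status()` reports `false`, the whole comparison measures the interpreter, not compiled traces.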
Both implementations use the global Lua ibuf (IBUF_SHARED in Lua, tarantool_lua_ibuf in C) and free it after each encode. I tried keeping the memory reused, but it didn't change much, so the reason must lie elsewhere.
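For reference, the memory-reuse experiment can be done without touching the shared ibuf at all: `msgpack.encode` accepts a caller-supplied ibuf as a second argument, so one buffer can be kept alive across all iterations. A minimal sketch (the iteration count and data are just the benchmark's values):

```lua
local buffer = require('buffer')
local msgpack = require('msgpack')

-- One dedicated ibuf reused across all encodes instead of IBUF_SHARED.
local ibuf = buffer.ibuf()
for i = 1, 10000 do
    msgpack.encode({1, 2, 3}, ibuf)
    ibuf:reset()   -- drop the encoded bytes, keep the allocation
end
ibuf:recycle()     -- finally return the memory to the allocator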
Just a guess: trace stitching changes the ratio.
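Trace stitching (LuaJIT 2.1) turns a call the compiler cannot trace through, such as the Lua/C call behind msgpack.encode, from a full trace abort into two traces joined at the call, which could make the Lua C path cheaper than it used to be. Whether stitching actually happens in this bench can be observed with LuaJIT's bundled jit.dump module; the option string and output file name below are arbitrary choices:

```lua
-- Sketch: record trace events while the Lua C variant of the bench runs.
-- Option 't' prints a line for each started, ended or aborted trace.
local dump = require('jit.dump')
dump.on('t', 'trace-events.txt')
bench_mp_encode()   -- the function from the benchmark above
dump.off()
-- Then grep trace-events.txt for "stitch" in the trace start reasons.
```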
One of the ideas discussed recently: deprecate msgpackffi so we don't have to maintain 2 modules with the same API, especially since the supposedly fast version turns out not to be that fast.
NB: still reproduced on 2.11.0-entrypoint-169-gd1d3d93ae (right after the #5885 and #4630 fixes). We don't encode any cdata structures here, but I re-ran the benchmark just in case.