Are the thread APIs less efficient than they need to be?
The current mlua thread API seems to push the values onto the main Lua stack first and then use xmove to move them from the main stack to the thread state.
Would it not be more efficient to modify the IntoLua trait to accept another lua_State parameter and copy the values directly to the thread state, which would vastly reduce the number of stack operations performed by mlua?
> which would vastly reduce the number of stack operations performed by mlua?

It's a single xmove call, which translates to:
    from->top.p -= n;
    for (i = 0; i < n; i++) {
      setobjs2s(to, to->top.p, from->top.p + i);
      to->top.p++;
    }
where setobjs2s copies the bits from one TValue to another.
Unless you're passing a significant number of arguments to the threads, you are unlikely to notice any difference. It's a relatively lightweight operation and takes only a fraction of the time compared to running the entire thread to completion.
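For illustration, the xmove loop quoted above can be modeled in plain Rust. This is only a toy sketch: `Value`, `State`, and this `xmove` function are hypothetical stand-ins for Lua's TValue, lua_State, and the real C function, not mlua or Lua APIs.

```rust
/// Toy stand-in for a Lua value (a real TValue is a tagged union).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Nil,
    Integer(i64),
}

/// Toy stand-in for a lua_State: only the value stack is modeled.
#[derive(Default)]
struct State {
    stack: Vec<Value>,
}

/// Move the top `n` values from `from` to `to`, preserving their order.
/// This mirrors the quoted loop: lower the source top by `n`, then copy
/// each value onto the destination and bump the destination top.
fn xmove(from: &mut State, to: &mut State, n: usize) {
    assert!(n <= from.stack.len(), "not enough values on the source stack");
    let start = from.stack.len() - n;
    to.stack.extend(from.stack.drain(start..));
}

fn main() {
    let mut main_state = State::default();
    let mut thread_state = State::default();

    // Stage three values on the main stack.
    main_state.stack.push(Value::Nil);
    main_state.stack.push(Value::Integer(1));
    main_state.stack.push(Value::Integer(2));

    // Move the top two values to the thread stack.
    xmove(&mut main_state, &mut thread_state, 2);

    println!("main:   {:?}", main_state.stack);
    println!("thread: {:?}", thread_state.stack);
}
```

The move is one copy per value, so its cost grows linearly with the number of arguments and does not depend on what the thread does afterwards.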
> Would it not be more efficient to modify the IntoLua trait to accept another lua_State parameter and copy the values directly to the thread state

Generally, I agree that it would be more efficient. But it's a relatively large code change for a small use case, with a few edge cases that should be taken care of. It would be good to gain confidence in the performance boost before making any changes.
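As a rough way to reason about the possible gain, the two strategies can be compared by counting value writes in a toy model. This is a sketch under stated assumptions, not a benchmark of mlua: `Stack`, `staged`, and `direct` are hypothetical names, and real timings would need to be measured against mlua itself.

```rust
/// Toy stack that counts how many value writes it receives.
#[derive(Default)]
struct Stack {
    values: Vec<i64>,
    writes: usize,
}

impl Stack {
    fn push(&mut self, v: i64) {
        self.values.push(v);
        self.writes += 1;
    }
}

/// Approach described in the question: push onto the main stack,
/// then move the pushed values over to the thread stack.
fn staged(args: &[i64]) -> (Stack, Stack) {
    let mut main = Stack::default();
    let mut thread = Stack::default();
    for &a in args {
        main.push(a);
    }
    let start = main.values.len() - args.len();
    let moved: Vec<i64> = main.values.drain(start..).collect();
    for v in moved {
        thread.push(v);
    }
    (main, thread)
}

/// Proposed approach: push straight onto the thread stack.
fn direct(args: &[i64]) -> Stack {
    let mut thread = Stack::default();
    for &a in args {
        thread.push(a);
    }
    thread
}

fn main() {
    let args = [10, 20, 30];
    let (main_stack, thread_staged) = staged(&args);
    let thread_direct = direct(&args);
    println!("staged writes: {}", main_stack.writes + thread_staged.writes);
    println!("direct writes: {}", thread_direct.writes);
}
```

In this model the staged path costs one extra write per argument, which matches the point above that the difference only becomes noticeable when many arguments are passed.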
@khvzak I fixed it slightly with https://github.com/mlua-rs/mlua/pull/593. BTW, it passes all tests at least.