Implement core.get_node() in Lua so that it can be optimized by LuaJIT
Replaces core.get_node() with a Lua implementation, which uses two new private back-end methods, get_node_raw and get_node_name, to retrieve data from the script API. Dealing with basic types is much more efficient than constructing tables and strings from C++ through the Lua API. Node names are cached in Lua to reduce the passing of strings across the boundary.
See issue https://github.com/minetest/minetest/issues/12437 for more details
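A rough sketch of what the Lua-side wrapper could look like. The exact signature of the private back-end is an assumption here: get_node_raw is taken to accept coordinates and return a content ID, param1, param2, and a position-validity flag, while get_node_name maps a content ID to its name string.

```lua
local get_node_raw = core.get_node_raw
local get_node_name = core.get_node_name

-- Cache content id -> node name so each string crosses the
-- C++/Lua boundary at most once per content id.
local name_cache = {}

function core.get_node(pos)
	-- Assumed signature: returns content, param1, param2, pos_ok.
	local content, param1, param2, pos_ok =
		get_node_raw(pos.x, pos.y, pos.z)
	if not pos_ok then
		return {name = "ignore", param1 = 0, param2 = 0}
	end
	local name = name_cache[content]
	if name == nil then
		name = get_node_name(content)
		name_cache[content] = name
	end
	-- Building this table in plain Lua is what lets LuaJIT
	-- optimize callers instead of crossing into C++.
	return {name = name, param1 = param1, param2 = param2}
end
```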
On my particular machine with MCL2, this reduces the time needed to run finishGen() from 78 ms to 68 ms on average. This can be measured using this patch:
https://github.com/minetest/minetest/commit/56db6130001961f8bc57fc4a3f476e4842266c7f.patch
Is get_node_name different from get_name_from_content_id?
Maybe you could change get_name_from_content_id to cache its results, then use it in get_node. This would have wider reaching performance benefits.
@TurkeyMcMac Yes it appears to do the same thing. It would benefit from using the same cache. I will work that in.
There are 321 API methods split across 17 files. 68 calls to lua_createtable. 268 calls to lua_pushstring. Could take a while to discover all the optimization opportunities here.
What you probably want to do here is cache the content id -> node name mapping ahead of time, make get_name_from_content_id use that instead. Then apply this optimization to get_node_or_nil and redefine get_node as return get_node_or_nil(pos) or {name="ignore",param1=0,param2=0}.
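Sketched out, the suggested layering might look like this. The cache table and the fallback wiring are illustrative; note the caveat below about mods still registering nodes while they load.

```lua
-- Filled once, after all mods have finished loading; until then the
-- old C++ method is used as a fallback.
local id_to_name = {}
local old_get_name_from_content_id = core.get_name_from_content_id

function core.get_name_from_content_id(id)
	return id_to_name[id] or old_get_name_from_content_id(id)
end

-- get_node_or_nil is assumed to return nil for unloaded positions,
-- which get_node papers over with the "ignore" node.
function core.get_node(pos)
	return core.get_node_or_nil(pos)
		or {name = "ignore", param1 = 0, param2 = 0}
end
```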
One thing to consider is that get_name_from_content_id specifically can be called when mods are loading and nodes may yet be registered or unregistered. It should only cache nodes after all of them have finished loading.
Potential implementation:

local old_get_content_id = core.get_content_id
local old_get_name_from_content_id = core.get_name_from_content_id

-- Fall back to the old C++ methods for anything not yet cached. The
-- fallback deliberately does not store its result, because content IDs
-- may still change while mods are loading.
local name2content = setmetatable({}, {
	__index = function(self, name)
		return old_get_content_id(name)
	end,
})
local content2name = setmetatable({}, {
	__index = function(self, id)
		return old_get_name_from_content_id(id)
	end,
})

core.after(0, function()
	for name in pairs(core.registered_nodes) do
		local id = old_get_content_id(name)
		name2content[name] = id
		content2name[id] = name
	end
	-- Cache aliases whose target is a registered node. (Checking
	-- registered_nodes[name] for the alias itself would never match,
	-- since aliases are not listed in registered_nodes.)
	for name, target in pairs(core.registered_aliases) do
		if core.registered_nodes[target] then
			name2content[name] = old_get_content_id(name)
		end
	end
end)

function core.get_content_id(name)
	return name2content[name]
end

function core.get_name_from_content_id(id)
	return content2name[id]
end
@TurkeyMcMac get_content_id and get_name_from_content_id are available in async jobs. Does core.after work in async jobs? From reading the code, it doesn't seem to. But maybe I'm missing something.
@paradust7 Thanks for pointing that out, I added caching for async threads in my PR.
I ended up with a different approach, adding content_id to the itemdef for nodes (for "name2content"), and a new table core.registered_content_ids for the reverse lookup. These are then reconstructed in the async threads using the content_id inside the itemdef.
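A rough sketch of the reconstruction step inside an async thread, following the naming in the comment above (a content_id field on each node definition, with core.registered_content_ids as the reverse table); the surrounding details are assumptions:

```lua
-- Rebuild both lookup directions from the content_id carried in the
-- serialized node definitions, instead of calling back into C++.
local name2content = {}
local content2name = {}  -- i.e. core.registered_content_ids

for name, def in pairs(core.registered_nodes) do
	local id = def.content_id
	if id then
		name2content[name] = id
		content2name[id] = name
	end
end
```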
New commit moves get_node, get_node_or_nil, get_name_from_content_id, and get_content_id into Lua, deleting their corresponding C++ methods. Adds C++ method get_node_raw.
Ugh, I think the extra lookups via core are killing the performance; the new change is almost as slow as the original.
Besides putting stuff from core into local variables, you could restore the direct mappings between names and IDs. get_content_id would become a single hash lookup, while get_name_from_content_id would become a single array lookup.
I recommend an approach similar to mine in #12444. This would entail keeping the C++ implementations of get_content_id and get_name_from_content_id, but they would not be called in the fast path.
It's not an issue if get_content_id and get_name_from_content_id use the slow path in the async environment btw, the fast path is mostly useful to speed up get_node which doesn't exist there.
Also about get_node performance maybe some microbenchmarking is in order so we have hard numbers.
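A minimal microbenchmark along those lines could look like this (the iteration count and the measured position are placeholders; it would have to run inside a loaded area of the map):

```lua
-- Time a function over many iterations and log the per-call cost.
local function bench(label, iters, fn)
	local pos = {x = 0, y = 0, z = 0}
	local t0 = core.get_us_time()
	for _ = 1, iters do
		fn(pos)
	end
	local t1 = core.get_us_time()
	core.log("action", string.format("%s: %.3f us/call",
		label, (t1 - t0) / iters))
end

bench("get_node", 1e6, core.get_node)
```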
Should VoxelManip:get_node_at be moved to Lua as well? Caching will become significant on async threads.
For profiling the Lua code you could use my mod jitprofiler.
Should VoxelManip:get_node_at be moved to Lua as well?
There's no reason for mods to use that in bulk so not worth optimizing IMO.
Mesecons uses it a lot, although it doesn't use async jobs.
I think something may have gone wrong with my machine; the power-management rules possibly changed after an upgrade. The measurement error is now too large to be useful.
In another PR, I'll add some micro-benchmarks (as part of --run-benchmarks) for the hot api methods. That'll make it easier to measure any improvement and prevent regressions.
By the way, I have tried implementing get_node_at and set_node_at in Lua. In combination with sped-up versions of get_content_id and get_name_from_content_id, this change shaves off about a fifth of the time taken to turn on and off a sizeable block of Mesecons wires.
Well, regarding that: if it has vmanips, why doesn't it retrieve the data arrays and operate on them directly? That would be even faster (though it would use more RAM).
I changed Mesecons a little while back to use the approach it now does. Previously it did what you suggested. I found the new approach to be as fast or faster in almost all cases. Mesecons doesn't benefit so much from get_data etc. because 1) it usually operates on a minority of nodes in a mapblock and 2) it has to reconstruct the node table of each node anyway. get_node_at creates the node table already. Also, it now caches each node table it gets until the end of the transaction.
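The per-transaction caching described above might be sketched like this (illustrative only, not Mesecons' actual code): node tables fetched through the vmanip are kept until the transaction ends, so each position is decoded at most once.

```lua
-- Node tables keyed by hashed position, valid for one transaction.
local node_cache = {}

local function cached_get_node_at(vm, pos)
	local key = core.hash_node_position(pos)
	local node = node_cache[key]
	if not node then
		node = vm:get_node_at(pos)
		node_cache[key] = node
	end
	return node
end

local function end_transaction()
	node_cache = {}  -- drop the cache when the transaction finishes
end
```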
After some more work is done on this PR, it might be worth it to compare the two different Mesecons approaches again. It could be that the old one becomes faster.
Are we still doing this? :)
Yes, we should.