minetest icon indicating copy to clipboard operation
minetest copied to clipboard

Implement core.get_node() in Lua so that it can be optimized by LuaJIT

Open paradust7 opened this issue 3 years ago • 22 comments

Replaces core.get_node() with a Lua implementation, which then uses two private back-end methods, get_node_raw and get_node_name, to retrieve data from the script api. Dealing with basic types is much more efficient than constructing tables and strings from C++ using the lua api. Node names are cached in Lua to reduce the passing of strings.

See issue https://github.com/minetest/minetest/issues/12437 for more details

On my particular machine with MCL2, this reduces the time needed to run finishGen() from 78 ms to 68 ms on average. This can be measured using this patch:

https://github.com/minetest/minetest/commit/56db6130001961f8bc57fc4a3f476e4842266c7f.patch

paradust7 avatar Jun 13 '22 08:06 paradust7

Is get_node_name different from get_name_from_content_id ?

Maybe you could change get_name_from_content_id to cache its results, then use it in get_node. This would have wider reaching performance benefits.

TurkeyMcMac avatar Jun 13 '22 12:06 TurkeyMcMac

@TurkeyMcMac Yes it appears to do the same thing. It would benefit from using the same cache. I will work that in.

There are 321 API methods split across 17 files. 68 calls to lua_createtable. 268 calls to lua_pushstring. Could take a while to discover all the optimization opportunities here.

paradust7 avatar Jun 14 '22 02:06 paradust7

What you probably want to do here is cache the content id -> node name mapping ahead of time, make get_name_from_content_id use that instead. Then apply this optimization to get_node_or_nil and redefine get_node as return get_node_or_nil(pos) or {name="ignore",param1=0,param2=0}.

sfan5 avatar Jun 14 '22 08:06 sfan5

One thing to consider is that get_name_from_content_id specifically can be called when mods are loading and nodes may yet be registered or unregistered. It should only cache nodes after all of them have finished loading.

TurkeyMcMac avatar Jun 14 '22 11:06 TurkeyMcMac

Potential implementation:

local old_get_content_id = core.get_content_id
local old_get_name_from_content_id = core.get_name_from_content_id

local name2content = setmetatable({}, {
	__index = function(self, name)
		return old_get_content_id(name)
	end,
})

local content2name = setmetatable({}, {
	__index = function(self, id)
		return old_get_name_from_content_id(id)
	end,
})

core.after(0, function()
	for name in pairs(core.registered_nodes) do
		local id = old_get_content_id(name)
		name2content[name] = id
		content2name[id] = name
	end
	for name in pairs(core.registered_aliases) do
		if core.registered_nodes[name] then
			name2content[name] = old_get_content_id(name)
		end
	end
end)

function core.get_content_id(name)
	return name2content[name]
end

function core.get_name_from_content_id(id)
	return content2name[id]
end

TurkeyMcMac avatar Jun 14 '22 13:06 TurkeyMcMac

@TurkeyMcMac get_content_id and get_name_from_content_id are available in async jobs. Does core.after work in async jobs? From reading the code, it doesn't seem to. But maybe I'm missing something.

paradust7 avatar Jun 14 '22 23:06 paradust7

@paradust7 Thanks for pointing that out, I added caching for async threads in my PR.

TurkeyMcMac avatar Jun 15 '22 01:06 TurkeyMcMac

I ended up with a different approach, adding content_id to the itemdef for nodes (for "name2content"), and a new table core.registered_content_ids for the reverse lookup. These are then reconstructed in the async threads using the content_id inside the itemdef.

paradust7 avatar Jun 15 '22 02:06 paradust7

New commit moves get_node, get_node_or_nil, get_name_from_content_id, and get_content_id into Lua, deleting their corresponding C++ methods. Adds C++ method get_node_raw.

paradust7 avatar Jun 15 '22 02:06 paradust7

ugh, i think the extra look ups via core are killing the performance. the new change is almost as slow as the original

paradust7 avatar Jun 15 '22 02:06 paradust7

Besides putting stuff from core into local variables, you could restore the direct mappings between names and IDs. get_content_id would become a single hash lookup, while get_name_from_content_id would become a single array lookup.

TurkeyMcMac avatar Jun 15 '22 03:06 TurkeyMcMac

I recommend an approach similar to mine in #12444. This would entail keeping the C++ implementations of get_content_id and get_name_from_content_id, but they would not be called in the fast path.

TurkeyMcMac avatar Jun 15 '22 03:06 TurkeyMcMac

It's not an issue if get_content_id and get_name_from_content_id use the slow path in the async environment btw, the fast path is mostly useful to speed up get_node which doesn't exist there.

Also about get_node performance maybe some microbenchmarking is in order so we have hard numbers.

sfan5 avatar Jun 15 '22 06:06 sfan5

Should VoxelManip:get_node_at be moved to Lua as well, caching will become significant on async threads.

TurkeyMcMac avatar Jun 15 '22 12:06 TurkeyMcMac

For profiling the Lua code you could use my mod jitprofiler.

TurkeyMcMac avatar Jun 15 '22 13:06 TurkeyMcMac

Should VoxelManip:get_node_at be moved to Lua as well,

There's no reason for mods to use that in bulk so not worth optimizing IMO.

sfan5 avatar Jun 15 '22 13:06 sfan5

Should VoxelManip:get_node_at be moved to Lua as well,

There's no reason for mods to use that in bulk so not worth optimizing IMO.

Mesecons uses it a lot, although it doesn't use async jobs.

TurkeyMcMac avatar Jun 15 '22 13:06 TurkeyMcMac

I think maybe something went wrong with my machine, power management rules changed possibly after upgrade. The measurement error is now too large to be useful.

In another PR, I'll add some micro-benchmarks (as part of --run-benchmarks) for the hot api methods. That'll make it easier to measure any improvement and prevent regressions.

paradust7 avatar Jun 16 '22 03:06 paradust7

By the way, I have tried implementing get_node_at and set_node_at in Lua. In combination with sped-up versions of get_content_id and get_name_from_content_id, this change shaves off about a fifth of the time taken to turn on and off a sizeable block of Mesecons wires.

TurkeyMcMac avatar Jun 17 '22 20:06 TurkeyMcMac

Well regarding that, if it has vmanips why doesn't it retrieve the data array and operate directly on them? That will be even faster (though use more RAM).

sfan5 avatar Jun 17 '22 21:06 sfan5

I changed Mesecons a little while back to use the approach it now does. Previously it did what you suggested. I found the new approach to be as fast or faster in almost all cases. Mesecons doesn't benefit so much from get_data etc. because 1) it usually operates on a minority of nodes in a mapblock and 2) it has to reconstruct the node table of each node anyway. get_node_at creates the node table already. Also, it now caches each node table it gets until the end of the transaction.

After some more work is done on this PR, it might be worth it to compare the two different Mesecons approaches again. It could be that the old one becomes faster.

TurkeyMcMac avatar Jun 17 '22 21:06 TurkeyMcMac

Are we still doing this? :)

lhofhansl avatar Jul 13 '22 10:07 lhofhansl

Yes, we should.

Desour avatar Jan 19 '24 15:01 Desour