premake-core
Performance issues on large projects
What seems to be the problem? Premake currently struggles on large solutions. Our current solution has 230 projects and 5 platforms with 4 configurations each. Project file generation takes 2-3 minutes. That isn't extraordinarily bad, since you generally don't regenerate as you iterate, but it still takes up 30-40% of our incremental build times on our build machines.
What did you expect to happen? I expected it to perform better.
What have you tried so far?
An early win: findProject is very slow. We optimised it by adding an already-lowercased form and a hashed form of the name to the prj at creation time, then comparing against those. This way a string comparison only occurs if the hash comparison has already succeeded, and the name doesn't have to be lowercased on every call. In the profile information at the bottom this shows up as components/findProjectOptimization.lua. This reduced the findProject time from 19.3 seconds to 6.8 seconds.
How can we reproduce this? Our project is under NDA so we cannot share the specifics. The easiest reproduction would be to download a large number of open source libraries and add a number of pseudo platforms.
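The findProject change described above could look roughly like the sketch below. It is not the actual override from our codebase or Premake's own API: the `_lowerName` and `_nameHash` fields, the `hashName` helper, and the `newProject`/`findProject` signatures are all hypothetical names chosen for illustration.

```lua
-- Hypothetical sketch; field and function names are illustrative, not Premake's API.

-- Simple polynomial string hash, kept below 2^31 so it stays exact
-- whether Lua numbers are doubles or 64-bit integers.
local function hashName(s)
    local h = 0
    for i = 1, #s do
        h = (h * 31 + s:byte(i)) % 2147483647
    end
    return h
end

-- Called once when a project is created: cache the lowercased name and its hash.
local function newProject(wks, name)
    local lowered = name:lower()
    local prj = { name = name, _lowerName = lowered, _nameHash = hashName(lowered) }
    table.insert(wks.projects, prj)
    return prj
end

-- Lowercase and hash the query once, then compare hashes before strings.
local function findProject(wks, name)
    local lowered = name:lower()
    local hash = hashName(lowered)
    for _, prj in ipairs(wks.projects) do
        if prj._nameHash == hash and prj._lowerName == lowered then
            return prj
        end
    end
    return nil
end

local wks = { projects = {} }
local core = newProject(wks, "EngineCore")
newProject(wks, "Tools")
```

With this layout, `findProject(wks, "enginecore")` returns the same table as `core`, and the per-project work in the loop is a number comparison in the common non-matching case.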
What version of Premake are you using? Version at commit ad89dfcb0d334e0b898214aa452f18eae961ab10
Anything else we should know?
Here is a portion of the pepperfish_profiler results. I've removed everything which takes less than 1 second. As you can see a significant amount of time is spent in config set. All our custom overrides are in OurStuff/components/** https://pastebin.com/jDsGsdw2
Two big culprits are _fetchMerged and <function: 000001F215E59230>(?) in configset, and compile which alone took 60 seconds in self
(ETA: I believe 000001F215E59230 is the local function remove in _fetchMerged)
Further findings. The main reason fetchMerged is so high impact is __index in context.lua, which is in turn highly used by bakeFiles.
----------------------- L:bakeFiles@src/base/oven.lua:298 ----------------------
Sample count: 17472
Time spend total: 75.485s
Time spent in children: 75.485s
Time spent in self: 0.000s
Time spent per sample: 0.00432s/sample
Time spent in self per sample: 0.00000s/sample
Child L:for iterator@src/base/project.lua:21 sampled 1 times. Took 0.00s
Child L:__index@src/base/context.lua:58 sampled 1056 times. Took 12.32s
Child L:foreachi@src/base/table.lua:73 sampled 16320 times. Took 62.45s
Child C:sort@=[C]:? sampled 95 times. Took 0.71s
If I'm reading this correctly, does it create a fcfg for every file, for every configuration? If so, this is likely a root of our problem. (It accounts for at least 75s, with another 60s being compile, which would leave us at a very reasonable 20s.)
-- Start by building a comprehensive list of all the files contained by the
-- project. Some files may only be included in a subset of configurations so
-- I need to look at them all.
for cfg in p.project.eachconfig(prj) do
    local function addFile(fname, i)
        -- If this is the first time I've seen this file, start a new
        -- file configuration for it. Track both by key for quick lookups
        -- and indexed for ordered iteration.
        local fcfg = files[fname]
        if not fcfg then
            fcfg = p.fileconfig.new(fname, prj)
            fcfg.order = i
            files[fname] = fcfg
            table.insert(files, fcfg)
        end
        p.fileconfig.addconfig(fcfg, cfg)
    end
    table.foreachi(cfg.files, addFile)
    -- If this project uses NuGet, we need to add the generated
    -- packages.config file to the project. Is there a better place to
    -- do this?
    if #prj.nuget > 0 and (_ACTION < "vs2017" or p.project.iscpp(prj)) then
        addFile("packages.config")
    end
end
As we have around 10k+ files, that would be around 200k fcfgs being created. Is there anything we can do to avoid this, even if it's just a local override?
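To make the 200k estimate concrete, here is a minimal counting sketch of the loop shape above. It does not call Premake at all; `countFileConfigWork` is a hypothetical stand-in, and it assumes every file appears in every configuration, which is a simplification.

```lua
-- Hypothetical stand-in for the bakeFiles loop: counts calls instead of building objects.
-- Assumes every file is present in every configuration.
local function countFileConfigWork(numFiles, numPlatforms, numBuildCfgs)
    local seen, newCalls, addConfigCalls = {}, 0, 0
    for cfg = 1, numPlatforms * numBuildCfgs do
        for f = 1, numFiles do
            if not seen[f] then
                seen[f] = true
                newCalls = newCalls + 1           -- stands in for p.fileconfig.new
            end
            addConfigCalls = addConfigCalls + 1   -- stands in for p.fileconfig.addconfig
        end
    end
    return newCalls, addConfigCalls
end

-- 10k files, 5 platforms, 4 configurations each.
local newCalls, addConfigCalls = countFileConfigWork(10000, 5, 4)
print(newCalls, addConfigCalls)
```

Under these assumptions the file configuration itself is created 10,000 times, while the per-configuration step runs 200,000 times, which is where the figure above comes from.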
An early win was findProject is very slow. We optimised this by adding an already lowered and a hashed form of the name to the prj at creation time, then comparing against those. This way string comparison only occurs if a hash comparison already succeeded, and it doesn't have to bother lowercasing the string constantly
Nice find. It might be worth spinning this specific optimization off to its own issue or PR so it can be acted on separately from this discussion.
If I'm reading correctly, does this create a fcfg for every file, for every configuration? If right this is likely a root of our problem
You are right, and nonsense like this is exactly what I'm trying to address with premake-next. I personally don't see a way to fix it without an overhaul of all of the configuration handling internals. Open to ideas and incremental improvements though.
@starkos I'll start experimenting over the next couple of months. As long as things are just expecting fcfgs being passed as function args, and not specifically looking at prj._.files all over, I have a few ideas. I'll get back to you if any of them work!
Reviving this old thread 🙄
In my case the bottleneck seems to be that we have a lot of filters that apply to certain configurations, mostly at project level,
but each fileconfig calls context.copyFilters(fcfg, prj), which means that for each file we try to match mostly useless filters.
Is there a way to add only the filters that could affect files here, instead of adding them all?
Off the top of my head I could think of:
language (?)
`filter { "files:" ... }`
Are there any other filters that could affect files?
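As a rough illustration of the idea, the sketch below keeps only blocks whose terms mention `files:`. The block layout here (a plain `terms` list of strings) and the `selectFileBlocks` helper are hypothetical; Premake's real blocks store compiled criteria, and whether dropping the other blocks is safe is exactly the open question above.

```lua
-- Hypothetical block layout: each block carries a plain list of its filter terms.
-- This is NOT how Premake stores criteria; it only illustrates the pre-filtering idea.
local function affectsFiles(block)
    for _, term in ipairs(block.terms) do
        -- Plain (non-pattern) search so "not files:..." style terms are caught too.
        if term:lower():find("files:", 1, true) then
            return true
        end
    end
    return false
end

-- Returns only the blocks that have at least one "files:" term.
local function selectFileBlocks(blocks)
    local result = {}
    for _, block in ipairs(blocks) do
        if affectsFiles(block) then
            table.insert(result, block)
        end
    end
    return result
end

local blocks = {
    { terms = { "configurations:Debug" } },
    { terms = { "files:**.cpp" } },
    { terms = { "platforms:x64", "files:src/**.c" } },
}
local selected = selectFileBlocks(blocks)
```

In this example only the second and third blocks survive, so a per-file match would test two blocks instead of three.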
Upon further inspection: compared to the previous version I was using 🙄, where configset.compile was called 3 times per file, it now seems to be called 6 times per file:
for the root, for the solution, and for the project (?), twice over.
In oven.bakeFiles:
local function addFile(fname, i)
    -- If this is the first time I've seen this file, start a new
    -- file configuration for it. Track both by key for quick lookups
    -- and indexed for ordered iteration.
    local fcfg = files[fname]
    if not fcfg then
        fcfg = p.fileconfig.new(fname, prj) -- <<<< this calls context compile
        fcfg.order = i
        files[fname] = fcfg
        table.insert(files, fcfg)
    end
    p.fileconfig.addconfig(fcfg, cfg) -- <<<< this ALSO calls context compile
end
table.foreachi(cfg.files, addFile)
This seems to have been introduced by @tvandijck in commit 32885d1673b50194ad7137261634775d772a4534.
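One way to check call counts like these without a full profiler is to wrap the function with a counter. The sketch below uses hypothetical stand-ins for `configset` and `fileconfig` so it runs on its own; pointing `countCalls` at the real module table is an assumption about how the modules are reachable in your setup.

```lua
-- Wraps tbl[name] so each call increments a counter.
-- Returns a function that reads the current count.
local function countCalls(tbl, name)
    local original = tbl[name]
    local count = 0
    tbl[name] = function(...)
        count = count + 1
        return original(...)
    end
    return function() return count end
end

-- Hypothetical stand-ins for the real modules, for illustration only.
local configset = {}
function configset.compile(cset, filter)
    return { blocks = {} }
end

local fileconfig = {}
function fileconfig.new(fname)
    configset.compile({}, { files = fname })
    return { name = fname, configs = {} }
end
function fileconfig.addconfig(fcfg, cfg)
    configset.compile({}, { files = fcfg.name })
    fcfg.configs[cfg] = true
end

local getCount = countCalls(configset, "compile")
local fcfg = fileconfig.new("main.cpp")
fileconfig.addconfig(fcfg, "Debug")
fileconfig.addconfig(fcfg, "Release")
print(getCount())
```

Because the stand-ins look up `configset.compile` at call time, the wrapper sees every call: one from `new` and one from each `addconfig`.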
Further investigation into configset.compile:
if abspath and block._basedir and block._basedir ~= basedir then
    basedir = block._basedir
    filter.files = path.getrelative(basedir, abspath)
end
if criteria.matches(block._criteria, filter) then
    table.insert(result.blocks, block)
end
40% of the time is spent in this code:
about 20% in path.getrelative,
and the remaining 20% in the criteria.matches test.
It seems that on Windows block._basedir is lowercased in addFilter for some blocks but not for others?
Adding a print of block._basedir <==> basedir inside the branch above that makes the files relative results in:
C:/Work/**** <==> c:/work/*****
c:/work/**** <==> C:/Work/*****
Actual filenames are omitted for brevity :) The question is: do we still need to lowercase the basedir, @starkos? Or are there other places where we are missing a tolower? Also, would tolower cause issues on case-sensitive filesystems?
After even further investigation, it seems this problem comes from the export plugin:
block._basedir = block.basedir
Lowercasing that value reduces the cost of path.getrelative.
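The effect can be sketched in isolation. The code below is a hypothetical model, not Premake code: `normalizeBasedir` and `countBasedirSwitches` are made-up helpers, the paths are illustrative, and lowercasing only when a `caseInsensitive` flag is set is my assumption about how to avoid trouble on case-sensitive filesystems.

```lua
-- Hypothetical helper: normalize a base directory once, when the block is stored.
-- Lowercases only when the filesystem is treated as case-insensitive.
local function normalizeBasedir(dir, caseInsensitive)
    dir = dir:gsub("\\", "/")
    if caseInsensitive then
        dir = dir:lower()
    end
    return dir
end

-- Models the basedir check in the compile loop and counts how often
-- the branch that recomputes the relative path would be taken.
local function countBasedirSwitches(blocks, startDir)
    local basedir, switches = startDir, 0
    for _, block in ipairs(blocks) do
        if block._basedir and block._basedir ~= basedir then
            basedir = block._basedir
            switches = switches + 1 -- the real code calls path.getrelative here
        end
    end
    return switches
end

-- Illustrative paths that differ only by case, as in the printout above.
local raw = { "C:/Work/proj", "c:/work/proj", "C:/Work/proj", "c:/work/proj" }
local mixed, normalized = {}, {}
for i, dir in ipairs(raw) do
    mixed[i] = { _basedir = dir }
    normalized[i] = { _basedir = normalizeBasedir(dir, true) }
end

local mixedSwitches = countBasedirSwitches(mixed, "c:/work/proj")
local normalizedSwitches = countBasedirSwitches(normalized, "c:/work/proj")
print(mixedSwitches, normalizedSwitches)
```

With mixed case, every block looks like a new base directory and the relative path is recomputed each time; once all blocks are normalized the same way, the branch is never taken for this input.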