build: iterative algo for transitive dep resolve
In large repositories, transitive dependency resolution using the current recursive algorithm can result in enough duplicate calls to cause the full system memory space to be used up.
This commit converts get_transitive_link_deps to an iterative algorithm that only performs work on each target once, resulting in multiple orders of magnitude of improvements to dep resolution time and memory usage in large projects.
This algorithm also has the side effect of returning a deduplicated list of deps to avoid doing redundant work in functions that call get_transitive_link_deps. It has been verified to produce the same output as the original version of the function when the latter's output is deduplicated and both are sorted.
If possible, I would be very appreciative if this could be included in the 1.5.0 release, but I know that the RC has already been tagged and I understand if this isn't possible. :sweat_smile:
There are a bunch of recursive functions there, and I also identified them as being extremely slow on large dependecy trees. Python is remarkably bad at recursion, I profiled that a while ago. However, we need to be extremely careful here, because that code is many order of magnitude more complicated than you could think. I personally banged my head on the wall for weeks in those recursive functions and they still have issues.
One thing I notice in this PR is that get_all_link_deps() is not called anymore but that method is overriden in different classes. Depending on the type of objects it recurse differently.
There are a bunch of recursive functions there, and I also identified them as being extremely slow on large dependecy trees. Python is remarkably bad at recursion, I profiled that a while ago. However, we need to be extremely careful here, because that code is many order of magnitude more complicated than you could think. I personally banged my head on the wall for weeks in those recursive functions and they still have issues.
One thing I notice in this PR is that
get_all_link_deps()is not called anymore but that method is overriden in different classes. Depending on the type of objects it recurse differently.
Yep, and that reflects in the unit tests... I'll keep hacking at this and see if I can make it work.
I also feel obligated to point out that in at least some cases get_transitive_link_deps is subtly broken, see a couple of the commits in https://github.com/mesonbuild/meson/pull/12539/ to see what I mean
I also feel obligated to point out that in at least some cases
get_transitive_link_depsis subtly broken, see a couple of the commits in #12539 to see what I mean
Right, it miss recursing over self.link_whole_targets. My understanding is the goal of that function is to return all shared libraries in the entire dep tree, to set PATH on Windows and LD_LIBRARY_PATH on linux. That means it needs to skip static libraries but still take shared libraries that are linked by static libraries.
It also missing adding self, so if you have a dependency tree like:
A -> B -> C
-> D
Then the transitive tree will be {C, D}, even though B should be in the tree
It also missing adding
self
SharedLibrary.get_all_link_deps() does add self. That code is scary because the resusing function is different for each target type.
I think I've managed to create a version that should cover all subclasses of Target - I added a get_toplevel_link_deps function that returns the same value as get_all_link_deps per class, but doesn't recurse. Testing it now.
What about this?
def get_all_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]:
result: T.List[Target] = []
stack: T.Deque[Target] = deque()
stack.append(self)
while stack:
t = stack.pop()
if t.links_dynamically():
result.append(t)
stack.extendleft(self.link_targets)
stack.extendleft(self.link_whole_targets)
return result
What about this?
def get_all_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]: result: T.List[Target] = [] stack: T.Deque[Target] = deque() stack.append(self) while stack: t = stack.pop() if t.links_dynamically(): result.append(t) stack.extendleft(self.link_targets) stack.extendleft(self.link_whole_targets) return result
This wouldn't work:
- Adds duplicate entries to the stack, which nullifies the vast majority of the performance benefit
- You probably meant
stack.extendleft(t.link_targets)andstack.extendleft(t.link_whole_targets), which adds even more duplicate entries to the stack
The goal of this commit is to only process each target once, which necessitated an iterative version of the original function. Adding all of link_targets and link_whole_targets to the stack for each target in the tree defeats the purpose.
How about this, plus a change to link_whole_targets in the get_toplevel_link_deps impls:
@lru_cache(maxsize=None)
def get_transitive_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]:
result: OrderedSet[Target] = OrderedSet()
stack: T.Deque[Target] = deque()
stack.appendleft(self)
while stack:
for i in stack.pop().get_toplevel_link_deps():
if i not in result and i.links_dynamically():
result.append(i)
stack.appendleft(i)
return list(result)
How about this, plus a change to link_whole_targets in the get_toplevel_link_deps impls:
@lru_cache(maxsize=None) def get_transitive_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]: result: OrderedSet[Target] = OrderedSet() stack: T.Deque[Target] = deque() stack.appendleft(self) while stack: for i in stack.pop().get_toplevel_link_deps(): if i not in result and i.links_dynamically(): result.append(i) stack.appendleft(i) return list(result)
This passes tests, so going to push it.
Never mind, that fails because links_dynamically is only defined for custom targets. I'm just going to revert it back to link_targets to match master, because I'd rather avoid any side effects from this.
What about this?
def get_all_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]: result: T.List[Target] = [] stack: T.Deque[Target] = deque() stack.append(self) while stack: t = stack.pop() if t.links_dynamically(): result.append(t) stack.extendleft(self.link_targets) stack.extendleft(self.link_whole_targets) return resultThis wouldn't work:
* Adds duplicate entries to the stack, which nullifies the vast majority of the performance benefit * You probably meant `stack.extendleft(t.link_targets)` and `stack.extendleft(t.link_whole_targets)`, which adds even more duplicate entries to the stackThe goal of this commit is to only process each target once, which necessitated an iterative version of the original function. Adding all of link_targets and link_whole_targets to the stack for each target in the tree defeats the purpose.
Good points, fixed:
def get_all_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]:
result: T.List[Target] = []
visited: T.Set[Target] = set()
stack: T.Deque[Target] = deque()
stack.extendleft(self.link_targets)
stack.extendleft(self.link_whole_targets)
while stack:
t = stack.pop()
if t in visited:
continue
visited.add(t)
if isinstance(t, SharedLibrary):
result.append(t)
if isinstance(t, BuildTarget):
stack.extendleft(t.link_targets)
stack.extendleft(t.link_whole_targets)
return result
As stated, given that this PR is meant to be purely performance-oriented, I'm going to keep the function aligned to what master does. I'd be more than happy to open a separate PR after this one is merged that changes the function's behavior to match what you've posted, though.
IMHO it is in any case too dangerous for backport in stable release, or even 1.5. Better do both in a single PR for 1.6.
While at it, there is also get_link_deps_mapping() and get_link_dep_subdirs() that are very similar. I thought I had a PR already somewhere that merge them in a single function, but can't find it.
What I've been playing with so far:
diff --git a/mesonbuild/backend/backends.py b/mesonbuild/backend/backends.py
index c1a72f13d..9e225b070 100644
--- a/mesonbuild/backend/backends.py
+++ b/mesonbuild/backend/backends.py
@@ -1179,14 +1179,14 @@ class Backend:
result: T.Set[str] = set()
prospectives: T.Set[build.BuildTargetTypes] = set()
if isinstance(target, build.BuildTarget):
- prospectives.update(target.get_transitive_link_deps())
+ prospectives.update(target.get_all_link_deps())
# External deps
result.update(self.extract_dll_paths(target))
for bdep in extra_bdeps:
prospectives.add(bdep)
if isinstance(bdep, build.BuildTarget):
- prospectives.update(bdep.get_transitive_link_deps())
+ prospectives.update(bdep.get_all_link_deps())
# Internal deps
for ld in prospectives:
dirseg = os.path.join(self.environment.get_build_dir(), self.get_target_dir(ld))
diff --git a/mesonbuild/build.py b/mesonbuild/build.py
index 071dbdfd4..67cfba5de 100644
--- a/mesonbuild/build.py
+++ b/mesonbuild/build.py
@@ -2,7 +2,7 @@
# Copyright 2012-2017 The Meson development team
from __future__ import annotations
-from collections import defaultdict, OrderedDict
+from collections import defaultdict, OrderedDict, deque
from dataclasses import dataclass, field, InitVar
from functools import lru_cache
import abc
@@ -1067,14 +1067,29 @@ class BuildTarget(Target):
return ExtractedObjects(self, self.sources, self.generated, self.objects,
recursive, pch=True)
+ @lru_cache(maxsize=None)
def get_all_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]:
- return self.get_transitive_link_deps()
+ """ Get all shared libraries dependencies
- @lru_cache(maxsize=None)
- def get_transitive_link_deps(self) -> ImmutableListProtocol[BuildTargetTypes]:
+ This returns all shared libraries in the entire dependency tree. Those
+ are libraries needed at runtime which is different from the set needed
+ at link time, see get_dependencies() for that.
+ """
result: T.List[Target] = []
- for i in self.link_targets:
- result += i.get_all_link_deps()
+ visited: T.Set[Target] = set()
+ stack: T.Deque[Target] = deque()
+ stack.extendleft(self.link_targets)
+ stack.extendleft(self.link_whole_targets)
+ while stack:
+ t = stack.pop()
+ if t in visited:
+ continue
+ visited.add(t)
+ if isinstance(t, SharedLibrary):
+ result.append(t)
+ if isinstance(t, BuildTarget):
+ stack.extendleft(t.link_targets)
+ stack.extendleft(t.link_whole_targets)
return result
def get_link_deps_mapping(self, prefix: str) -> T.Mapping[str, str]:
@@ -2388,9 +2403,6 @@ class SharedLibrary(BuildTarget):
"""
return self.debug_filename
- def get_all_link_deps(self):
- return [self] + self.get_transitive_link_deps()
-
def get_aliases(self) -> T.List[T.Tuple[str, str, str]]:
"""
If the versioned library name is libfoo.so.0.100.0, aliases are:
@@ -2708,9 +2720,6 @@ class CustomTarget(Target, CustomTargetBase, CommandBase):
def get_link_dep_subdirs(self) -> T.AbstractSet[str]:
return OrderedSet()
- def get_all_link_deps(self):
- return []
-
def is_internal(self) -> bool:
'''
Returns True if this is a not installed static library.
@@ -2957,9 +2966,6 @@ class CustomTargetIndex(CustomTargetBase, HoldableObject):
def get_id(self) -> str:
return self.target.get_id()
- def get_all_link_deps(self):
- return self.target.get_all_link_deps()
-
def get_link_deps_mapping(self, prefix: str) -> T.Mapping[str, str]:
return self.target.get_link_deps_mapping(prefix)
I agree that it shouldn't be backported into stable, but I don't see any reason to avoid backporting this PR as is into 1.5. It's a purely performance oriented fix that prevents catastrophic failure due to OOM and shouldn't have any other effects, and further RCs should find any side effects that may slip through unnoticed.
It's a purely performance oriented fix
That's the theory, but I've learned to be very careful with those recursive functions, maybe I'm overly cautious.
Note that I find it suspicious that you OOM on this, sounds like you have cyclic dependency. That's not the only place we recurse on dependencies, it should OOM there too.
It's a purely performance oriented fix
That's the theory, but I've learned to be very careful with those recursive functions, maybe I'm overly cautious.
Note that I find it suspicious that you OOM on this, sounds like you have cyclic dependency. That's not the only place we recurse on dependencies, it should OOM there too.
I checked, and there isn't a cyclic dependency there—if there were, this patch wouldn't fix the problem for the other places where dependency recursion is an issue. It's just a huge tangled web of deps. #13342 was also discovered by this build tree.
It's a purely performance oriented fix that prevents catastrophic failure due to OOM and shouldn't have any other effects, and further RCs should find any side effects that may slip through unnoticed.
This PR has already generated a ton of discussion and rewrites, so it's clearly not a simple and straightforward bugfix.
I can understand why @xclaesse would prefer to give it an entire dev cycle to cook.
If the consensus among you all is that it should wait until 1.6, then I'll concede that point. Making it into 1.5 would make me very happy, though.
And in defense of the rewrites, they were partially for code correctness and partially because I'm just not that deep into Python 😅 I think @xclaesse's suggestions should be implemented, I just wanted to keep this PR as focused as possible to increase its chances of getting in before 1.5 is tagged. If that's determined to not be possible, I'm more than happy to update the PR to include those changes.
Now that 1.5.0 has been tagged, what are the thoughts on this? I can implement @xclaesse's suggestions if that's what's desired.
I've rebased on master, applied @xclaesse's patch, and added him as a co-author. If there's anything else you need from me, please let me know.
I have a new version that both passes all tests and more closely aligns with what Xavier saw as a better approach. I'm going to close this PR and open a new one with that implementation.
(it is okay to iterate on an existing PR too, you know. :p)
Ofc! This PR had been through so much that I decided to just do a clean implementation and open a new one on a new branch given how far we've gone from my original PR.