mousebender
Create `mousebender.resolve`
Use resolvelib to keep compatibility with other projects like pip and PDM.
Provide flexibility in how the resolution is handled:
- [x] Separate out how metadata is fetched (so you can swap out either the resolution or download logic; also makes testing easier)
- [x] Track which projects brought in which dependencies (to be able to render the dependency graph)
- [x] Newest or oldest version (to know when your lower bounds no longer work)
- [x] Pass in platform and marker details (so you can resolve for any platform on any machine)
- [x] Fastest or most compatible wheel (so you can see what the loosest wheel tag triple you need is)
- [ ] Support external constraints (since it seems to be a common need)
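Regarding the "pass in platform and marker details" item above, here is a rough sketch (not mousebender's actual API) of what an explicitly supplied target environment could look like using `packaging.markers` and `packaging.tags`; all concrete values are illustrative assumptions.

```python
# Sketch only: describing the *target* platform explicitly instead of deriving
# it from the running interpreter; every concrete value here is made up.
from packaging.markers import Marker
from packaging.tags import Tag

# Hypothetical target: CPython 3.11 on manylinux x86-64.
marker_environment = {
    "python_version": "3.11",
    "python_full_version": "3.11.4",
    "sys_platform": "linux",
    "platform_system": "Linux",
    "platform_machine": "x86_64",
    "implementation_name": "cpython",
}
supported_tags = [
    Tag("cp311", "cp311", "manylinux_2_17_x86_64"),
    Tag("py3", "none", "any"),
]

# A provider could use the environment to decide whether a dependency applies...
print(Marker("sys_platform == 'win32'").evaluate(marker_environment))  # False
# ...and the tag list to decide which wheels are installable on the target.
```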
Provider protocol
- [x] `identify()`
- [x] `get_preference()`
- [x] `find_matches()`
- [x] `is_satisfied_by()`
- [x] `get_dependencies()`
Code links:
A couple of relevant sections from existing standards:
- https://peps.python.org/pep-0440/#summary-of-permitted-suffixes-and-relative-ordering (covered by `Version` FWIW)
- https://peps.python.org/pep-0592/#installers
The `identify()` method seems to be about generating a key for a requirement (the typing suggests it can be a candidate or a requirement, but the docstring only mentions requirements):

> Given a requirement, return an identifier for it.
>
> This is used to identify a requirement, e.g. whether two requirements should have their specifier parts merged.
So the important thing here seems to be having requirements that are for the same thing to line up for merging.
Apparently everyone makes extras part of the key: pip, PDM, and the extras example from resolvelib. What's interesting is that PDM and pip return a string, while the resolvelib example returns a string or a tuple of name and extras.
Since the typing of `identify()` suggests nothing more than that it must be hashable, probably the simplest solution here is to return a `tuple[NormalizedName, frozenset[NormalizedName]]` for the distribution name and extras. For any requirement that has no extras, the empty set can be used.
Now we just have to figure out the representation of a requirement in order to generate that tuple. 😅
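A minimal sketch of that idea, assuming requirements come in as `packaging.requirements.Requirement` objects (the actual representation is still the open question above):

```python
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name


def identify(requirement: Requirement) -> tuple[str, frozenset[str]]:
    """Key a requirement by (normalized name, normalized extras)."""
    return (
        canonicalize_name(requirement.name),
        frozenset(canonicalize_name(extra) for extra in requirement.extras),
    )


# Requirements for the same distribution + extras line up regardless of specifiers.
assert identify(Requirement("Mousebender[Docs]")) == identify(Requirement("mousebender[docs]>=1.0"))
```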
The `get_dependencies()` method seems to be about taking a candidate (which is a distribution and version) and getting back its dependencies in the form of requirements:

> Get dependencies of a candidate.
>
> This should return a collection of requirements that `candidate` specifies as its dependencies.
Depending on how candidate objects look, this could be very easy or it could require going back to the wheel to get the dependencies.
Since the resolver is the one having to figure out priorities, the dependencies probably don't have a priority order, thus returning something like a set should be okay.
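A hedged sketch of what that could look like if the candidate object already carries the wheel's Requires-Dist entries and the target marker environment (both attribute names are assumptions, not an existing API):

```python
from packaging.requirements import Requirement


def get_dependencies(candidate) -> set[Requirement]:
    """Return the candidate's dependencies with no implied priority order."""
    return {
        requirement
        for requirement in candidate.requires_dist  # hypothetical: parsed Requires-Dist
        if requirement.marker is None
        or requirement.marker.evaluate(candidate.marker_environment)  # hypothetical attribute
    }
```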
The `is_satisfied_by()` method seems to just check whether a candidate satisfies a requirement.

> Whether the given requirement can be satisfied by a candidate.
>
> The candidate is guaranteed to have been generated from the requirement.
>
> A boolean should be returned to indicate whether `candidate` is a viable solution to the requirement.
Now the interesting thing is "the candidate is guaranteed to have been generated from the requirement", and yet we still have to check if the candidate satisfies the requirement. 🤔 The best I can think of is multiple requirements got merged into a single requirement and this is to check if the candidate is still viable.
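Under the assumption that a requirement is a `packaging.requirements.Requirement` and a candidate carries a name and a `packaging.version.Version`, the check could be as simple as:

```python
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name


def is_satisfied_by(requirement: Requirement, candidate) -> bool:
    """Check whether a (possibly merged) requirement still accepts the candidate."""
    if canonicalize_name(requirement.name) != canonicalize_name(candidate.name):
        return False
    # Let pre-releases through here; deciding whether to offer them at all is
    # find_matches()'s job in this sketch.
    return requirement.specifier.contains(candidate.version, prereleases=True)
```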
The `get_preference()` method is all about creating a sort key for the DSU pattern.

> The preference is defined as "I think this requirement should be resolved first". The lower the return value is, the more preferred this group of arguments is.
The PyPI wheel example from resolvelib and an open issue about a default implementation suggest just going with the number of candidates as that prioritizes the most restricted distributions first.
Pip's implementation is way more involved. From its docstring:

> - Prefer if any of the known requirements is "direct", e.g. points to an explicit URL.
> - If equal, prefer if any requirement is "pinned", i.e. contains operator `===` or `==`.
> - If equal, calculate an approximate "depth" and resolve requirements closer to the user-specified requirements first. If the depth cannot be determined (e.g. due to no matching parents), it is considered infinite.
> - Order user-specified requirements by the order they are specified.
> - If equal, prefers "non-free" requirements, i.e. contains at least one operator, such as `>=` or `<`.
> - If equal, order alphabetically for consistency (helps debuggability).
PDM's implementation is also more involved. The return value gives a hint as to what it prioritises:

```python
return (
    not is_python,
    not is_top,
    not is_file_or_url,
    not is_pinned,
    not is_backtrack_cause,
    dep_depth,
    -constraints,
    identifier,
)
```
It might be best to go with the simple solution as suggested in the open issue and example and increase complexity as necessary.
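For reference, the simple approach is tiny; a sketch following resolvelib's `AbstractProvider.get_preference()` signature (nothing mousebender-specific here):

```python
def get_preference(self, identifier, resolutions, candidates, information, backtrack_causes):
    # Fewer remaining candidates == more constrained == try to resolve it first.
    return sum(1 for _ in candidates[identifier])
```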
If `get_dependencies()` is candidate -> requirements, the `find_matches()` method seems to be requirement -> candidates.
This seems to be where communicating with e.g. PyPI would come in (as the PyPI wheel example suggests).
This does require returning things in a sorted order of preference.
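A rough sketch of that direction, assuming some index object (hypothetical) that the provider delegates to for the list of available candidates; the sorting choice at the end is where the "newest vs. oldest" and "fastest vs. most compatible wheel" preferences would plug in:

```python
def find_matches(self, identifier, requirements, incompatibilities):
    """requirement(s) -> candidates, best-preferred first (resolvelib signature)."""
    wanted = list(requirements[identifier])
    excluded_versions = {candidate.version for candidate in incompatibilities[identifier]}
    candidates = [
        candidate
        for candidate in self.index.candidates_for(identifier)  # hypothetical call
        if candidate.version not in excluded_versions
        and all(
            requirement.specifier.contains(candidate.version, prereleases=True)
            for requirement in wanted
        )
    ]
    # Newest-first as an example preference; oldest-first would test lower bounds.
    return sorted(candidates, key=lambda candidate: candidate.version, reverse=True)
```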
After all of those notes, it seems like the methods do the following (with a wheel-specific view):

- `identify()`: Key for a dict for a requirement which is consistent based on the distribution and not any constraints
- `get_preference()`: Sort requirements based on how important they are to resolve first
- `find_matches()`: Find all the candidates that could satisfy a requirement
- `is_satisfied_by()`: Does a candidate still work for a requirement?
- `get_dependencies()`: Get the dependencies of a candidate
It seems all the magic in terms of what to customize happens in `find_matches()`. I think the tricky bit here is finding the right abstraction to let things be customized while also helping to minimize network traffic. Do you do subclassing? Composition? Do you ask for everything and let the common code do the filtering of the results, or do you let the thing you delegate to handle that to avoid needless network requests? Another question is whether you return all potential wheels or just the best one (e.g., the compiled wheel and the pure Python wheel, or just the compiled one). If you return all wheels you have the greatest chance of success, but it will also increase resolution complexity. It then also becomes a question of whether you prefer newer versions with only pure Python wheels or older versions with compiled wheels.
It seems like requirements are pretty much `packaging.requirements.Requirement` objects. Candidates are details about a specific wheel.
> requirements that are for the same thing to line up for merging.

Same for the corresponding candidates -- for every candidate generated by a `find_matches(requirement)` call, it needs to have `identify(candidate) == identify(requirement)`. Without that, the resolver cannot associate the two.
I don't remember if we added an explicit error check for this in resolvelib, but flagging this since I don't see this nuance mentioned here.
PS: We need to improve that docstring for identify. 😅
> Same for the corresponding candidates -- for every candidate generated by a `find_matches(requirement)` call, it needs to have `identify(candidate) == identify(requirement)`. Without that, the resolver cannot associate the two.
How does that play into extras? The extras example says:
> To model projects with extras, we define a candidate as being a project with a specific set of dependencies. This introduces a problem, as the resolver could produce a solution that demands version 1.0 of X[foo] and version 2.0 of X[bar]. This is impossible, as there is actually only one project X to be installed. To address this, we inject an additional dependency for every candidate with an extra - X[foo] version v depends on X version v. By doing this, we constrain the solution to require a unique version of X.
So does that mean you have the same candidate wheel under multiple identifiers to satisfy any and all extras that are possible for a distribution (and thus cache a lot to avoid the cost of looking up the same thing for a distribution just because you now come across a new set of extras)? Or do you add a dependency to the distribution+extra of the distribution itself and have a virtual candidate that always satisfies the distribution+extra so the resolver effectively ignores the distribution+extra node of the dependency graph?
> PS: We need to improve that docstring for identify. 😅

https://github.com/sarugaku/resolvelib/pull/138
> So does that mean you have the same candidate wheel under multiple identifiers to satisfy any and all extras that are possible for a distribution (and thus cache a lot to avoid the cost of looking up the same thing for a distribution just because you now come across a new set of extras)?

For pip, yes-ish (-ish because I don't recall the caching details). The extra candidate is based on another candidate, and the way this is handled is by having the extra candidate depend on a requirement for which `find_matches()` only returns the specific candidate for the underlying distribution file the metadata was pulled from.
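A hedged sketch of that mechanism (loosely modelled on the `ExtrasCandidate`/`ExplicitRequirement` names pip uses and that show up in the graph below; none of this is verbatim pip code):

```python
class ExplicitRequirement:
    """A requirement that only the one wrapped candidate can satisfy."""

    def __init__(self, candidate):
        self.candidate = candidate


class ExtrasCandidate:
    """A candidate for e.g. home[base,dog], wrapping the plain home candidate."""

    def __init__(self, base, extras):
        self.base = base
        self.extras = frozenset(extras)
        self.version = base.version

    def dependencies(self):
        # 1. Pin the exact distribution file the metadata was pulled from.
        yield ExplicitRequirement(self.base)
        # 2. Add each requested extra's own dependencies (hypothetical helper).
        for extra in sorted(self.extras):
            yield from self.base.dependencies_for_extra(extra)
```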
For something like the following...

```toml
[project]
name = "home"
version = "1.0.0"

[project.optional-dependencies]
base = ["base-dependency"]
dog = ["dog-food", "dog-bed"]
cat = ["cat-food", "cat-bed"]
```
And doing a `home[base, dog]` and `home[base, cat]` will give resolvelib a graph that looks something like:
```mermaid
graph LR
    subgraph legend
        requirement(["requirement"])
        candidate[["candidate"]]
    end

    subgraph cat-extra["identity = home[base,cat]"]
        home-cat-requirement(["home[base,cat]"])
        home-cat-version-one[["ExtrasCandidate(\n LinkCandidate(home-1.0.0-*.whl),\n [base,cat]\n)"]]
        style home-cat-version-one text-align:left
    end

    subgraph dog-extra["identity = home[base,dog]"]
        home-dog-requirement(["home[base,dog]"])
        home-dog-version-one[["ExtrasCandidate(\n LinkCandidate(home-1.0.0-*.whl),\n [base,dog]\n)"]]
        style home-dog-version-one text-align:left
    end

    subgraph cat-requirements ["layout grouping\ncat extra dependencies"]
        cat-requirement-food(["cat-food"])
        cat-requirement-bed(["cat-bed"])
    end

    subgraph base-requirements ["layout grouping\nbase extra dependencies"]
        base-requirement(["base-dependency"])
    end

    subgraph home-one ["identity = home"]
        home-requirement-one(["ExplicitRequirement(LinkCandidate(home-1.0.0-*.whl))"])
        home-version-one[["LinkCandidate(home-1.0.0-*.whl)"]]
    end

    subgraph dog-requirements ["layout grouping\ndog extra dependencies"]
        dog-requirement-food(["dog-food"])
        dog-requirement-bed(["dog-bed"])
    end

    home-dog-requirement --> home-dog-version-one;
    home-cat-requirement --> home-cat-version-one;
    home-dog-version-one --> home-requirement-one;
    home-dog-version-one --> base-requirement;
    home-cat-version-one --> home-requirement-one;
    home-cat-version-one --> base-requirement;
    home-requirement-one --> home-version-one;
    home-cat-version-one --> cat-requirement-food;
    home-cat-version-one --> cat-requirement-bed;
    home-dog-version-one --> dog-requirement-food;
    home-dog-version-one --> dog-requirement-bed;
```
And, yes, I've not expanded the requirements that came from the extras into their corresponding candidates because that isn't relevant to the extra handling question.
(A home 2.0.0 would add new nodes corresponding to those for 1.0.0, with basically the same connections if the dependencies are unchanged. Adding that in made mermaid render the graph in an unreadable format, so I've skipped it.)
Edit: Updated to explicitly reference the pip-side class object names.
@pradyunsg thanks for this! So if I understand this correctly, there are pseudo-candidates per extras set and distro version, each of which ends up with an implicit requirement on the distribution itself -- i.e. no extras -- with a `==` to the same version the pseudo-candidate specified.
> So if I understand this correctly, there are pseudo-candidates per extras set and distro version, each of which ends up with an implicit requirement on the distribution itself -- i.e. no extras -- with a `==` to the same version the pseudo-candidate specified.
Yes, except that it's not done via `==` but as an explicit requirement that matches the candidate. This matters for things like:

```pycon
>>> SpecifierSet("==1.0").contains("1.0+local_version_label")
True
```

where the `1.0` wheel can be different from the `1.0+local_version_label` wheel.
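So in provider terms (continuing the hypothetical `ExplicitRequirement` stub sketched earlier), matching would be by candidate identity rather than by version specifier:

```python
def find_matches_for_explicit(requirement) -> list:
    # Only the exact candidate the requirement was built from is a match.
    return [requirement.candidate]


def is_satisfied_by_explicit(requirement, candidate) -> bool:
    # Identity, not "version in specifier", so 1.0 and 1.0+local never get conflated.
    return candidate is requirement.candidate
```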
I'm realising that this can (and does) lead to the resolver backtracking on the wrong requirement sometimes, since backtracking when an `ExplicitRequirement` is involved is going to result in suboptimal backtracking. The fix there is to prefer to backtrack on `ExplicitRequirement` whenever possible.
> backtracking when an `ExplicitRequirement` is involved is going to result in suboptimal backtracking. The fix there is to prefer to backtrack on `ExplicitRequirement` whenever possible.
That sounds contradictory. If backtracking when an explicit requirement is involved is suboptimal, why is the fix to prefer backtracking in that case? Are you saying explicit requirements should come earlier or later in the ordering provided by `get_preference()` (which is what I assume you're suggesting as the mechanism to signal what to backtrack on)?
And now I'm seeing why https://github.com/sarugaku/resolvelib/issues/14 exists. 😅
> Are you saying explicit requirements should come earlier or later in the ordering provided by `get_preference()` (which is what I assume you're suggesting as the mechanism to signal what to backtrack on)?
Yea -- the preference for backtracking on the `ExplicitRequirement` should be higher than for other kinds of requirements, when it is implicated in a conflict.
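One way that could be expressed (a sketch layered on the simple candidate-count preference from earlier, using the `backtrack_causes` argument resolvelib passes to `get_preference()`; the explicit-requirement check is duck-typed against the hypothetical stub above):

```python
def get_preference(self, identifier, resolutions, candidates, information, backtrack_causes):
    causes = [
        cause.requirement
        for cause in backtrack_causes
        if self.identify(cause.requirement) == identifier
    ]
    has_explicit_cause = any(hasattr(requirement, "candidate") for requirement in causes)
    # Tuples sort element-wise and False sorts before True, so: requirements with an
    # explicit-requirement conflict first, then anything implicated in the conflict,
    # then whichever has the fewest candidates left.
    return (not has_explicit_cause, not causes, sum(1 for _ in candidates[identifier]))
```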
Started outlining the code in the 'resolve' branch.
The resolver now works! I haven't done constraints, but it works well enough that I have a PoC resolver that works with PyPI metadata that can be directly downloaded.
If anyone has feedback on the high-level API I would appreciate it since that's the bit that can't be easily changed later.
Do you mind briefly explaining how the PoC differs from the resolver in pip?
> Do you mind briefly explaining how the PoC differs from the resolver in pip?
At minimum, the blatant differences are:
- No support for sdists
- No support for constraints

There are probably other subtleties that I'm not aware of.