mousebender
Create `mousebender.resolve`
Use resolvelib to keep compatibility with other projects like pip and PDM.
Provide flexibility in how the resolution is handled:
- [x] Separate out how metadata is fetched (so you can swap out either the resolution or download logic; also makes testing easier)
- [x] Track which projects brought in which dependencies (to be able to render the dependency graph)
- [x] Newest or oldest version (to know when your lower bounds no longer work)
- [x] Pass in platform and marker details (so you can resolve for any platform on any machine)
- [x] Fastest or most compatible wheel (so you can see what the loosest wheel tag triple you need is)
- [ ] Support external constraints (since it seems to be a common need)
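Regarding the "pass in platform and marker details" item above, here is a rough sketch (not mousebender's actual API) of what an explicitly supplied target environment could look like using `packaging.markers` and `packaging.tags`; all concrete values are illustrative assumptions.

```python
# Sketch only: describing the *target* platform explicitly instead of deriving
# it from the running interpreter; every concrete value here is made up.
from packaging.markers import Marker
from packaging.tags import Tag

# Hypothetical target: CPython 3.11 on manylinux x86-64.
marker_environment = {
    "python_version": "3.11",
    "python_full_version": "3.11.4",
    "sys_platform": "linux",
    "platform_system": "Linux",
    "platform_machine": "x86_64",
    "implementation_name": "cpython",
}
supported_tags = [
    Tag("cp311", "cp311", "manylinux_2_17_x86_64"),
    Tag("py3", "none", "any"),
]

# A provider could use the environment to decide whether a dependency applies...
print(Marker("sys_platform == 'win32'").evaluate(marker_environment))  # False
# ...and the tag list to decide which wheels are installable on the target.
```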
Provider protocol
- [x] `identify()`
- [x] `get_preference()`
- [x] `find_matches()`
- [x] `is_satisfied_by()`
- [x] `get_dependencies()`
Code links:
A couple of relevant sections from existing standards:
- https://peps.python.org/pep-0440/#summary-of-permitted-suffixes-and-relative-ordering (covered by `Version` FWIW)
- https://peps.python.org/pep-0592/#installers
The `identify()` method seems to be about generating a key for a requirement (the typing suggests it can be a candidate or a requirement, but the docstring only mentions requirements):

> Given a requirement, return an identifier for it.
>
> This is used to identify a requirement, e.g. whether two requirements should have their specifier parts merged.
So the important thing here seems to be having requirements that are for the same thing to line up for merging.
Apparently everyone makes extras part of the key: pip, PDM, and the extras example from resolvelib. What's interesting is that PDM and pip return a string, while the resolvelib example returns a string or a tuple of name and extras.
Since the typing of `identify()` suggests nothing more than that it must be hashable, probably the simplest solution here is to return a `tuple[NormalizedName, frozenset[NormalizedName]]` for the distribution name and extras. For any requirement that has no extras, the empty set can be used.
Now we just have to figure out the representation of a requirement in order to generate that tuple. 😅
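A minimal sketch of that idea, assuming requirements come in as `packaging.requirements.Requirement` objects (the actual representation is still the open question above):

```python
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name


def identify(requirement: Requirement) -> tuple[str, frozenset[str]]:
    """Key a requirement by (normalized name, normalized extras)."""
    return (
        canonicalize_name(requirement.name),
        frozenset(canonicalize_name(extra) for extra in requirement.extras),
    )


# Requirements for the same distribution + extras line up regardless of specifiers.
assert identify(Requirement("Mousebender[Docs]")) == identify(Requirement("mousebender[docs]>=1.0"))
```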
The `get_dependencies()` method seems to be about taking a candidate (which is a distribution and version) and getting back its dependencies in the form of requirements:

> Get dependencies of a candidate.
>
> This should return a collection of requirements that `candidate` specifies as its dependencies.
Depending on how candidate objects look, this could be very easy or it could require going back to the wheel to get the dependencies.
Since the resolver is the one having to figure out priorities, the dependencies probably don't have a priority order, thus returning something like a set should be okay.
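A hedged sketch of what that could look like if the candidate object already carries the wheel's Requires-Dist entries and the target marker environment (both attribute names are assumptions, not an existing API):

```python
from packaging.requirements import Requirement


def get_dependencies(candidate) -> set[Requirement]:
    """Return the candidate's dependencies with no implied priority order."""
    return {
        requirement
        for requirement in candidate.requires_dist  # hypothetical: parsed Requires-Dist
        if requirement.marker is None
        or requirement.marker.evaluate(candidate.marker_environment)  # hypothetical attribute
    }
```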
The `is_satisfied_by()` method seems to just check whether a candidate satisfies a requirement.

> Whether the given requirement can be satisfied by a candidate.
>
> The candidate is guaranteed to have been generated from the requirement.
>
> A boolean should be returned to indicate whether `candidate` is a viable solution to the requirement.
Now the interesting thing is "the candidate is guaranteed to have been generated from the requirement", and yet we still have to check if the candidate satisfies the requirement. 🤔 The best I can think of is multiple requirements got merged into a single requirement and this is to check if the candidate is still viable.
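Under the assumption that a requirement is a `packaging.requirements.Requirement` and a candidate carries a name and a `packaging.version.Version`, the check could be as simple as:

```python
from packaging.requirements import Requirement
from packaging.utils import canonicalize_name


def is_satisfied_by(requirement: Requirement, candidate) -> bool:
    """Check whether a (possibly merged) requirement still accepts the candidate."""
    if canonicalize_name(requirement.name) != canonicalize_name(candidate.name):
        return False
    # Let pre-releases through here; deciding whether to offer them at all is
    # find_matches()'s job in this sketch.
    return requirement.specifier.contains(candidate.version, prereleases=True)
```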
The `get_preference()` method is all about creating a sort key for the DSU pattern.

> The preference is defined as "I think this requirement should be resolved first". The lower the return value is, the more preferred this group of arguments is.
The PyPI wheel example from resolvelib and an open issue about a default implementation suggest just going with the number of candidates as that prioritizes the most restricted distributions first.
Pip's implementation is way more involved. From its docstring:

> - Prefer if any of the known requirements is "direct", e.g. points to an explicit URL.
> - If equal, prefer if any requirement is "pinned", i.e. contains operator `===` or `==`.
> - If equal, calculate an approximate "depth" and resolve requirements closer to the user-specified requirements first. If the depth cannot be determined (e.g. due to no matching parents), it is considered infinite.
> - Order user-specified requirements by the order they are specified.
> - If equal, prefers "non-free" requirements, i.e. contains at least one operator, such as `>=` or `<`.
> - If equal, order alphabetically for consistency (helps debuggability).
PDM's implementation is also more involved. The return value gives a hint as to what it prioritises:

```python
return (
    not is_python,
    not is_top,
    not is_file_or_url,
    not is_pinned,
    not is_backtrack_cause,
    dep_depth,
    -constraints,
    identifier,
)
```
It might be best to go with the simple solution as suggested in the open issue and example and increase complexity as necessary.
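For reference, the simple approach is tiny; a sketch following resolvelib's `AbstractProvider.get_preference()` signature (nothing mousebender-specific here):

```python
def get_preference(self, identifier, resolutions, candidates, information, backtrack_causes):
    # Fewer remaining candidates == more constrained == try to resolve it first.
    return sum(1 for _ in candidates[identifier])
```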
If `get_dependencies()` is candidate -> requirements, the `find_matches()` method seems to be requirement -> candidates.
This seems to be where communicating with e.g. PyPI would come in (as the PyPI wheel example suggests).
This does require returning things in a sorted order of preference.
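A rough sketch of that direction, assuming some index object (hypothetical) that the provider delegates to for the list of available candidates; the sorting choice at the end is where the "newest vs. oldest" and "fastest vs. most compatible wheel" preferences would plug in:

```python
def find_matches(self, identifier, requirements, incompatibilities):
    """requirement(s) -> candidates, best-preferred first (resolvelib signature)."""
    wanted = list(requirements[identifier])
    excluded_versions = {candidate.version for candidate in incompatibilities[identifier]}
    candidates = [
        candidate
        for candidate in self.index.candidates_for(identifier)  # hypothetical call
        if candidate.version not in excluded_versions
        and all(
            requirement.specifier.contains(candidate.version, prereleases=True)
            for requirement in wanted
        )
    ]
    # Newest-first as an example preference; oldest-first would test lower bounds.
    return sorted(candidates, key=lambda candidate: candidate.version, reverse=True)
```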
After all of those notes, it seems like the methods do the following (with a wheel-specific view):

- `identify()`: Key for a dict for a requirement which is consistent based on the distribution and not any constraints
- `get_preference()`: Sort requirements based on how important they are to resolve first
- `find_matches()`: Find all the candidates that could satisfy a requirement
- `is_satisfied_by()`: Does a candidate still work for a requirement?
- `get_dependencies()`: Get the dependencies of a candidate
It seems all the magic in terms of what to customize happens in `find_matches()`. I think the tricky bit here is finding the right abstraction to let things be customized while also helping to minimize network traffic. Do you do subclassing? Composition? Do you ask for everything and let the common code do the filtering of the results, or do you let the thing you delegate to handle that to avoid needless network requests? Another question is whether you return all potential wheels or just the best one (e.g., the compiled wheel and the pure Python wheel, or just the compiled one). If you return all wheels you have the greatest chance of success, but it will also increase resolution complexity. It then also becomes a question of whether you prefer newer versions with only pure Python wheels or older versions with compiled wheels.
It seems like requirements are pretty much `packaging.requirements.Requirement` objects. Candidates are details about a specific wheel.
> requirements that are for the same thing to line up for merging.

Same for the corresponding candidates -- for every candidate generated by a `find_matches(requirement)` call, it needs to have `identify(candidate) == identify(requirement)`. Without that, the resolver cannot associate the two.
I don't remember if we added an explicit error check for this in resolvelib, but flagging this since I don't see this nuance mentioned here.
PS: We need to improve that docstring for identify. 😅
> Same for the corresponding candidates -- for every candidate generated by a `find_matches(requirement)` call, it needs to have `identify(candidate) == identify(requirement)`. Without that, the resolver cannot associate the two.
How does that play into extras? The extras example says:
> To model projects with extras, we define a candidate as being a project with a specific set of dependencies. This introduces a problem, as the resolver could produce a solution that demands version 1.0 of X[foo] and version 2.0 of X[bar]. This is impossible, as there is actually only one project X to be installed. To address this, we inject an additional dependency for every candidate with an extra - X[foo] version v depends on X version v. By doing this, we constrain the solution to require a unique version of X.
So does that mean you have the same candidate wheel under multiple identifiers to satisfy any and all extras that are possible for a distribution (and thus cache a lot to avoid the cost of looking up the same thing for a distribution just because you now come across a new set of extras)? Or do you add a dependency to the distribution+extra of the distribution itself and have a virtual candidate that always satisfies the distribution+extra so the resolver effectively ignores the distribution+extra node of the dependency graph?
> PS: We need to improve that docstring for identify. 😅

https://github.com/sarugaku/resolvelib/pull/138
> So does that mean you have the same candidate wheel under multiple identifiers to satisfy any and all extras that are possible for a distribution (and thus cache a lot to avoid the cost of looking up the same thing for a distribution just because you now come across a new set of extras)?

For pip, yes-ish (-ish because I don't recall the caching details). The extra candidate is based on another candidate, and the way this is handled is by having the extra candidate depend on a requirement for which `find_matches()` only returns the specific candidate for the underlying distribution file the metadata was pulled from.
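A hedged sketch of that mechanism (loosely modelled on the `ExtrasCandidate`/`ExplicitRequirement` names pip uses and that show up in the graph below; none of this is verbatim pip code):

```python
class ExplicitRequirement:
    """A requirement that only the one wrapped candidate can satisfy."""

    def __init__(self, candidate):
        self.candidate = candidate


class ExtrasCandidate:
    """A candidate for e.g. home[base,dog], wrapping the plain home candidate."""

    def __init__(self, base, extras):
        self.base = base
        self.extras = frozenset(extras)
        self.version = base.version

    def dependencies(self):
        # 1. Pin the exact distribution file the metadata was pulled from.
        yield ExplicitRequirement(self.base)
        # 2. Add each requested extra's own dependencies (hypothetical helper).
        for extra in sorted(self.extras):
            yield from self.base.dependencies_for_extra(extra)
```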
For something like the following...

```toml
[project]
name = "home"
version = "1.0.0"

[project.optional-dependencies]
base = ["base-dependency"]
dog = ["dog-food", "dog-bed"]
cat = ["cat-food", "cat-bed"]
```
And doing a `home[base, dog]` and `home[base, cat]` will give resolvelib a graph that looks something like:
```mermaid
graph LR
    subgraph legend
        requirement(["requirement"])
        candidate[["candidate"]]
    end

    subgraph cat-extra["identity = home[base,cat]"]
        home-cat-requirement(["home[base,cat]"])
        home-cat-version-one[["ExtrasCandidate(\n LinkCandidate(home-1.0.0-*.whl),\n [base,cat]\n)"]]
        style home-cat-version-one text-align:left
    end

    subgraph dog-extra["identity = home[base,dog]"]
        home-dog-requirement(["home[base,dog]"])
        home-dog-version-one[["ExtrasCandidate(\n LinkCandidate(home-1.0.0-*.whl),\n [base,dog]\n)"]]
        style home-dog-version-one text-align:left
    end

    subgraph cat-requirements ["layout grouping\ncat extra dependencies"]
        cat-requirement-food(["cat-food"])
        cat-requirement-bed(["cat-bed"])
    end

    subgraph base-requirements ["layout grouping\nbase extra dependencies"]
        base-requirement(["base-dependency"])
    end

    subgraph home-one ["identity = home"]
        home-requirement-one(["ExplicitRequirement(LinkCandidate(home-1.0.0-*.whl))"])
        home-version-one[["LinkCandidate(home-1.0.0-*.whl)"]]
    end

    subgraph dog-requirements ["layout grouping\ndog extra dependencies"]
        dog-requirement-food(["dog-food"])
        dog-requirement-bed(["dog-bed"])
    end

    home-dog-requirement --> home-dog-version-one;
    home-cat-requirement --> home-cat-version-one;
    home-dog-version-one --> home-requirement-one;
    home-dog-version-one --> base-requirement;
    home-cat-version-one --> home-requirement-one;
    home-cat-version-one --> base-requirement;
    home-requirement-one --> home-version-one;
    home-cat-version-one --> cat-requirement-food;
    home-cat-version-one --> cat-requirement-bed;
    home-dog-version-one --> dog-requirement-food;
    home-dog-version-one --> dog-requirement-bed;
```
And, yes, I've not expanded the requirements that came from the extras into their corresponding candidates because that isn't relevant to the extra handling question.
(A home 2.0.0 would add new nodes corresponding to those for 1.0.0, with basically the same connections if the dependencies are unchanged. Adding that in made mermaid render the graph in an unreadable format, so I've skipped it.)
Edit: Updated to explicitly reference the pip-side class object names.
@pradyunsg thanks for this! So if I understand this correctly, there are pseudo-candidates per extras set and distro version, each of which ends up with an implicit requirement on the distribution itself -- i.e. no extras -- with a `==` to the same version the pseudo-candidate specified.
> So if I understand this correctly, there are pseudo-candidates per extras set and distro version, each of which ends up with an implicit requirement on the distribution itself -- i.e. no extras -- with a `==` to the same version the pseudo-candidate specified.
Yes, except that it's not done via `==` but as an explicit requirement that matches the candidate. This matters for things like:

```pycon
>>> SpecifierSet("==1.0").contains("1.0+local_version_label")
True
```

where the `1.0` wheel can be different from the `1.0+local_version_label` wheel.
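So in provider terms (continuing the hypothetical `ExplicitRequirement` stub sketched earlier), matching would be by candidate identity rather than by version specifier:

```python
def find_matches_for_explicit(requirement) -> list:
    # Only the exact candidate the requirement was built from is a match.
    return [requirement.candidate]


def is_satisfied_by_explicit(requirement, candidate) -> bool:
    # Identity, not "version in specifier", so 1.0 and 1.0+local never get conflated.
    return candidate is requirement.candidate
```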
I'm realising that this can (and does) lead to the resolver backtracking on the wrong requirement sometimes, since backtracking when an `ExplicitRequirement` is involved is going to result in suboptimal backtracking. The fix there is to prefer to backtrack on `ExplicitRequirement` whenever possible.
> backtracking when an `ExplicitRequirement` is involved is going to result in suboptimal backtracking. The fix there is to prefer to backtrack on `ExplicitRequirement` whenever possible.
That sounds contradictory. If backtracking when an explicit requirement is involved is suboptimal, why is the fix to prefer backtracking in that case? Are you saying explicit requirements should come earlier or later in the ordering provided by `get_preference()` (which is what I assume you're suggesting as the mechanism to signal what to backtrack on)?
And now I'm seeing why https://github.com/sarugaku/resolvelib/issues/14 exists. 😅
> Are you saying explicit requirements should come earlier or later in the ordering provided by `get_preference()` (which is what I assume you're suggesting as the mechanism to signal what to backtrack on)?
Yea -- the preference for backtracking on the `ExplicitRequirement` should be higher than for other kinds of requirements, when it is implicated in a conflict.
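One way that could be expressed (a sketch layered on the simple candidate-count preference from earlier, using the `backtrack_causes` argument resolvelib passes to `get_preference()`; the explicit-requirement check is duck-typed against the hypothetical stub above):

```python
def get_preference(self, identifier, resolutions, candidates, information, backtrack_causes):
    causes = [
        cause.requirement
        for cause in backtrack_causes
        if self.identify(cause.requirement) == identifier
    ]
    has_explicit_cause = any(hasattr(requirement, "candidate") for requirement in causes)
    # Tuples sort element-wise and False sorts before True, so: requirements with an
    # explicit-requirement conflict first, then anything implicated in the conflict,
    # then whichever has the fewest candidates left.
    return (not has_explicit_cause, not causes, sum(1 for _ in candidates[identifier]))
```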
Started outlining the code in the 'resolve' branch.
The resolver now works! I haven't done constraints, but it works well enough that I have a PoC resolver that works with PyPI metadata that can be directly downloaded.
If anyone has feedback on the high-level API I would appreciate it since that's the bit that can't be easily changed later.
Do you mind briefly explaining how the PoC differs from the resolver in pip?
> Do you mind briefly explaining how the PoC differs from the resolver in pip?
At minimum, the blatant differences are:
- No support for sdists
- No support for constraints

There are probably other subtleties that I'm not aware of.