pip Prefer upper bounds when resolving/backtracking

Fixes: https://github.com/pypa/pip/issues/12993
Fixes: https://github.com/pypa/pip/issues/12990
Fixes: https://github.com/pypa/pip/issues/12430
Fixes: https://github.com/pypa/pip/issues/13030

This PR is built on top of https://github.com/pypa/pip/pull/12982 so that the unit tests can be expanded. Either that PR can be reviewed first, or this PR can supplant it.

I have developed some benchmark scripts to ensure that changes to pip's resolution algorithm don't regress common real world requirements: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks.

I plan to keep building out more scenarios. The current ones are here: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/tree/main/scenarios

Testing this PR against pip 24.2, I see one small regression and two big improvements:

Difference for scenario scenarios/problematic.toml - autogluon:
    	Success: False -> True.
    	Failure Reason: Build Failure -> None.

Difference for scenario scenarios/problematic.toml - boto3-urllib3-transient:
    	Number of packages processed: 869 -> 871

Difference for scenario scenarios/big-packages.toml - apache-airflow-all:
    	Number of requirements processed: 593 -> 592
    	Number of packages processed: 681 -> 661

The fact that autogluon can now resolve is a big improvement. apache-airflow[all] gets a noticeable reduction in how many packages it has to process (and this gives a real time improvement, as the number of packages processed can have O(n^2) complexity). A scenario involving boto3 and urllib3 as transient requirements gets a small regression, having to process 2 more packages.
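To put the apache-airflow[all] numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that resolution cost scales exactly with the square of the number of packages processed; the helper name is hypothetical and nothing here is measured timing.

```python
def relative_quadratic_cost(before: int, after: int) -> float:
    """Ratio of work after/before, under an assumed pure O(n^2) cost model."""
    return (after / before) ** 2


# Packages processed for apache-airflow-all: 681 with pip 24.2, 661 with this PR.
ratio = relative_quadratic_cost(681, 661)
saving = 1 - ratio

print(f"packages processed: {661 / 681:.1%} of before")
print(f"estimated work under the quadratic assumption: {ratio:.1%} of before")
```

Under that assumption, a reduction of about 3% in packages processed corresponds to a reduction of a little under 6% in work. Real timings depend on much more than this (network, metadata builds, caching), so treat it only as an order-of-magnitude guide.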

I am hoping to find more real world scenarios where this has a noticeable difference, but I think these results are sufficient to show this approach is a net positive.

Oct 14 '24 05:10 notatallshaw

Very tentatively adding this to the 24.3 milestone on the basis of:

- If a maintainer with resolver experience can look at https://github.com/pypa/pip/pull/12982 then this PR only adds a small amount of functional code on top: https://github.com/pypa/pip/pull/13017/commits/70f4d92e7e5505bb158992116980a9c46d0db6f0
- This PR expands the unit tests in that PR to cover the functional code added here
- Benchmarks show it does not regress against a number of scenarios
- It fixes a real world issue

But I understand if no maintainer is available to review.

Oct 14 '24 06:10 notatallshaw

Added more problematic scenarios in: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/blob/main/scenarios/problematic.toml

And found this also fixes https://github.com/pypa/pip/issues/12430 (which was merged into another issue, but the specific resolution the user had is now solved by this).

Oct 15 '24 00:10 notatallshaw

I do not know pip resolution internals, but the rules explained make sense and might indeed improve a number of cases.

Oct 15 '24 01:10 potiuk

I took a look to see whether it made any difference to put the upper bound preference above or below the backtracking cause preference. At least in the scenarios I currently have in https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/blob/main/scenarios it didn't make any significant difference (there was a very slight regression for apache-airflow-beam when putting it below, as it visited 1 extra package).

So I consider this good in its current position, and if I find a scenario in the future, or a user reports one, where it does make a significant difference, then it can be changed.
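For readers unfamiliar with how such a preference ordering works, here is a minimal sketch of the idea, not pip's actual implementation. It assumes the resolver picks the next requirement to work on by sorting on a tuple key, where `False` sorts before `True`. All names (`is_upper_bounded`, `preference_key`, `upper_bound_first`) and the example constraints are hypothetical, and the choice of which operators count as an upper bound is an assumption of this sketch.

```python
import re

# Assumption for this sketch: these operators cap the allowed version.
_UPPER_OPS = ("<=", "<", "~=")
_CLAUSE_RE = re.compile(r"\s*(===|==|~=|!=|<=|>=|<|>)\s*([^,\s]+)\s*")


def parse_specifiers(spec: str) -> list[tuple[str, str]]:
    """Split a comma-separated specifier string into (operator, version) pairs."""
    clauses = []
    for part in spec.split(","):
        if not part.strip():
            continue
        match = _CLAUSE_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse specifier clause: {part!r}")
        clauses.append((match.group(1), match.group(2)))
    return clauses


def is_upper_bounded(spec: str) -> bool:
    """True if any clause limits how new the version may be."""
    return any(
        op in _UPPER_OPS or (op == "==" and "*" in ver)
        for op, ver in parse_specifiers(spec)
    )


def preference_key(name, spec, backtrack_causes, upper_bound_first=True):
    """Lower tuples are resolved first; False sorts before True."""
    not_upper = not is_upper_bounded(spec)
    not_cause = name not in backtrack_causes
    if upper_bound_first:
        return (not_upper, not_cause, name)
    return (not_cause, not_upper, name)


# Hypothetical requirements and backtrack causes, for illustration only.
requirements = {"urllib3": "<2", "boto3": ">=1.30", "requests": ""}
causes = {"boto3"}

above = sorted(requirements, key=lambda n: preference_key(n, requirements[n], causes))
below = sorted(
    requirements,
    key=lambda n: preference_key(n, requirements[n], causes, upper_bound_first=False),
)
print(above)
print(below)
```

Swapping the first two tuple elements is all it takes to move the upper bound preference above or below the backtracking cause preference, which is why the two orderings are cheap to compare in benchmarks.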

Oct 18 '24 14:10 notatallshaw

Found a minor improvement in acryl-datahub[all], which has over 300 total dependencies: it visited 1 fewer requirement and 6 fewer packages, and produced a slightly better solution: https://github.com/notatallshaw/Pip-Resolution-Scenarios-and-Benchmarks/pull/2#issuecomment-2425017792

Oct 20 '24 15:10 notatallshaw

While this looks very reasonable I'd prefer to have another resolver expert (which I am not, unfortunately) to look into this. So postponing.

Oct 26 '24 08:10 sbidoul

> While this looks very reasonable I'd prefer to have another resolver expert (which I am not, unfortunately) to look into this. So postponing.

I knew this one was pretty unlikely, but I thought I'd give it a shot given the recently raised real world issues that this solves.

Oct 26 '24 16:10 notatallshaw

I'm going to make a single follow-up PR once https://github.com/pypa/pip/pull/13001 lands. I'll comment here once done.

Nov 10 '24 20:11 notatallshaw