Cache is_satisfied_by
This is a performance improvement for a presumably common use case, namely the installation of a project with locked dependencies using pip install -c requirements.txt -e ., in an environment where most dependencies are already installed.
More precisely, the case I'm testing is a ~500-line requirements.txt, running the above command in an environment where all dependencies are already installed. This is typically done by developers when pulling a new version of the project, to ensure their environment is up to date.
When running this under py-spy, it appears that PipProvider.is_satisfied_by largely dominates the 50 seconds spent in resolve() (out of a total execution time of 55 seconds).
Some tracing showed that is_satisfied_by is repeatedly called for the same requirement and candidate.
So I experimented with some caching.
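Conceptually, the caching amounts to something like the sketch below. This is a minimal illustration, not the exact code in this PR: it assumes requirements and candidates are hashable, and the class and attribute names are made up.

```python
# Minimal sketch (not the actual PR code): memoize the satisfaction check on
# (requirement, candidate) pairs, assuming both objects are hashable.
class CachingProviderSketch:
    def __init__(self, check):
        # check(requirement, candidate) -> bool is the original, uncached test.
        self._check = check
        self._satisfied_cache = {}

    def is_satisfied_by(self, requirement, candidate):
        key = (requirement, candidate)
        try:
            return self._satisfied_cache[key]
        except KeyError:
            result = self._satisfied_cache[key] = self._check(requirement, candidate)
            return result
```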
The result of this PR is a 40-second saving on the 50-second resolve(): total execution time went down from 55 seconds to 15.
I'm opening this as a draft since there are some possible improvements left, but it's already ready for comments on the general approach.
The root cause of the perf issue is likely related to https://github.com/sarugaku/resolvelib/issues/147
+1 on the general approach. Making InstallRequirement hashable and comparable in its own right can be a follow-up PR if it's complex (and it looks like it could be complex).
There's a risk of incorrect results caused by two requirements which compare as equal but actually aren't (for example two URL requirements with the same target but different auth data) but I'm assuming that risk is small enough to be acceptable.
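To make that risk concrete, here is a hypothetical illustration (this is not pip's InstallRequirement; the RequirementKey class and its fields are made up): if equality and hashing are derived from a key that omits auth data, two requirements that differ only in credentials become indistinguishable to the cache.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RequirementKey:
    # Hypothetical compare key: name, specifier, and a URL with credentials
    # stripped. Anything not in the key (e.g. auth data) is invisible to the cache.
    name: str
    specifier: str
    url_without_auth: Optional[str]

# Two requirements that originally pointed at
# https://user1:token1@host/pkg.tar.gz and https://user2:token2@host/pkg.tar.gz
# collapse to the same key once credentials are dropped:
req_a = RequirementKey("pkg", "", "https://host/pkg.tar.gz")
req_b = RequirementKey("pkg", "", "https://host/pkg.tar.gz")
assert req_a == req_b and hash(req_a) == hash(req_b)
```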
What's the contains that dominates is_satisfied_by?
What's the contains that dominates is_satisfied_by?
It's from packaging.specifiers.
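For reference, here is a minimal standalone example of the containment check that packaging.specifiers provides (the specifier and versions below are arbitrary):

```python
from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=1.0,<2.0")
print(spec.contains("1.4"))  # True
print(spec.contains("2.1"))  # False
```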
The root cause of the perf issue is likely related to sarugaku/resolvelib#147
Yes, resolvelib's algorithm checks at each step whether each requirement has been satisfied: https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L217, called here https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L409 and here https://github.com/sarugaku/resolvelib/blob/1.0.1/src/resolvelib/resolvers.py#L443.
So for 500 requirements that are not top level, there will be at least 500 steps, where each step checks whether each of the 500 requirements has already been resolved, which for yet-unpinned requirements involves checking each requirement's requirements, leading to at least O(n^2) behavior. Ideally there would be an algorithmic fix for this, but I've not yet found one.
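As a toy calculation of that scaling (a simplified model, not resolvelib's actual code): roughly one step per requirement, with each step re-checking every criterion, gives on the order of n^2 satisfaction checks.

```python
# Toy model, not resolvelib's real algorithm: ~one resolution step per
# requirement, each step re-validating every criterion, so the number of
# is_satisfied_by-style checks grows quadratically with the requirement count.
def estimated_checks(num_requirements: int) -> int:
    checks = 0
    for _step in range(num_requirements):           # ~one step per requirement
        for _criterion in range(num_requirements):  # re-check every criterion
            checks += 1
    return checks

print(estimated_checks(500))  # 250000 checks for ~500 requirements
```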
Also, I believe this PR potentially fixes this user's issue https://github.com/pypa/pip/issues/12314 (though reproducing the user's results has been challenging).
By the way, for this specific use case:
common use case, namely the installation of a project with locked dependencies using pip install -c requirements.txt -e .
I do have a separate idea that could fix it algorithmically: https://github.com/sarugaku/resolvelib/issues/146. But I've not done any work towards a PR yet, so I can't compare.
Is there a reason this is still in draft mode? It would be great to see this land this year.
@notatallshaw thanks for the ping. I think this was ready, but it needs double checking for correctness. So I marked it ready for review. I'll add the news entry soon.
Testing performance today on Python 3.12 with all packages cached (I ran the command twice to populate the cache), pip install --dry-run apache-airflow[all] took ~5 min 50 s on main and ~3 min 10 s on this branch.
This is a huge performance improvement for real world large dependency trees.
@pradyunsg I tentatively added this to 24.1. Feel free to postpone if you are not comfortable with this.
There will be a minor conflict with #12300, which I'll be happy to resolve, whichever of the two is merged first.
Oh wait, the KeyBasedCompareMixin I was relying on here has been removed in https://github.com/pypa/pip/pull/12571. I'm resetting this to draft.
Sorry about that @sbidoul! I suppose in hindsight it would've been better to not remove that mixin.
No worries. I already pushed the update.
Thanks for the review and merge @uranusjr.
I suppose in hindsight it would've been better to not remove that mixin.
@ichard26 no issue at all. Dead code must be removed without mercy ;)