resolvelib icon indicating copy to clipboard operation
resolvelib copied to clipboard

For n packages there are O(n^2) calls to `_is_current_pin_satisfying`

Open notatallshaw opened this issue 1 year ago • 4 comments

As the number of packages to resolve increases there appears to be an O(n2) calls to _is_current_pin_satisfying and I think, at least in some circumstances, it's possible to optimize this away.

I am using steps to reproduce from https://github.com/pypa/pip/issues/12320 and the branch from this PR to avoid network calls in the call graph https://github.com/pypa/pip/pull/12327 and not have O(n2) calls to Pip collecting packages from local directory. In the future I plan to be able to show this performance issue using https://github.com/pradyunsg/pip-resolver-benchmarks

In this example there are ~1300 packages to install and _is_current_pin_satisfying is called ~2.5 million times, I generated the call graph using cProfile and gprof2dot:

output

I have not yet investigated how one might optimize this.

This is motivated from a real world usage report: https://github.com/pypa/pip/issues/12314.

notatallshaw avatar Dec 02 '23 19:12 notatallshaw

An idea I will investigate if no one else pipes in, but maybe it's possible to keep track of satified and unsatisfied names in the state, rather than needing to recalculate them each time.

notatallshaw avatar Dec 02 '23 21:12 notatallshaw

An initial observation, unsatisfied_names:

unsatisfied_names = [
    key
    for key, criterion in self.state.criteria.items()
    if not self._is_current_pin_satisfying(key, criterion)
]

Is almost always equal to:

self.state.criteria.keys() - self.state.mapping.keys()

Very rarely there is an additional name due to this part of _is_current_pin_satisfying:

all(
    self._p.is_satisfied_by(requirement=r, candidate=current_pin)
    for r in criterion.iter_requirement()
 )

And this is what is the expensive part of the call is, but possible a trick that could be applied is just assume unsatisfied_names = self.state.criteria.keys() - self.state.mapping.keys(), and at completion this approximate unsatisfied names will equal the real unsatisfied names (i.e. it will be 0).

notatallshaw avatar Dec 03 '23 17:12 notatallshaw

FYI I have not made any meaningful progress on this and doesn't expect to any time soon.

I tried avoiding calling _is_current_pin_satisfying in most cases of calculating unsatisfied_names , but I got lots of test failures that I beleive to be valid, it feels like it should be possible so possibly there was just an unidentified bug in my code. I will take a look at this again when I have a chance.

notatallshaw avatar Dec 15 '23 16:12 notatallshaw

FYI, a workaround is for the provider to cache the calls, a PR has been raised on Pip side to do that: https://github.com/pypa/pip/pull/12453

notatallshaw avatar Jan 12 '24 14:01 notatallshaw