pex
Implement support for incremental lock resolves.
The basic idea here is to add a new `--lock <PATH>` flag to `pex3 lock create`, and to use the lock specified for `pex3 lock update <PATH>`, to implement faster resolves by:
1. Create a venv from the existing lock that includes the configured Pip version.
2. Instead of performing an isolated `pip download --log ...`, perform a `pip install --log` in the venv created in step 1.
3. Merge the changes recorded in the pip log into the original lock file.
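As a rough sketch of the difference in step 2, the two Pip invocations can be compared as argument lists. The function names, the `dest` directory and the way the interpreter paths are passed in are illustrative assumptions; Pex's real invocations carry many more options, and creating the venv (step 1) and merging the log (step 3) are not shown.

```python
from pathlib import Path
from typing import List, Sequence


def isolated_download_command(
    python: Path, dest: Path, log: Path, requirements: Sequence[str]
) -> List[str]:
    """Build a `pip download --log ...` command (the current, isolated approach)."""
    return [
        str(python), "-m", "pip", "download",
        "--dest", str(dest),  # directory the downloaded distributions land in
        "--log", str(log),    # verbose log of what Pip resolved and fetched
        *requirements,
    ]


def venv_install_command(
    venv_python: Path, log: Path, requirements: Sequence[str]
) -> List[str]:
    """Build a `pip install --log ...` command run by the venv's interpreter.

    Because the venv was populated from the existing lock, Pip only has to
    deal with requirements that are not already satisfied there.
    """
    return [
        str(venv_python), "-m", "pip", "install",
        "--log", str(log),
        *requirements,
    ]
```

Either list could be handed to `subprocess.run(cmd, check=True)`; the log file written via `--log` is what step 3 would then need to parse.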
It turns out this is a good deal faster than performing isolated pip downloads.
The example from #2036 shows ~3x speedup with a warm re-lock today taking 22s vs a warm venv + pip install taking ~7s. This factor will likely drop closer to 2.5 once the additional overheads of processing lock diffs are added in, but it seems likely this will still net a significant win.
Throwing two pieces of information into the mix:
- PEP 658 exists, and ought to make this kind of thing faster. However, the current code, which uses `pip download` to download and then extract the metadata, doesn't allow Pex to leverage (directly or indirectly) the new information. I don't think leveraging PEP 658 is really possible from the API we're consuming from `pip`.
- `pip install --dry-run --report` gives a nice report containing (almost) everything that goes into a Pex lockfile today (it's only missing hashes from VCS reqs). This has 3 niceties:
  - Almost all code responsibility is thrown over the fence to `pip`
  - It is internally able to leverage PEP 658
  - You can run it inside a venv

Therefore I think in your list of steps, 1 stays. 2 and 3 get changed (if possible).
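To make the report idea concrete, here is a minimal sketch of pulling lock-relevant fields out of the JSON that `pip install --dry-run --report <file>` writes. It assumes the installation report layout documented by Pip (an `install` list whose items carry `metadata` and `download_info`); the function names and the output shape are hypothetical and are not Pex's lock format.

```python
from typing import Any, Dict, List, Optional


def _sha256(archive_info: Dict[str, Any]) -> Optional[str]:
    """Return the sha256 digest from an `archive_info` entry, if present."""
    hashes = archive_info.get("hashes") or {}
    if "sha256" in hashes:
        return hashes["sha256"]
    # Older reports only carry the legacy "hash": "sha256=<digest>" field.
    algorithm, _, digest = archive_info.get("hash", "").partition("=")
    return digest if algorithm == "sha256" and digest else None


def summarize_report(report: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Extract name, version, URL and sha256 for each item Pip would install."""
    entries = []
    for item in report.get("install", []):
        download_info = item["download_info"]
        entries.append(
            {
                "name": item["metadata"]["name"],
                "version": item["metadata"]["version"],
                "url": download_info["url"],
                # VCS requirements have `vcs_info` instead of `archive_info`,
                # so no hash is available for them.
                "sha256": _sha256(download_info.get("archive_info", {})),
            }
        )
    return entries
```

In use, the report would be loaded with `json.load` and passed to `summarize_report`; entries whose `sha256` is `None` correspond to the VCS case mentioned above and would still need separate handling.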
Note that this ticket specifically is about incremental lock resolves. Using `--report` would speed up fresh lockfile installs as well.
On the first bullet point: a typical resolve process with backtracking might visit, say, 10k nodes while the final solution set has 100 nodes. With Pip supporting PEP 658 and Pex supporting that Pip, 9,900 downloads are saved and 100 are performed at the end. So there is a download savings from using a dry-run report, but the relevant question is the real ratio behind 10k vs 100, i.e. what the typical savings are; those numbers are for illustration and surely inaccurate. The other consideration is that the final-set downloads are needed anyway in common workflows, so that time is not completely wasted. There is also the fact that the `pip download` is serial, whereas a later lock download is parallel.
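Taking the illustrative figures (10k nodes visited, 100 in the final set), the savings can be worked out directly. This is only arithmetic on made-up numbers; it assumes every visited node would otherwise cost one distribution download, which is an upper bound.

```python
def downloads_saved(visited: int, final: int) -> int:
    """Downloads avoided when only the final solution set is fetched."""
    return visited - final


saved = downloads_saved(visited=10_000, final=100)
print(saved)                      # 9900
print(f"{saved / 10_000:.0%}")    # 99%
```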