Implement lock file support (Umbrella).
Pex is generally used to produce an application binary and these traditionally get a lockfile in ecosystems that support them in order to allow reproducible builds of the binary at later dates. Even when Pex is used for other purposes, consumers - namely Pants - have a desire to be able to lock resolves done via Pex. This - in fact - is the primary motivation here.
For the motivating Pants case - and in general - Pex should be able to both produce a lock and consume one. The most robust lock would include all the information needed to exactly reproduce first-party code (user source code via `-D` and runtime options encoded in `PEX-INFO`), second-party(?) code (the Pex runtime packaged in `.bootstrap/`) and third-party code (the distributions installed in `.deps/`). The motivating case from Pants only cares about third-party locking, which is the most complex case, so the rest just describes the requirements for that.
1. The lock should contain enough information to ensure a resolve is bit-for-bit identical to a prior resolve using the same lock (on the same machine - see requirement 2).
2. There should be a mode to produce a lock that works on any machine and under any Python compatible with the `--python`/`--interpreter-constraint`/`--platform` combo used to build the Pex, given a single random Pex-compatible interpreter.
3. The lock file should be external to a PEX file so it can be saved separately (if you already have a PEX file in hand, it is a locked resolve and can just be copied to reproduce its resolve!).
That's about it for hard requirements. It probably makes sense for the lock to be human readable, but that's not strictly required. All known prior art does this though (Cargo, npm, Pipfile, Poetry, PDM, ...). It probably makes sense to keep the nascent PEP-665 in mind. All that's actually needed though is to output a requirements file using `--hash` and appropriate environment markers to achieve 1-3 above *. Pip will then do the rest and ensure all resolved dists are bit-for-bit identical.
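As a rough illustration of what such a requirements file entry could look like, here is a minimal sketch that renders one pinned requirement with `--hash` options and an optional environment marker. The helper names (`sha256_of`, `render_locked_requirement`) and the example project, version and marker are hypothetical and are not part of Pex.

```python
import hashlib
from typing import Iterable, Optional


def sha256_of(data: bytes) -> str:
    """Return the hex digest that goes after `--hash=sha256:`."""
    return hashlib.sha256(data).hexdigest()


def render_locked_requirement(
    project: str,
    version: str,
    digests: Iterable[str],
    marker: Optional[str] = None,
) -> str:
    """Render one pinned requirement line (hypothetical helper, not a Pex API)."""
    line = f"{project}=={version}"
    if marker:
        # The environment marker follows the version pin, separated by `;`.
        line += f" ; {marker}"
    # One --hash option per acceptable artifact (e.g. each wheel plus the sdist).
    for digest in sorted(digests):
        line += f" \\\n    --hash=sha256:{digest}"
    return line


if __name__ == "__main__":
    # Placeholder bytes stand in for a downloaded distribution file.
    digest = sha256_of(b"placeholder distribution bytes")
    print(render_locked_requirement("lxml", "4.6.3", [digest], 'sys_platform == "linux"'))
```

A file made of such lines can be installed with `pip install --require-hashes -r requirements.txt`, in which case Pip rejects any downloaded file whose digest is not listed.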
*
1 is actually not achievable in a guaranteed way once you allow 2, from what I can tell. 1 should nearly always still hold, but technically you could run a resolve with a certain version of Pip one day and a new version of Pip the next and, as long as the new version only selects distributions locked by the first version, the resolve will complete successfully. That new resolve could include more, fewer or just plain different subsets of the 2-style lockfile, though, and you have no way of knowing except by comparing the results of the resolves by hand. To underscore the issue: you don't even need two versions of Pip. Certain dists could be deleted from PyPI between resolves 1 and 2, so that the first time you get a platform-specific wheel for `lxml` and the next time you get the slower - different code - pure-Python `lxml`, say.
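The by-hand comparison described in this footnote could be automated along these lines. This is only a sketch: it assumes each resolve has already been summarized as a mapping from distribution file name to sha256 digest, and both the `Resolve` alias and `diff_resolves` are hypothetical names.

```python
from typing import Dict, Set, Tuple

# Hypothetical summary of a resolve: distribution file name -> sha256 hex digest.
Resolve = Dict[str, str]


def diff_resolves(first: Resolve, second: Resolve) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (files only in first, files only in second, files whose digest differs)."""
    only_first = set(first) - set(second)
    only_second = set(second) - set(first)
    changed = {name for name in set(first) & set(second) if first[name] != second[name]}
    return only_first, only_second, changed
```

With the `lxml` scenario above, the platform-specific wheel would show up in the first set and the sdist-built pure-Python distribution in the second, even though both resolves succeeded against the same lock.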
Some gotchas to avoid in either the initial implementation or with follow-ups:
- Bifurcated resolves: https://github.com/python-poetry/poetry/issues/4381
- Environment marker explosion: https://github.com/pdm-project/pdm/issues/449
There is one known impossible-to-handle case: when a #2-style ("platform agnostic") resolve needs to traverse an sdist. The sdist may need to be (partially) built to extract Python version and dependency metadata (e.g. by executing `python setup.py egg_info`). If the Python needed by the `setup.py` does not exist on the machine generating the lock, the lock must fail. This should be a rare problem since it seems ~all modern sdists already contain a `PKG-INFO` file with that metadata, which can simply be read.
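Reading that metadata without executing `setup.py` might look like the sketch below. It assumes a tarball sdist with a top-level `<name>-<version>/PKG-INFO` member in the usual RFC 822-style header format; `read_pkg_info` is a hypothetical helper, not a Pex function. An older `PKG-INFO` may omit `Requires-Dist` entirely, and this sketch cannot tell that apart from "no dependencies".

```python
import tarfile
from email.parser import Parser
from typing import List, Optional, Tuple


def read_pkg_info(sdist_path: str) -> Optional[Tuple[Optional[str], List[str]]]:
    """Return (Requires-Python, [Requires-Dist, ...]) or None if PKG-INFO is absent."""
    with tarfile.open(sdist_path) as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Only consider the top-level <name>-<version>/PKG-INFO entry.
            if len(parts) == 2 and parts[1] == "PKG-INFO":
                extracted = tar.extractfile(member)
                if extracted is None:
                    continue
                # PKG-INFO uses email-style headers, so the stdlib parser can read it.
                message = Parser().parsestr(extracted.read().decode("utf-8"))
                requires_python = message.get("Requires-Python")
                requires_dist = message.get_all("Requires-Dist") or []
                return requires_python, requires_dist
    return None
```

When this returns `None`, a locker would have to fall back to building the sdist, which is exactly the case that can fail if no suitable Python is available.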
This work seems separable into the following task graph:
1. Platform dependent locks with requirements.txt compatible output: #1401
2. a. Platform agnostic locks with requirements.txt compatible output: #1402 | b. PEP-665 compatible output: #1403
3. a. Bifurcated resolve handling: #1404 | b. Environment marker explosion handling: #1405
Of these tasks, Pants only needs 1, 2a and 3a *.
*
Afaict Pants doesn't actually need all of 3a, it needs a subset of 3a where the bifurcation is in the top-level requirements themselves, not in interior nodes. That said, handling all of 3a is needed for correctness in all locks so it should probably be implemented fully.
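To make the top-level case concrete, here is a small sketch that flags projects listed more than once among the top-level requirements, for example with different environment markers. It assumes "bifurcated" means the same project appearing with several mutually exclusive requirements; `find_bifurcated` is a hypothetical name, and the parsing is deliberately naive (it only extracts the leading project name).

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Naive match for the project name at the start of a requirement string.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def find_bifurcated(requirements: Iterable[str]) -> Dict[str, List[str]]:
    """Group top-level requirements by project and keep those listed more than once."""
    by_project: Dict[str, List[str]] = defaultdict(list)
    for requirement in requirements:
        match = _NAME.match(requirement)
        if not match:
            continue
        # Normalize names so `Foo_Bar` and `foo-bar` count as the same project.
        key = re.sub(r"[-_.]+", "-", match.group(1)).lower()
        by_project[key].append(requirement.strip())
    return {name: reqs for name, reqs in by_project.items() if len(reqs) > 1}
```

Detecting bifurcation in interior nodes of the dependency graph would need the full resolve, which this sketch does not attempt.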
I think with the addition of `--project` in #2455 released in https://github.com/pex-tool/pex/releases/tag/v2.8.0, this issue and the associated project can be closed.