
Implement lock file support (Umbrella).

Open jsirois opened this issue 3 years ago • 3 comments

Pex is generally used to produce an application binary, and in ecosystems that support lockfiles, application binaries traditionally get one so that the binary can be reproducibly built at a later date. Even when Pex is used for other purposes, consumers (namely Pants) want to be able to lock the resolves done via Pex. This is, in fact, the primary motivation here.

For the motivating Pants case, and in general, Pex should be able to both produce a lock and consume one. The most robust lock would include all the information needed to exactly reproduce first-party code (user source code via -D and the runtime options encoded in PEX-INFO), second-party code? (the Pex runtime packaged in .bootstrap/) and third-party code (the distributions installed in .deps/). The motivating Pants case only cares about third-party locking, which is also the most complex case, so the rest of this issue describes only the requirements for that.

  1. The lock should contain enough information to ensure that a resolve is bit-for-bit identical to a prior resolve using the same lock (on the same machine; see requirement 2).
  2. There should be a mode to produce a lock that works on any machine and under any Python compatible with the --python / --interpreter-constraint / --platform combination used to build the PEX, while generating that lock with only a single, arbitrary Pex-compatible interpreter.
  3. The lock file should be external to the PEX file so it can be saved separately. (If you already have a PEX file in hand, it is a locked resolve and can simply be copied to reproduce that resolve!)
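
Requirement 3's parenthetical can be made concrete. The following is a minimal sketch, not Pex's own code: it assumes a zipapp-style PEX whose root holds a JSON `PEX-INFO` with a `distributions` object keyed by distribution file name (treat that layout as an assumption), and `locked_distributions` is a hypothetical helper name.

```python
import io
import json
import zipfile
from typing import List, Union


def locked_distributions(pex: Union[str, bytes]) -> List[str]:
    """Return the distribution file names pinned in a zipapp PEX's PEX-INFO.

    Assumption: PEX-INFO is JSON at the zip root with a "distributions"
    object mapping distribution file names to fingerprints.
    """
    # Accept either a path on disk or the raw bytes of a PEX.
    source = io.BytesIO(pex) if isinstance(pex, bytes) else pex
    # zipfile tolerates the shebang preamble a PEX prepends to its zip data.
    with zipfile.ZipFile(source) as zf:
        pex_info = json.loads(zf.read("PEX-INFO").decode("utf-8"))
    # Sort so that two PEXes built from the same resolve compare equal.
    return sorted(pex_info.get("distributions", {}))
```

A PEX built as a directory rather than a zip would need its `PEX-INFO` read from the filesystem instead.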

That's about it for hard requirements. It probably makes sense for the lock to be human readable, but that's not strictly required; all known prior art does this though (Cargo, npm, Pipfile, Poetry, PDM, ...). It also probably makes sense to keep the nascent PEP 665 in mind. All that's actually needed, though, is to output a requirements file using --hash and appropriate environment markers to achieve 1-3 above *. Pip will then do the rest and ensure all resolved dists are bit-for-bit identical.
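
A requirements-file lock of this kind can be sketched as follows. The `LockedRequirement` type and the `render_*` helpers are hypothetical, and the hash in the usage example is computed from placeholder bytes; the output format itself (a `name==version ; marker` pin followed by `--hash=sha256:...` options joined by backslash continuations) is standard pip requirements-file syntax.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class LockedRequirement:
    """One pinned project in a hypothetical lock."""

    project_name: str
    version: str
    # One sha256 per acceptable artifact (each platform wheel and/or the sdist).
    sha256_hashes: Tuple[str, ...]
    # Optional PEP 508 environment marker, e.g. 'python_version >= "3.7"'.
    marker: Optional[str] = None


def render_requirement(req: LockedRequirement) -> str:
    """Render one entry in pip's hash-checking requirements-file format."""
    pin = f"{req.project_name}=={req.version}"
    if req.marker:
        pin += f" ; {req.marker}"
    hash_options = [f"    --hash=sha256:{digest}" for digest in req.sha256_hashes]
    # Backslash continuations keep the pin and its hashes on one logical line.
    return " \\\n".join([pin, *hash_options])


def render_lock(reqs: Iterable[LockedRequirement]) -> str:
    """Render a whole lock, sorted by name so the output is stable across runs."""
    ordered = sorted(reqs, key=lambda r: r.project_name.lower())
    return "".join(render_requirement(r) + "\n" for r in ordered)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real downloaded artifact.
    digest = hashlib.sha256(b"placeholder wheel bytes").hexdigest()
    lock = [LockedRequirement("example-dist", "1.0", (digest,), 'python_version >= "3.7"')]
    print(render_lock(lock), end="")
```

Installing with `pip install --require-hashes -r lock.txt` then makes pip reject any artifact whose hash isn't listed. Pip's hash-checking mode also requires every transitive dependency to be pinned with hashes, so the lock has to cover the full resolve, not just the top-level requirements.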

* As far as I can tell, 1 is not actually achievable in a guaranteed way once you allow 2. 1 should nearly always still hold, but technically you could resolve with a certain version of Pip one day and a newer version of Pip the next, and as long as the newer version only selects distributions locked by the first version of Pip, the resolve will complete successfully. That new resolve could, however, include a larger, smaller, or simply different subset of the 2-style lockfile, and you have no way of knowing except by comparing the results of the two resolves by hand. To underscore the issue, you don't even need two versions of Pip: certain dists could just be deleted from PyPI between resolves 1 and 2, so that the first time you get, say, a platform-specific wheel for lxml and the next time you get the slower, pure-Python lxml built from different code.
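
The by-hand comparison can at least be automated. This is a minimal sketch assuming each resolve has been recorded as a mapping from project name to the version and sha256 of the artifact actually installed; that record format and the `Pin` / `diff_resolves` names are hypothetical, not something Pex or Pip is claimed to emit as-is.

```python
from typing import Dict, NamedTuple, Tuple


class Pin(NamedTuple):
    version: str
    sha256: str  # hash of the specific artifact chosen (a given wheel, or the sdist)


Resolve = Dict[str, Pin]


class ResolveDiff(NamedTuple):
    added: Dict[str, Pin]
    removed: Dict[str, Pin]
    changed: Dict[str, Tuple[Pin, Pin]]

    def is_identical(self) -> bool:
        return not (self.added or self.removed or self.changed)


def diff_resolves(first: Resolve, second: Resolve) -> ResolveDiff:
    """Report how `second` differs from `first`, project by project."""
    added = {name: pin for name, pin in second.items() if name not in first}
    removed = {name: pin for name, pin in first.items() if name not in second}
    # A same-version pin with a different hash still counts as a change: e.g. a
    # platform-specific wheel in one resolve and a different artifact in the next.
    changed = {
        name: (first[name], second[name])
        for name in first.keys() & second.keys()
        if first[name] != second[name]
    }
    return ResolveDiff(added, removed, changed)
```

Run against the record of an earlier resolve, a non-empty diff flags exactly the drift described above, which a 2-style lock on its own cannot prevent.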

jsirois, Aug 16 '21 19:08

Some gotchas to avoid, either in the initial implementation or in follow-ups:

  1. Bifurcated resolves: https://github.com/python-poetry/poetry/issues/4381
  2. Environment marker explosion: https://github.com/pdm-project/pdm/issues/449

jsirois, Aug 16 '21 20:08

There is one known case that is impossible to handle: when a #2-style ("platform agnostic") resolve needs to traverse an sdist. The sdist may need to be (partially) built to extract its Python version and dependency metadata (e.g. by executing python setup.py egg_info). If the Python needed by the setup.py does not exist on the machine generating the lock, lock generation must fail. This should be a rare problem, since ~all modern sdists already include a PKG-INFO file that contains that metadata and can simply be read.
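
Reading PKG-INFO instead of building can be sketched with the standard library alone. This is a minimal sketch with hypothetical names (`SdistMetadata`, `read_pkg_info`), assuming a tar-format sdist with a top-level `{name}-{version}/PKG-INFO`; zip sdists, or a PKG-INFO that leaves `Requires-Dist` unrecorded, would still need the build fallback.

```python
import email.parser
import io
import tarfile
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SdistMetadata:
    name: Optional[str]
    version: Optional[str]
    requires_python: Optional[str]
    requires_dist: List[str] = field(default_factory=list)


def read_pkg_info(sdist_bytes: bytes) -> Optional[SdistMetadata]:
    """Read metadata from a tar sdist's top-level PKG-INFO without building it.

    Returns None when there is no top-level PKG-INFO, in which case a lock
    generator would have to fall back to (partially) building the sdist.
    """
    # "r:*" lets tarfile detect gzip, bzip2 or xz compression.
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:*") as tf:
        for member in tf.getmembers():
            parts = member.name.split("/")
            # Only {name}-{version}/PKG-INFO counts; copies nested under
            # *.egg-info/ directories are ignored in this sketch.
            if len(parts) == 2 and parts[1] == "PKG-INFO" and member.isfile():
                extracted = tf.extractfile(member)
                if extracted is None:
                    continue
                # PKG-INFO uses RFC 822-style headers, which the email parser handles.
                msg = email.parser.Parser().parsestr(extracted.read().decode("utf-8"))
                return SdistMetadata(
                    name=msg.get("Name"),
                    version=msg.get("Version"),
                    requires_python=msg.get("Requires-Python"),
                    # An absent field may mean "no deps" or just "not recorded".
                    requires_dist=msg.get_all("Requires-Dist") or [],
                )
    return None
```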

jsirois, Aug 16 '21 20:08

This work seems separable into the following task graph:

  1. Platform dependent locks with requirements.txt compatible output: #1401
  2. a. Platform agnostic locks with requirements.txt compatible output: #1402
     b. PEP-665 compatible output: #1403
  3. a. Bifurcated resolve handling: #1404
     b. Environment marker explosion handling: #1405

Of these tasks, Pants only needs 1, 2a and 3a *.

* As far as I can tell, Pants doesn't actually need all of 3a; it needs only the subset of 3a where the bifurcation is in the top-level requirements themselves, not in interior nodes. That said, handling all of 3a is needed for correctness in all locks, so it should probably be implemented fully.
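
To make the top-level case concrete, here is a minimal sketch with hypothetical helper names that flags top-level requirements naming the same project under different environment markers (e.g. `example-lib==1.0` for older Pythons and `example-lib==2.0` for newer ones). It is a simplification, not Pex's implementation: markers are compared as strings rather than evaluated.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Leading project name of a requirement string (simplified PEP 508 name rule).
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonicalize(name: str) -> str:
    """PEP 503 normalization: runs of -, _ and . become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_top_level_bifurcations(
    requirements: List[str],
) -> Dict[str, List[Tuple[str, str]]]:
    """Return projects whose top-level requirements differ by environment marker.

    Simplification: the marker is everything after the first ';' and markers
    are compared as strings, not evaluated; a real implementation would use a
    full PEP 508 parser.
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for requirement in requirements:
        spec, _, marker = requirement.partition(";")
        match = _NAME.match(spec)
        if match is None:
            raise ValueError(f"Cannot parse requirement: {requirement!r}")
        grouped[canonicalize(match.group(1))].append((spec.strip(), marker.strip()))
    # Several entries under one marker are just combined constraints; only
    # distinct markers force the resolve to fork per environment.
    return {
        name: entries
        for name, entries in grouped.items()
        if len({marker for _, marker in entries}) > 1
    }
```

Interior-node bifurcation, which full 3a also covers, only shows up while walking the dependency graph, so a check over the top-level list like this one cannot catch it.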

jsirois, Aug 17 '21 17:08

I think that with the addition of --project in #2455, released in https://github.com/pex-tool/pex/releases/tag/v2.8.0, this issue and the associated project can be closed.

jsirois, Jul 14 '24 17:07