
Avoid Resolving Dependencies When Building in Intransitive Mode

Open • adeandrade opened this issue 4 years ago • 6 comments

I am building huge PEX files with more than 100 dependencies. The process takes around 30 minutes, and most of that time seems to be spent resolving dependencies.

Even when building with --intransitive and providing all necessary dependencies with --requirement, a lot of the time still seems to be spent resolving them.

How can we improve this situation? With a few pointers I can contribute to this.

adeandrade, Nov 02 '20 01:11

This is an example of what I am running:

service-bin: project-image
	@./bin/dockerized \
		mkdir -p dist \&\& \
		poetry export --format requirements.txt --without-hashes --output dist/requirements.txt --extras service \&\& \
		poetry build --format wheel \&\& \
		pex \
			-v \
			--intransitive \
			--no-pypi \
			--index "https://${PYPI_USER}:${PYPI_TOKEN}@${PYPI_HOST}" \
			--requirement dist/requirements.txt \
			--entry-point ${PACKAGE_NAME}.service.main \
			--output-file dist/server.pex \
			--pex-root /root/.cache/pex \
			"dist/${PACKAGE_NAME_VERSIONED}.whl"

The exported requirements.txt includes transitive dependencies with fully defined versions.
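Since --intransitive only installs what is listed, it helps to confirm that every line of the exported file really is an exact pin. A minimal sketch, assuming Poetry's export format of one `name==version` per line, optionally followed by an environment marker after `;` (no hashes, because of --without-hashes). The function name `unpinned_lines` is hypothetical, and the regex is a simplification of PEP 508, not a full parser:

```python
import re
from typing import List

# "name==version", with optional "[extras]"; wildcards like "==1.*" are not exact pins.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s,;*]+$")


def unpinned_lines(text: str) -> List[str]:
    """Return requirement lines that are not pinned to a single exact version."""
    problems = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (naive: would also cut a URL fragment)
        if not line or line.startswith("-"):  # skip blank lines and pip options like --index-url
            continue
        requirement = line.split(";", 1)[0].strip()  # drop environment markers
        if not PINNED.match(requirement):
            problems.append(line)
    return problems


example = """\
# exported by poetry
click==7.1.2 ; python_version >= "3.6"
requests==2.24.0
urllib3>=1.25
"""
print(unpinned_lines(example))  # ['urllib3>=1.25']
```

An empty result says every listed distribution is pinned; it does not say the list is complete, which is a separate question.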

adeandrade, Nov 02 '20 01:11

Thanks for filing this @adeandrade.

To clarify the terminology, we typically talk about two different files: requirements.txt, which contains the requirements your code directly depends on, and then the optional constraints.txt (sometimes referred to as a lockfile) which contains not just those requirements but also all their transitive requirements.

Requirements may be loose (e.g., foo>=2.5.1), but constraints must be pinned (e.g., foo==2.5.1).

Pip uses the constraints file to pick versions of dependencies as it resolves the requirements (so if you specify constraints, you must also specify the underlying direct requirements).

Requirement files are passed to pex using --requirement and constraint files are passed using --constraints. These correspond to the pip --requirement and --constraint flags (the extra 's' at the end of pex's --constraints flag is an unfortunate slip-up).
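To make the flag mapping concrete, here is a sketch of a hypothetical helper (`pex_argv` is not part of Pex) that assembles a pex command line using only the flags named in this thread. It builds the argument list rather than running it; the result could be handed to `subprocess.run`:

```python
from typing import List, Optional


def pex_argv(requirements: str, output: str, constraints: Optional[str] = None,
             intransitive: bool = False) -> List[str]:
    """Build (but do not run) a pex command line from a requirements and constraints file."""
    argv = ["pex", "--requirement", requirements]
    if constraints is not None:
        # Plural on the pex side; the corresponding pip flag is --constraint.
        argv += ["--constraints", constraints]
    if intransitive:
        argv.append("--intransitive")
    argv += ["--output-file", output]
    return argv


print(pex_argv("requirements.txt", "dist/server.pex", constraints="constraints.txt"))
```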

It sounds like your requirements.txt is actually a constraints file? Is it comprehensive, and are all requirements in it pinned to a single version? Could you post it here if it's not secret? Or a redacted version at least?

benjyw, Nov 02 '20 04:11

then the optional constraints.txt (sometimes referred to as a lockfile) which contains not just those requirements but also all their transitive requirements.

Nuance: they can contain all the transitive requirements, but need not. A constraints.txt file could technically have only one entry, for example, which would leave everything else unconstrained.

This nuance is important. It impacts whether --intransitive will work or not.
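Why it matters: with --intransitive only the listed distributions are installed, so the list has to be closed under dependencies already. A minimal sketch of that check, using a hypothetical, hand-written dependency map (real data would come from each distribution's `Requires-Dist` metadata); `missing_transitive` is an illustrative name:

```python
from typing import Dict, Iterable, List, Set


def missing_transitive(listed: Iterable[str], deps: Dict[str, List[str]]) -> Set[str]:
    """Return dependencies reachable from `listed` that are not themselves listed."""
    listed_set = set(listed)
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = list(listed_set)
    while stack:  # depth-first walk of the dependency graph
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for dep in deps.get(name, []):
            if dep not in listed_set:
                missing.add(dep)
            stack.append(dep)
    return missing


# Hypothetical dependency graph, for illustration only.
DEPS = {
    "requests": ["urllib3", "idna", "certifi", "chardet"],
    "click": [],
}
print(sorted(missing_transitive(["requests", "click", "urllib3"], DEPS)))
# ['certifi', 'chardet', 'idna']
```

A non-empty result means an --intransitive build from that list would produce a PEX with missing dependencies.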

but constraints must be pinned

This isn't true. Constraints can use any specifier that is valid in requirements.txt, e.g. >=3.5. All the constraints file does is substitute the constraint value for the requirement string that would normally be used.
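A toy model of the behavior described in this exchange, assuming versions are plain dotted integers and supporting only the `==`, `>=` and `<` operators. A constraint is modeled here as an extra filter on a package that is already required, and a name that appears only in the constraints is never selected. It ignores the dependencies of the chosen packages entirely and is a simplification, not pip's resolver:

```python
import operator
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
# Only these three operators are modeled; real PEP 440 specifiers have many more.
OPS = {"==": operator.eq, ">=": operator.ge, "<": operator.lt}


def parse(version: str) -> Version:
    """Toy version parser: dotted integers only, compared as tuples."""
    return tuple(int(part) for part in version.split("."))


def allows(spec: str, version: Version) -> bool:
    """Check a comma-separated specifier such as '>=3.5,<4' against a version."""
    for clause in spec.split(","):
        clause = clause.strip()
        for op in ("==", ">=", "<"):
            if clause.startswith(op):
                if not OPS[op](version, parse(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def resolve(requirements: Dict[str, str], constraints: Dict[str, str],
            available: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Pick the highest available version allowed by the requirement and any constraint."""
    chosen: Dict[str, Optional[str]] = {}
    for name, spec in requirements.items():  # only required names are ever resolved
        constraint = constraints.get(name)  # constraints on other names are ignored
        candidates = [
            v for v in available.get(name, [])
            if allows(spec, parse(v)) and (constraint is None or allows(constraint, parse(v)))
        ]
        chosen[name] = max(candidates, key=parse) if candidates else None
    return chosen


AVAILABLE = {"foo": ["2.4.0", "2.5.1", "3.0.0"], "bar": ["1.0"]}  # hypothetical index
print(resolve({"foo": ">=2.5"}, {"foo": "<3", "bar": "==1.0"}, AVAILABLE))  # {'foo': '2.5.1'}
```

With a constraint like foo==2.5.1 the same code simply selects that pin, which is why a fully pinned constraints file acts like a lockfile for the names it covers, while a loose one only narrows the choice.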

Eric-Arellano, Nov 02 '20 05:11

Thank you @benjyw and @Eric-Arellano for the responses. I guess my requirements.txt file is also a constraints file, since it is derived from a lock file. All dependencies are pinned with ==.

By specifying --constraints (as well as --requirement, since I still use the --intransitive flag), build time was reduced by 25%. That's good news.

I still see some resolving going on in the logs, though. It still takes more than 10 minutes on my project (when dependencies are cached). Can we do better?

adeandrade, Nov 02 '20 14:11

I guess it depends on what exactly is happening in the time attributed to resolving. For example, downloading the dists can take time, and if they are sdists, pip has to run setup() on them, which can take a while (in some cases a lot of native code has to be compiled). These results can be cached by pip, but it's possible that the cache isn't being preserved across runs. For example, CI machines typically present you with a clean container on every run, and you have to configure specific directories to be saved and restored between runs.
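One way to check whether sdist builds explain the time is to look at which kinds of distributions were actually fetched. A minimal sketch that sorts filenames into wheels (`.whl`, installable without a build step) and sdists (archives such as `.tar.gz` or `.zip`, which pip must build first). It works on filenames only; the names below are made up, and where to collect them from (logs, a download directory) is left open:

```python
from typing import Dict, Iterable, List

SDIST_SUFFIXES = (".tar.gz", ".zip", ".tar.bz2")


def classify(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Split distribution filenames into wheels, sdists, and anything unrecognized."""
    result: Dict[str, List[str]] = {"wheel": [], "sdist": [], "other": []}
    for name in filenames:
        lower = name.lower()
        if lower.endswith(".whl"):
            result["wheel"].append(name)
        elif lower.endswith(SDIST_SUFFIXES):  # str.endswith accepts a tuple of suffixes
            result["sdist"].append(name)
        else:
            result["other"].append(name)
    return result


downloads = ["purelib-1.0-py3-none-any.whl", "nativeext-0.1.tar.gz", "notes.txt"]
print(classify(downloads)["sdist"])  # ['nativeext-0.1.tar.gz']
```

Anything in the `sdist` bucket is a candidate for building once into a wheel and keeping that wheel in a cache that survives between runs.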

Can you post some snippets of those logs? And are you seeing this phenomenon on developer laptops, or CI machines, or both?

benjyw, Nov 02 '20 22:11

This issue is related to #1086, which also discusses the differences between a constraints file and a lockfile. Consuming an unchanged lockfile would allow for a zero-resolve "fetch and validate the fingerprints of these precise wheels" step that reproduces the output of the resolve without running it.
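The "fetch and validate the fingerprint" half of that idea can be sketched with the standard library alone. This assumes a hypothetical lock that maps each wheel filename to its expected SHA-256 hex digest (the same digest pip's `--hash=sha256:...` option uses); it only verifies files already on disk and does no resolving or downloading. `verify_wheels` and the lock layout are illustrative, not part of Pex or #1086:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheels(directory: Path, lock: Dict[str, str]) -> List[str]:
    """Return lock entries whose file is missing or whose digest does not match."""
    failures = []
    for filename, expected in sorted(lock.items()):
        path = directory / filename
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(filename)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    wheel = Path(tmp) / "example-1.0-py3-none-any.whl"  # placeholder bytes, not a real wheel
    wheel.write_bytes(b"not really a wheel")
    lock = {wheel.name: hashlib.sha256(b"not really a wheel").hexdigest(),
            "missing-2.0-py3-none-any.whl": "0" * 64}
    print(verify_wheels(Path(tmp), lock))  # ['missing-2.0-py3-none-any.whl']
```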

stuhood, Nov 07 '20 00:11

The focus here has been on resolving, but I think the ~OP shows that's a detour around the block: https://github.com/pex-tool/pex/issues/1093#issuecomment-720192890

IIUC, @adeandrade just wants to make a PEX from their already-working Poetry setup: roughly, turn a Poetry venv into a PEX. If I'm right there, that totally sidesteps any normal concept of resolving; Pex just needs to be able to slurp up a venv into a PEX file. That idea is tracked by #1361. It's been far too long, but if you are able to chime in on this assessment, @adeandrade, I'd be grateful.
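A "venv to PEX" step starts from an inventory of what is already installed. A rough sketch using the standard library's `importlib.metadata` to list the exact distributions found in one site-packages directory; it only illustrates the idea tracked in #1361 and is not how Pex implements it. The site-packages path of a Poetry venv varies by platform and Python version, so it is passed in as an assumption, and `freeze` is a hypothetical name:

```python
import importlib.metadata
import tempfile
from pathlib import Path
from typing import List


def freeze(site_packages: Path) -> List[str]:
    """Return sorted 'name==version' pins for distributions found under site_packages."""
    pins = set()
    for dist in importlib.metadata.distributions(path=[str(site_packages)]):
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins)


with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp) / "demo-1.0.dist-info"  # fake, minimal metadata for the demo
    info.mkdir()
    (info / "METADATA").write_text("Metadata-Version: 2.1\nName: demo\nVersion: 1.0\n")
    print(freeze(Path(tmp)))  # ['demo==1.0']
```

Because the venv already represents a completed resolve, a list like this is fully pinned and closed by construction, which is the property --intransitive relies on.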

jsirois, Sep 28 '24 18:09