Feature Idea: Support generating a single merged pex from a PEX_PATH
I.e., support something like:
- When creating a pex:
  pex --pex-path=a.pex:b.pex --merged-output-file=merged.pex
- Using an existing pex that contains a PexInfo.pex_path:
  PEX_MERGED_OUTPUT_FILE=mymergedpex ./mypex
Here --merged-output-file is used as opposed to -o or --output-file and PEX_MERGED_OUTPUT_FILE is the env variable form.
Would probably be a good idea to note in the help for -o that it does not load the PEX_PATH if/when this is added.
How exactly to word all this is a bit fraught for sure. Current relevant help:
pex -h
...
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -o PEX_NAME, --output-file=PEX_NAME
                        The name of the generated .pex file: Omiting this will
                        run PEX immediately and not save it to a file.
...
  Resolver options:
    Tailor how to find, resolve and translate the packages that get put
    into the PEX environment.
    --pypi, --no-pypi, --no-index
                        Whether to use pypi to resolve dependencies; Default:
                        use pypi
    --pex-path=PEX_PATH
                        A colon separated list of other pex files to merge
                        into the runtime environment.
...
Two things complicate this: the phrase "PEX environment", and the fact that --pex-path sits under the Resolver options heading. The latter is perhaps fine in itself, but it is confusing given that non-pex requirements are resolved into the built pex file, whereas PEX_PATH requirements are only ever adjoined to the runtime sys.path.
what's the use case? just incremental binary goal build perf for pants?
it's worth noting that PEX_PATH remains a "best effort" thing given the proclivity for conflicts between various pex files' 3rdparty requirements. IIRC, we effectively just bootstrap each pex in order and layer on the sys.path mutations. this seems fine for controlled runtime cases, but for sealed binary builds maybe less so?
ultimately, it's not clear to me how to best combine 1..N pex files into a net new pex without running another top level resolve against the new transitive closure of deps - which I thought was most of the runtime cost for the incremental build case.
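To make the layering concern above concrete, here is a toy sketch. It is not pex's actual code: `layer_pex_path`, the `PexContents` type, and the path-entry format are all hypothetical, and the package pins are made up. It only models "visit each pex in order, append its entries, and notice when two pexes pin the same distribution differently".

```python
# Toy model only: none of these names come from the pex codebase.
from typing import Dict, List, Tuple

# Hypothetical stand-in for "which distributions does this pex carry".
PexContents = Dict[str, str]  # distribution name -> pinned version


def layer_pex_path(
    pexes: List[Tuple[str, PexContents]],
) -> Tuple[List[str], List[str]]:
    """Layer each pex's entries onto a simulated sys.path, in order.

    Returns the simulated path entries plus a description of every
    distribution that shows up with differing versions across pexes.
    """
    simulated_sys_path: List[str] = []
    first_seen: Dict[str, Tuple[str, str]] = {}  # dist -> (pex name, version)
    conflicts: List[str] = []
    for pex_name, contents in pexes:
        for dist, version in sorted(contents.items()):
            # Every pex contributes its entries, conflicting or not.
            simulated_sys_path.append(f"{pex_name}:{dist}-{version}")
            if dist not in first_seen:
                first_seen[dist] = (pex_name, version)
            elif first_seen[dist][1] != version:
                prev_pex, prev_version = first_seen[dist]
                conflicts.append(
                    f"{dist}: {prev_version} ({prev_pex}) vs {version} ({pex_name})"
                )
    return simulated_sys_path, conflicts


path, conflicts = layer_pex_path([
    ("a.pex", {"requests": "2.22.0", "six": "1.12.0"}),
    ("b.pex", {"six": "1.14.0"}),
])
print(path)
print(conflicts)
```

In this model both `six` entries end up on the path and the earlier one shadows the later one at import time. That is tolerable when the runtime is controlled, but a merged, sealed pex would have to pick one version, which is where a fresh resolve over the combined requirements comes in.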
what's the use case? just incremental binary goal build perf for pants?
Yeah, I was looking into pex resolve issues for an internal pants-plugin I was playing with (which no longer exists) and got sidetracked and learned some more about how pants does python.
ultimately, it's not clear to me how to best combine 1..N pex files into a net new pex without running another top level resolve against the new transitive closure of deps - which I thought was most of the runtime cost for the incremental build case.
I actually wasn't quite clear on the mechanism being described here -- I think I can totally see how just extending the PEX_PATH could make that resolution process much longer (especially if it's already most of the runtime cost of incremental build, that's a fun fact).
it's worth noting that PEX_PATH remains a "best effort" thing given the proclivity for conflicts between various pex files' 3rdparty requirements. IIRC, we effectively just bootstrap each pex in order and layer on the sys.path mutations. this seems fine for controlled runtime cases, but for sealed binary builds maybe less so?
I was originally thinking of "just" doing some of that "resolution process" at pex build time if possible -- but if the "resolution process" is just layering on sys.path mutations (I hadn't dived into the code yet and it's unclear what I assumed was happening instead), I definitely can't immediately see how to do that runtime resolve process at build time like I was thinking originally.
pantsbuild/pants#9516 and related work may allow us to close this issue soon in favor of allowing pants to do the job of merging PEX files, instead of baking that into PEX itself. Super exciting!
I reject this feature idea for all the reasons mentioned by @kwlzn. In essence, PEX_PATH is an incredibly sharp-edged hack. It doesn't need its edge honed further.