pex
pex copied to clipboard
Pex CLI doesn't offer as much control over pex building process as the PEXBuilder
As a major PEX user twitter leverages the PEXBuilder class to build pexes instead of shelling out to the PEX cli for a few reasons.
- The
--sources-directory
option to add sources to a pex doesn't offer fine enough control over the final import paths of source files. In order to make--sources-directory
work we would have to prepare all the sources in merged directories. E.g Often multiple directories need to be merged into the same Chroot directory. - We make use of PEXBuilder.add_dist_location to add pre-installed wheels (unzipped wheels) to pexes. We have found that the overhead of using direct file URLs via
--requirement
on the CLI and even the mainPEXBuilder.add_distribution
methods to add significant overhead by invoking pip.pex for each distribution. We can improve build times significantly by relying on the existence of a mechanism for fetching 3rdparties that is outside of pex itself. - ... will update if I think of more ...
These may not be things that the Pex CLI wants to support but figured I would list them out based on the fact that PEXBuilder isn't under a stable API contract with users.
@hrfuller can you provide more detail on item 1? What would make sense? Right now you can only use multiple --sources-directory
to point at multiple sys.path
entries and everything under each pointed to is merged. Do you need to further exclude some files or subdirectories from --sources-directory
entries or would a simpler explicit file list work?
On 2 can you provide more color? Pex already never installs a given wheel more than once for a given PEX_ROOT. If the wheel has already been installed there (in installed_wheels/<hash>/the.whl/
) it will be re-used. Is it the case you have a cache of installed wheels that is resident but not a PEX_ROOT that is resident? Do you use pex3 lock create
lock files? Those also cut out Pip overheads such that, given a --lock
Pex just downloads wheels mentioned in the lock (if needed) and installs them (if needed) but never invokes Pip's resolution code after lock creation.
Basically I need a lot more detail about the environmental constraints you use Pex under I think as well as the perf improvements you get in % terms using your current techniques over some example PEX CLI baseline. With that info in hand, I can come up with real answers or ideas or infeasibles.
@hrfuller you were not able to provide more details and Pex has since grown even more ways to pick sources, now including:
-
-D
/--sources-directory
: Include everything under the given directory. -
-P
/--package
: Add a package and all its sub-packages with@
support for locating where the package lives relative to the current directory; e.g.: -P foo.bar@src to select thefoo.bar
package undersrc/
. -
-M
/--module
: Like-P
, but to select an individual module, also supporting the@
syntax. -
--project
: The directory of a project to add. Uses thepyproject.toml
/setup.py
/setup.cfg
in that directory to collect sources (and dependencies).
I'm going to assume the details won't be forthcoming and close, but please re-open if you have more specifics about cases modern Pex does not handle well via the CLI that you think it should.
I totally missed item 2. Although you can now pex --no-pre-install-wheels
to get wheels stored in a PEX as is (no extra compression, no pre-installation in a chroot), you still must run a Pip resolve. I find a ~40% speed increase with a quick hack building a torch PEX:
# Existing perf for pre-resolved wheel house:
:; time python -mpex --pip-version latest --no-pre-install-wheels /tmp/wheels/*.whl --intransitive -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
pex_warnings.warn(message)
real 0m10.857s
user 0m6.478s
sys 0m3.873s
# A quick hack that allows adding all wheels in `PEX_WHEELS_DIR` to the PEXBuilder directly:
:; time PEX_WHEELS_DIR=/tmp/wheels python -mpex --pip-version latest --no-pre-install-wheels -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
pex_warnings.warn(message)
real 0m6.376s
user 0m3.977s
sys 0m2.451s
On the precedent front there is --requirement-pex
:
--requirements-pex FILE
Add requirements from the given .pex file. This option
can be used multiple times. (default: [])
So I'll re-open to fix item 2 in the OP.
Thanks John! I haven't been involved with pex since some job changes but if you still feel like 2 would be a benefit to the project that's great. There has been talk of a pex_binary target in bazel rules_python for sometime. Based on my experience writing bazel rules for pex these changes would have given more control over the final pex that is created. Cheers.
As for details in number 1 I'm a bit hazy now but it had to do with merging directories that represented the same "source root" a pants concept into the pexes being created. This was using a much older version of pex v2 so that may have changed already
Hey Henry! Yeah, I think 2 makes sense as a feature on its own. As to 1, I'm still hazy too. you could definitely merge source roots just using -D source/root/1 -D source/root/2
even back when you filed this issue. So, at that point in time you must've wanted to be able to mask off portions of those source roots to not grab all of them, just some. AFAICT, you can achieve that now with -P package1@source/root/1 -P package2@source/root/1 -D package3@source/root/2
. It may be the case that subtracting is easier in some cases than adding like in this example, but I'll definitely wait for specific pain reports to surface before delving into that.