pex icon indicating copy to clipboard operation
pex copied to clipboard

Pex CLI doesn't offer as much control over pex building process as the PEXBuilder

Open hrfuller opened this issue 2 years ago • 2 comments

As a major PEX user twitter leverages the PEXBuilder class to build pexes instead of shelling out to the PEX cli for a few reasons.

  1. The --sources-directory option to add sources to a pex doesn't offer fine enough control over the final import paths of source files. In order to make --sources-directory work we would have to prepare all the sources in merged directories. E.g Often multiple directories need to be merged into the same Chroot directory.
  2. We make use of PEXBuilder.add_dist_location to add pre-installed wheels (unzipped wheels) to pexes. We have found that the overhead of using direct file URLs via --requirement on the CLI and even the main PEXBuilder.add_distribution methods to add significant overhead by invoking pip.pex for each distribution. We can improve build times significantly by relying on the existence of a mechanism for fetching 3rdparties that is outside of pex itself.
  3. ... will update if I think of more ...

These may not be things that the Pex CLI wants to support but figured I would list them out based on the fact that PEXBuilder isn't under a stable API contract with users.

hrfuller avatar Sep 16 '22 19:09 hrfuller

@hrfuller can you provide more detail on item 1? What would make sense? Right now you can only use multiple --sources-directory to point at multiple sys.path entries and everything under each pointed to is merged. Do you need to further exclude some files or subdirectories from --sources-directory entries or would a simpler explicit file list work?

On 2 can you provide more color? Pex already never installs a given wheel more than once for a given PEX_ROOT. If the wheel has already been installed there (in installed_wheels/<hash>/the.whl/) it will be re-used. Is it the case you have a cache of installed wheels that is resident but not a PEX_ROOT that is resident? Do you use pex3 lock create lock files? Those also cut out Pip overheads such that, given a --lock Pex just downloads wheels mentioned in the lock (if needed) and installs them (if needed) but never invokes Pip's resolution code after lock creation.

jsirois avatar Sep 18 '22 02:09 jsirois

Basically I need a lot more detail about the environmental constraints you use Pex under I think as well as the perf improvements you get in % terms using your current techniques over some example PEX CLI baseline. With that info in hand, I can come up with real answers or ideas or infeasibles.

jsirois avatar Sep 18 '22 02:09 jsirois

@hrfuller you were not able to provide more details and Pex has since grown even more ways to pick sources, now including:

  • -D / --sources-directory: Include everything under the given directory.
  • -P / --package: Add a package and all its sub-packages with @ support for locating where the package lives relative to the current directory; e.g.: -P foo.bar@src to select the foo.bar package under src/.
  • -M / --module: Like -P, but to select an individual module, also supporting the @ syntax.
  • --project: The directory of a project to add. Uses the pyproject.toml / setup.py / setup.cfg in that directory to collect sources (and dependencies).

I'm going to assume the details won't be forthcoming and close, but please re-open if you have more specifics about cases modern Pex does not handle well via the CLI that you think it should.

jsirois avatar Aug 17 '24 01:08 jsirois

I totally missed item 2. Although you can now pex --no-pre-install-wheels to get wheels stored in a PEX as is (no extra compression, no pre-installation in a chroot), you still must run a Pip resolve. I find a ~40% speed increase with a quick hack building a torch PEX:

# Existing perf for pre-resolved wheel house:
:; time python -mpex --pip-version latest --no-pre-install-wheels /tmp/wheels/*.whl --intransitive -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
  pex_warnings.warn(message)

real    0m10.857s
user    0m6.478s
sys     0m3.873s

# A quick hack that allows adding all wheels in `PEX_WHEELS_DIR` to the PEXBuilder directly:
:; time PEX_WHEELS_DIR=/tmp/wheels python -mpex --pip-version latest --no-pre-install-wheels -otorch.pex
/home/jsirois/dev/pex-tool/pex/pex/pex_builder.py:105: PEXWarning: The PEX zip at torch.pex~ is not a valid zipapp: Could not find the `__main__` module.
This is likely due to the zip requiring ZIP64 extensions due to size or the
number of file entries or both. You can work around this limitation in Python's
`zipimport` module by re-building the PEX with `--layout packed` or
`--layout loose`.
  pex_warnings.warn(message)

real    0m6.376s
user    0m3.977s
sys     0m2.451s

On the precedent front there is --requirement-pex:

  --requirements-pex FILE
                        Add requirements from the given .pex file. This option
                        can be used multiple times. (default: [])

So I'll re-open to fix item 2 in the OP.

jsirois avatar Aug 17 '24 17:08 jsirois

Thanks John! I haven't been involved with pex since some job changes but if you still feel like 2 would be a benefit to the project that's great. There has been talk of a pex_binary target in bazel rules_python for sometime. Based on my experience writing bazel rules for pex these changes would have given more control over the final pex that is created. Cheers.

As for details in number 1 I'm a bit hazy now but it had to do with merging directories that represented the same "source root" a pants concept into the pexes being created. This was using a much older version of pex v2 so that may have changed already

hrfuller avatar Aug 17 '24 17:08 hrfuller

Hey Henry! Yeah, I think 2 makes sense as a feature on its own. As to 1, I'm still hazy too. you could definitely merge source roots just using -D source/root/1 -D source/root/2 even back when you filed this issue. So, at that point in time you must've wanted to be able to mask off portions of those source roots to not grab all of them, just some. AFAICT, you can achieve that now with -P package1@source/root/1 -P package2@source/root/1 -D package3@source/root/2. It may be the case that subtracting is easier in some cases than adding like in this example, but I'll definitely wait for specific pain reports to surface before delving into that.

jsirois avatar Aug 17 '24 19:08 jsirois