rules_python icon indicating copy to clipboard operation
rules_python copied to clipboard

fix: remove 'site-packages' from install dir prefix

Open mattem opened this issue 1 year ago • 7 comments

Removes the site-packages path segment in unpacked external dependency installs.

This additional path segment causes issues when dealing with very large dependency trees, so much so that the value of PYTHONPATH exceeds the OS limit of what can be set.

This manifests due to the Python binary launcher adding workspace roots automatically to sys path, in addition to the site-packages segmented path, ie the PYTHONPATH var will end up with both (from printing the PYTHONPATH in examples/pip_parse in rules_python.

Before:

/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_certifi/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_chardet/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_idna/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_urllib3/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_requests/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_alabaster/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_babel/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_docutils/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_imagesize/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_zipp/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_importlib_metadata/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_markupsafe/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_jinja2/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_packaging/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_pygments/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_snowballstemmer/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_jsmath/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinx/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_serializinghtml/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_qthelp/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_htmlhelp/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_devhelp/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_applehelp/site-packages
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/_main
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_alabaster
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_babel
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_certifi
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_chardet
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_docutils
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_idna
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_imagesize
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_importlib_metadata
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_jinja2
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_markupsafe
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_packaging
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_pygments
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_requests
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_snowballstemmer
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinx
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_applehelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_devhelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_htmlhelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_jsmath
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_qthelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_serializinghtml
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_urllib3
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_zipp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~python~python_3_9_aarch64-apple-darwin

After:

/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_certifi
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_chardet
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_idna
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_urllib3
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_requests
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_alabaster
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_babel
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_docutils
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_imagesize
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_zipp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_importlib_metadata
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_markupsafe
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_jinja2
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_packaging
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_pygments
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_snowballstemmer
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_jsmath
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinx
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_serializinghtml
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_qthelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_htmlhelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_devhelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~pip~pypi_39_sphinxcontrib_applehelp
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/_main
/private/var/tmp/_bazel_matt/3cf6080de036ce65f48725921520350e/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/main.runfiles/rules_python~override~python~python_3_9_aarch64-apple-darwin

In large graphs, this increase in env var length results in a catastrophic failure:

[REDACTED] line 374, in _RunExecv
    os.execv(python_program, [python_program, main_filename] + args)

OSError: [Errno 7] Argument list too long: /[REDACTED]/python3_9_x86_64-unknown-linux-gnu/bin/python3'

While this behaviour is possible to toggle via --experimental_python_import_all_repositories, there are a large number of dependencies that expect these roots to be available (runfiles for example iirc).

mattem avatar Feb 16 '24 18:02 mattem

I am not sure if changing this won't have any unintended consequences. I've linked a few tickets here:

  1. People may depend on site-packages being present in the path. See the PR and the rules_oci example for splitting packages by layer in https://github.com/aspect-build/bazel-examples/blob/main/oci_python_image/py_layer.bzl#L14.
  2. The entries in the PYTHONPATH having the top level folder of site-packages is much better than <whatever bzlmod layout currently is> as the later might have extra characters that don't play nice with the PYTHONPATH spec. Maybe the solution could be to use rules_py way of constructing a venv but it sounds like a very invasive chaneg.

Probably there are more things, but these are the only that come to my head right now.

aignas avatar Feb 29 '24 01:02 aignas

+1 to site-packages being a conventional stable path component, as in standard virtualenv and other interpreter deployments. I own machinery which relies on this and this would be a breaking change to me. I'd rather see work which either bypassed $PYTHONPATH for setup entirely (see the current draft of having a manifest and a bootstrap shim that manipulates sys.path directly) or which provided for "flattening" files from many PyPi site-package trees into a single site-package tree for all 3rdparty deps.

arrdem avatar Feb 29 '24 16:02 arrdem

re: backwards compatibility concerns: Yeah, these are valid concerns. I'm fine with putting this behind an environment variable to control behavior. This at least removes Matt having to carry a patch indefinitely until we can fix it and he can upgrade to Bazel 7. Then the envvar can just be a no-op and we can drop it.

Unfortunately, due to support for Bazel 6.x, our options for addressing the problem of the long PYTHONPATH env var are pretty limited because we can't change the binary generation or bootstrapping in that version of Bazel. With Bazel 7, its a much more solvable problem, as we can now just modify the rules and bootstrap directly (patches welcome). Unfortunately, Matt's situation needs to support Bazel 6.x

The only other idea we've been able to come up with is to hide the deps/PyInfo from the binary by having a generated site.py-based target capture it and perform path manipulations. I don't think anyone has tried to implement it yet, though.

I own machinery which relies on this

Can you explain why it relies on it? There aren't any promises about the unpacked layout of the downloaded packages. The targets are the unit of abstraction.

see the current draft ...

What draft is this referring to?

rickeylev avatar Mar 01 '24 19:03 rickeylev

until he can fix it

lol, oops, typo. I meant "we" as in the rules_python owners. Patches welcome, too, of course.

rickeylev avatar Mar 01 '24 23:03 rickeylev

Thanks for the context, I'm fine with this behaviour being gated under a feature flag.

aignas avatar Mar 01 '24 23:03 aignas

@mattem, should this PR be continued?

aignas avatar Apr 06 '24 03:04 aignas

Just adding a note that I think this will break anyone using the py_layers or py_oci_image macros from here.

jvolkman avatar Apr 30 '24 18:04 jvolkman

Feel free to reopen PR later. Closing to clean things up.

aignas avatar Jul 19 '24 07:07 aignas