[Bug]: sys.path weirdness: local files/dirs shadow third-party packages
What happened?
I'm migrating a large codebase from rules_python to rules_py. One problem that I've had to address multiple times is that directory names and filenames are now shadowing third-party packages.
- I had to rename the
langchaindirectory not to conflict with https://pypi.org/project/langchain/.import langchain.llmsfound ourlangchaindirectory first insys.path. - Similarly, for a
pysparkdirectory. - The built-in
copymodule ended up importing from ourorg.py:
When using rules_py, a number of tests fail with a callstack like this:
Traceback (most recent call last):
File "/mnt/tmpfs/BAZEL_OUTPUT/sandbox/processwrapper-sandbox/33663/execroot/blah_blah/bazel-out/k8-fastbuild/bin/src/python/pytask/foo/bar/quux.runfiles/blah_blah/tools/build_rules/py/private/pytest_loader.py", line 6, in <module>
import pytest
...
File "/mnt/tmpfs/BAZEL_OUTPUT/sandbox/processwrapper-sandbox/33663/execroot/blah_blah/bazel-out/k8-fastbuild/bin/src/python/pytask/foo/bar/quux.runfiles/py_deps_main/attrs/attrs/attr/_make.py", line 5, in <module>
import copy
File "<frozen zipimport>", line 259, in load_module
File "/mnt/tmpfs/BAZEL_OUTPUT/execroot/blah_blah/external/python_interpreter/lib/python38.zip/copy.py", line 60, in <module>
File "//mnt/tmpfs/BAZEL_OUTPUT/sandbox/processwrapper-sandbox/33663/execroot/blah_blah/bazel-out/k8-fastbuild/bin/src/python/pytask/foo/bar/quux.runfiles/blah_blah/tasks/tasks/org.py", line 3, in <module>
If you look at the copy.py source, it's actually trying to do from org.python.core import PyStringMap. This is Jython support; a handled ImportError is raised in CPython. I worked around this by renaming org.py.
sys.path is substantially longer in rules_py. There are 1193 entries for the above test, versus 484 for rules_python. As far as I can tell, the primary difference is that every directory in our source tree that we import from seems to be in sys.path when using rules_py.
Version
Development (host) and target OS/architectures:
Linux x86_64, Ubuntu 20.02
Output of bazel --version:
bazel 6.2.0
Version of the Aspect rules, or other relevant rules from your
WORKSPACE or MODULE.bazel file:
0.3.0
Language(s) and/or frameworks involved:
Python, Scala, Java, ...
How to reproduce
No response
Any other information?
No response
Thanks for the feedback @georgevreilly ! We've finally found some funding to get started on a 1.0 release for rules_py and hope to reach that around BazelCon. Is your employer interested in sponsoring the work? (If not, excellent contributions like issue reports and verifying the fixes are much appreciated, of course)
I think this is a consequence of a fundamental design decision. rules_python puts the name of your repository on the sys.path and therefore your local top-level folders were namespaced by that, and didn't collide with third-parties. However, that was a Bazel-only convention, not how idiomatic Python is written. (it also causes problems with bzlmod where "the name of your repository" depends on the context where the name is accessed, see https://bazelbuild.slack.com/archives/C014RARENH0/p1695268845206649)
Most clients I've seen have a repeated folder structure for their top-level, so something like my-repo/my-repo/path/to/file so that import my-repo.path.to works correctly under Python outside of Bazel. I think you'll have to do a one-time refactor like this.
If you decide to do this, note that one option is to use git filter-repo to "relocate" your commits under that folder (see https://blog.aspect.dev/otherrepo-to-monorepo) so that you don't pollute the blame layer with a giant file move.