rules_python
rules_python copied to clipboard
sys.path[0] breaks out of runfile tree.
🐞 bug report
Affected Rule
py_binary, py_test.
Is this a regression?
No
Description
When Python initializes it adds the directory of the script to the sys.path:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of PYTHONPATH.
This would imply that a py target such as {REPO_DIR}/bazel-bin/{target_path}/{target}.runfiles/{script_path}/{python_script}.py would cause {REPO_DIR}/bazel-bin/{target_path}/{target}.runfiles/{script_path}/ to be inserted as sys.path[0].
However, because the script is a symlink to the real python script in the repository, Python follows the symlink and appends the actual underlying directory to sys.path[0], in this case: {REPO_DIR}/{script_path}/. This issue was raised in Python's bug tracker and noted as expected behaviour that won't be fixed issue17639.
This allows the script to "break out" of the runfiles tree and import code from the same directory which it does not necessarily have access to via src or deps.
This causes issues when creating multiple targets in the same directory which should not automatically depend on each other.
For example, given:
repo/
src/
main.py
foo.py
main can depend on foo without the file being explicitly listed in the BUILD file.
🔬 Minimal Reproduction
src/BUILD
package(default_visibility = ["//visibility:public"])
py_binary(
name = "main",
srcs = ["main.py"],
python_version = "PY3",
srcs_version = "PY3",
)
src/main.py
import sys
import foo
print(sys.path[0])
src/foo.py
print("foo imported")
The output of bazel run //src:main will be:
{REPO}/src/
foo imported
Which should not be possible where foo.py is not an input to //src:main.
🌍 Your Environment
Operating System:
macOS 11.0.1 (20B29)
Output of bazel version:
Build label: 3.7.0-homebrew
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Sat Nov 14 13:29:41 2020 (1605360581)
Build timestamp: 1605360581
Build timestamp as int: 1605360581
Rules_python version:
0.1.0
Anything else relevant? The most obvious solution that comes to mind for this would be to copy into the runfile tree rather than symlinking. It seems unlikely this is going to change upstream in Python.
Thanks for the report. This same problem is currently documented here https://github.com/bazelbuild/bazel/issues/7091 but I cannot find a counterpart already in this project.
The Python stub template has an example of a 'hack' to fix the problem https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L19. But that only fixes the problem in that python process, and the stub template starts the user's process and the problem is reintroduced like you observe.
It seems like the stub template could use the -s option to the Python interpreter:
https://docs.python.org/3/using/cmdline.html#cmdoption-s
when it is building the command to execute:
https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt#L349
@person142, neither -s or -S appears to stop the directory of the script being added to the path, rather, it prevents site wide packages being added to the path (perhaps this should be done anyway or provided as an option). For example, on my machine, it prevents /usr/local/lib/python3.7/site-packages' being added. This should probably be a separate issue ticket.
The problem will still exist because {REPO_DIR}/{script_path}/ is still added at sys.path[0]. It doesn't look like there is an interpreter option to disable this behavior.
There are some potential solutions noted in the bazel issue:
- Copy the runfiles instead of symlink.
- Invoke the script with
-cor-mwhich does not setsys.path[0]. - Potentially reuse the interpreter of the wrapper rather than creating a new one (which already deletes
sys.path[0]).
I am unsure which would be the most appropriate in this case. I think 1 might not be trivial because IIUC the runfiles behaviour is part of core Bazel and not rules_python specific.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
This should probably be kept open since it's still a concern in the upstream issue
In the latest python 3.11 beta there's a -P flag and PYTHONSAFEPATH environment variable that prevent the python interpreter from prepending this path. So this could very easily be fixed by added PYTHONSAFEPATH to new_env in python_stub_template.txt. It will only fix it for python 3.11+ though, and it's only expected to release in October.