PyExecutableInfo.runfiles_without_exe includes full interpreter exe with standard library
🐞 bug report
Affected Rule
The issue is caused by the rule: py_binary
Is this a regression?
This has probably been broken since it was introduced.
Description
I'd like to build container images where the interpreter and standard library are part of a base image and the "app" layer contains just my Python code (and explicit data dependencies).
I tried to write a small rule that extracts the necessary info from the provider to avoid pulling in the Python interpreter itself.
Looking at the output, the runfiles include the full interpreter, which is unexpected.
🔬 Minimal Reproduction
I have a working branch on tweag/rules_img: https://github.com/tweag/rules_img/tree/python_example/e2e/python
In the subdirectory e2e/python, I have a clean example of what I mean with a separate MODULE.bazel and a clean reproducer.
e2e/python/BUILD.bazel contains a py_binary and my custom rule for creating a "light" binary (using PyExecutableInfo).
e2e/python/extract_python_files/defs.bzl contains the rule I wrote to extract only the data I want.
Here are the repro steps:
bazel build :app
find bazel-bin/app.runfiles/
🔥 Exception or Error
You can see many files that shouldn't be there, including bazel-bin/app.runfiles/rules_python++python+python_3_13_x86_64-unknown-linux-gnu/bin/python3.13
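To see which repositories contribute the unexpected files, the runfiles tree can be grouped by its first path component. This is a minimal sketch, assuming the usual `<runfiles root>/<repo>/...` layout; the helper name `count_files_by_repo` is made up for illustration and is not part of Bazel or rules_python.

```python
import os
from collections import Counter


def count_files_by_repo(runfiles_root):
    """Count files (including symlinks to files) per top-level directory."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(runfiles_root):
        rel = os.path.relpath(dirpath, runfiles_root)
        if rel == ".":
            # Files directly in the root (e.g. a manifest) belong to no repo.
            repo = "<root>"
        else:
            repo = rel.split(os.sep)[0]
        counts[repo] += len(filenames)
    return counts


if __name__ == "__main__":
    # Path taken from the repro steps above.
    for repo, n in count_files_by_repo("bazel-bin/app.runfiles").most_common():
        print(f"{n:6d}  {repo}")
```

With the reproducer above, the toolchain repository (the `rules_python++python+python_3_13_...` directory) would be expected to show up as one of the groups.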
🌍 Your Environment
Operating System:
NixOS 24.11 on amd64
Output of bazel version:
Bazelisk version: development
Build label: 8.3.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Jun 30 16:23:40 2025 (1751300620)
Build timestamp: 1751300620
Build timestamp as int: 1751300620
Rules_python version:
bazel_dep(name = "rules_python", version = "1.5.3")
Anything else relevant?
I tried to understand why this happens.
Runfiles containing the toolchain are collected in _get_runtime_details as runtime_files:
https://github.com/bazel-contrib/rules_python/blob/cda58775c6fb1bfba93b3bbc55e8ce003a56960b/python/private/py_executable.bzl#L1247-L1258
The toolchain (including bin/python and the standard library) is passed into _get_base_runfiles_for_binary as extra_common_runfiles.
https://github.com/bazel-contrib/rules_python/blob/cda58775c6fb1bfba93b3bbc55e8ce003a56960b/python/private/py_executable.bzl#L1136-L1141
Those get added to common_runfiles: https://github.com/bazel-contrib/rules_python/blob/cda58775c6fb1bfba93b3bbc55e8ce003a56960b/python/private/py_executable.bzl#L1404
... which are finally used as runfiles_without_exe: https://github.com/bazel-contrib/rules_python/blob/cda58775c6fb1bfba93b3bbc55e8ce003a56960b/python/private/py_executable.bzl#L1436
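The chain traced above can be pictured with a toy model built on plain Python sets. This is not the rules_python implementation, only an illustration of the merge order described in the links; the function name, the example file names, and the `with_exe` key are assumptions made for the sketch.

```python
def build_runfiles(app_files, runtime_files, executable):
    """Toy model of how the runtime ends up in runfiles_without_exe."""
    # The runtime (interpreter + stdlib) arrives as extra_common_runfiles ...
    extra_common_runfiles = set(runtime_files)
    # ... and is merged into common_runfiles together with the app's files.
    common_runfiles = set(app_files) | extra_common_runfiles
    return {
        # Only the entry-point executable is missing from this set.
        "runfiles_without_exe": common_runfiles,
        # Hypothetical name for "everything", used only in this sketch.
        "with_exe": common_runfiles | {executable},
    }


result = build_runfiles(
    app_files=["_main/app.py"],
    runtime_files=["runtime/bin/python3.13", "runtime/lib/os.py"],
    executable="_main/app",
)
# The interpreter is still present even though the executable is excluded.
assert "runtime/bin/python3.13" in result["runfiles_without_exe"]
assert "_main/app" not in result["runfiles_without_exe"]
```

In this model, "without exe" only removes the launcher, so the toolchain files are carried along unchanged.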
All of this makes me think that my assumptions about the intended use of PyExecutableInfo.runfiles_without_exe might be incorrect, and this is actually expected behavior. If this is the case, I'd like to turn this into a feature request instead:
For the purpose of creating container image layers (and other types of bundling), could we provide a field in some provider that contains the files of the application without the toolchain and standard library? (Bonus points if we can separate between third-party deps and normal application code). This would make it much easier for the larger ecosystem to bundle Python apps efficiently.
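To make the request concrete, here is a rough sketch of the three-way split I have in mind, written as a plain Python function over runfiles-relative paths. It assumes the toolchain repository name is known up front, that the main repository appears as `_main`, and that everything else counts as third-party; those heuristics and the name `classify_runfiles` are illustrative only, and a real provider field would not need to guess from paths.

```python
def classify_runfiles(paths, runtime_repo, main_repo="_main"):
    """Split runfiles-relative paths into runtime, app, and third-party groups."""
    groups = {"runtime": [], "app": [], "third_party": []}
    for path in paths:
        # The first path component is the repository directory.
        repo = path.split("/", 1)[0]
        if repo == runtime_repo:
            groups["runtime"].append(path)
        elif repo == main_repo:
            groups["app"].append(path)
        else:
            # Assumption: any other repository holds third-party dependencies.
            groups["third_party"].append(path)
    return groups


groups = classify_runfiles(
    [
        "rules_python++python+python_3_13_x86_64-unknown-linux-gnu/bin/python3.13",
        "_main/app.py",  # hypothetical application file
        "some_pip_repo/site-packages/dep/__init__.py",  # hypothetical dependency
    ],
    runtime_repo="rules_python++python+python_3_13_x86_64-unknown-linux-gnu",
)
```

Each group could then become its own image layer, with the `runtime` group left out entirely when the base image already ships the interpreter.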
In general, I think it is a valid request: if users are using non-hermetic toolchains but want to build Docker images, there should still be a way. Which toolchain are you using?
This is the Python toolchain setup in MODULE.bazel:
python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(
    python_version = "3.13",
)
use_repo(python, "python_3_13")
Ah, so you are using the hermetic toolchain, but you want to replace it during packaging? Are you concerned that the thing you are packaging is not strictly the same thing as the one you are testing?
How do you imagine the Python bootstrap to work? Do you use bootstrap_impl=script?
In theory, you could still use tar.bzl even today, where you have the mtree manifest that you can modify and just drop the runtime files and replace python with a symlink to the filesystem Python.
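A rough sketch of that manifest rewrite, as a standalone Python filter. It assumes one entry per line with the path as the first whitespace-separated field, and it uses the standard mtree `type=link link=<target>` keywords for the symlink; the function name, the marker-based matching, and the example paths are illustrative, and other keywords (uid, gid, mode, time) may need to be carried over for a real image.

```python
def rewrite_mtree(lines, runtime_marker, interpreter_path, system_python):
    """Drop runtime entries and point the interpreter path at a system Python."""
    out = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            # Keep blank lines and comments untouched.
            out.append(line)
            continue
        path = stripped.split(None, 1)[0]
        if path == interpreter_path:
            # Replace the bundled interpreter with a symlink entry.
            out.append(f"{path} type=link link={system_python}")
        elif runtime_marker in path:
            # Skip every other file that belongs to the runtime.
            continue
        else:
            out.append(line)
    return out
```

For example, `rewrite_mtree(manifest_lines, "python_3_13_x86_64-unknown-linux-gnu/", "<path of bin/python3.13 in the manifest>", "/usr/bin/python3")` would keep the application entries and leave a single symlink where the interpreter used to be. Whether the bootstrap then finds and accepts that interpreter is a separate question.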
Ah, so you are using the hermetic toolchain, but you want to replace it during packaging?
That's correct. I want to use the same Python version in Bazel and in the container, but for efficiency, I want to put the toolchain in a separate container image layer or have it preinstalled in the base image (it doesn't change often and can be shared by multiple Python apps). I didn't think about the setting for bootstrap_impl yet.
The container image does have a "system Python" in the sense that the base image I'm layering upon contains a python interpreter.
Custom filtering with mtree manifests and tar.bzl does sound like an option. I'm currently building a more optimized way to build layers in rules_img that doesn't have any method for filtering runfiles so far, so that wouldn't work (yet).
I'm also considering just writing my own aspect that collects srcs and data from py_library and friends, but I'm sure it's not easy to do this correctly.
FYI, the tar.bzl and rules_oci example shows how to put the interpreter in a separate layer.
The origin of runfiles_without_exe is from Google, where the entry point executable and the runtime are one and the same. The goal is that, because the original executable won't work as-is in the intended deployment environment, a new one (with a different bootstrap, but the same runtime) must be derived.
It's a similar intent as what you have, though.
It sounds like what you want is something like runfiles_without_runtime. You don't want the runtime, but probably want the same bootstrap.
That sounds mostly reasonable. Hm. However. The big catch is that the bootstrap and runtime are fairly closely coupled when an in-build (e.g. hermetic) runtime is used -- the bootstrap assumes the interpreter is in the runfiles, but in your case, you want it to come from outside. Hm, now it's sounding contradictory. (That isn't the case for a platform (non in-build) runtime, because it assumes the system provides it somehow.)
What might work better is to switch the toolchain when building for your deployment environment. Switch it to a runtime_env toolchain, or equivalent, which assumes it has to get Python from the environment. Though, setting that up is a bit of a pain. A transition or config change is also annoying, perf-wise. Hm.
I'd be ok with adding a runfiles_without_runtime field, or changing runfiles_without_exe to also exclude the python runtime, or having separate fields (base binary runfiles, exe, py runtime). You'll probably end up needing to generate your own bootstrap, though, which is tedious.
@malt3 this sounds related to https://github.com/bazel-contrib/rules_python/issues/3324 WDYT? I can share the heuristic I've settled on for removing the "extra stuff".