rules_python
rules_python copied to clipboard
Make `chmod` hermetic
Removes the need for system binaries to make the Python interpreter library directories/files read-only.
Fixes #2016
@rickeylev, what do you think about this? How do we move with this forward? Should we just start ignoring the pyc files in the repo and not bother with making it read-only?. The making of the external repos read-only is something that we don't do in the whl_library (maybe we should) and I am wondering what is the overall strategy here...
I am thinking that having the chmod, id and touch be hermetic might be ignoring the overall picture of what we are trying to do here.
what is the overall strategy here
So, correctness wise, there are two things we're constrained by:
- The input files (both content and the list of them) have to be stable.
- The downloaded python binary may not be runnable by the host that downloaded it, such as in a cross-build with RBE situation.
(1) can be accomplished by any of:
- a. Making everything read only.
- b. Generating all the necessary files at repo install time.
- c. Excluding files that might be created at runtime
I think we should do (a) regardless -- its just defense in depth. For the case of Windows (or other failures) we should just be OK with leaving things writable. It's the best we can do, and users will just bypass it entirely with ignore_root_error=True anyways (they are given the choice between "guaranteed pedantic failure" and "probably works, but flaky"; they'll always choose the latter).
I think (b) is better than (c), but it's out of scope for this PR and we already have (c) in place.
(2) throws a wrench into what this PR is doing -- using the downloaded runtime itself to run commands to make things read only. So I'm not really keen on that overall. I don't mind having an alternative implementation of chmod for OP's case where coreutils aren't available. Ideally, we'd either have a repo-phase reference to a host-compatible Python, or a repo-phase reference to a host-compatible chmod tool.
So, more concretely, I ask for the following changes:
- Use the system chmod if available, otherwise use the downloaded-python's chmod. If neither is available, oh well; continue on.
- Remove the fail() code paths. Just chmod and trust that things are OK.
I've updated to use uutils/coreutils and removed the fail paths.
I'm keen to work through getting this landed. I can rework the branch to current main.
Could I get some hints on how to implement this so that it has the best chance to get accepted, please?
I think there are multiple options here:
option 1 - use this chmod
In this case I think it would be best to create a separate repository rule maybe something called host_coreutils and then it will be very clear that we are trying to download the host-specific coreutils version.
We register this somewhere in internal_deps.bzl and then use it from the hermetic toolchain. It should be "used" internally as a dep in MODULE.bazel
option 2 - trust our glob and remove chmod
Remove the chmoding. This would solve the following:
- Delete the bazel cache requires a sudo, because the dirs are read-only
- Difference in what we do on Windows vs Linux today
- Need to chmod would be no-longer there
Drawback:
- Theoretical polution with pyc files if we don't use our python helpers to execute Python scripts in the repository context. Should we document that users should use
-B?
option 3 - do nothing
Requires on system chmod, which is undesirable as you have suggested by creating the PR.
proposal
I would actually propose option 2, @rickeylev, @mattyclarkson, what do you think?
cc: @arrdem
Option 2 is fine with me. @rickeylev?
Option 2 SGTM
Option 1 seems bad, option 3 I don't follow the downsides of. I'm leaning 2 by default but I don't feel like I have an informed opinion here.
So I'd propose us to go with the option 2 moving forward.