Add `--strict_repo_env` option
This PR introduces a new flag --strict_repo_env which stops repository rules and module extensions from inheriting the client environment (making --repo_env=NAME more than just an advisory notice).
See test_execute_environment_strict_vars in src/test/shell/bazel/starlark_repository_test.sh for a demonstration.
Note that the behavior is different to the similarly named --incompatible_strict_action_env, which stops all environment variables (--action_env affects actions with use_default_shell_env = True) except those specified within the defining rule. Given the entity targeted by the new flag is a repo rule, I think this makes sense and any further strictness (like blocking implicit usage of variables without establishing a dependency) would be better addressed independently. Same goes for regular rules which can have some counterintuitive behaviors (e.g. only --[host_]action_env=X=Y is accessible, --[host_]action_env=Y will be missing, at least last I checked).
Addresses #10996
TODO
- [ ] Implement env expansion (per repository) for
environand.getenvusage. - [ ] Investigate how
repoEnvis used in credential managers. Should they useclientEnvinstead? - [ ] As above, for download manager.
- [ ] As above, volatile input map (
SOURCE_DATE_EPOCH). - [ ] Investigate
SpawnActionandExtraActionusage ofclientEnv. - [ ] Investigate standalone test strategy usage of
clientEnv. - [ ] Investigate symlink tree strategy helper usage of
clientEnv. - [ ] Investigate
WriteAdbArgsAction, action usage ofclientEnv. - [ ] As above, for
CppCompileAction. - [ ] As above, for
JavaCompileAction. - [ ] Investigate the
runcommand, which intentionally emulates the test environment (at least partially). May not matter here, but worth checking. - [ ] Investigate usage of
clientEnvinMobileInstallCommandandmobile-install. - [ ] Investigate usage of
clientEnvinSkyframeExecutor. - [ ] Investigate
clientEnvusage in sandbox strategy. - [ ] As above, local spawn strategy.
- [ ] As above, worker spawn strategy.
The CI failure flushed out a potential bug. --repo_env=NAME results in reading values from the server process with System.getenv when clientEnv.get should have been used.
With this issue, any changes to NAME in the client environment won't propagate until the server restarts.
With this issue, any changes to
NAMEin the client environment won't propagate until the server restarts.
Could you split the fix into a separate PR? That would make it easier to cherry-pick into patch releases and Bazel 8.
Since --strict_repo_env will require users to allowlist env variables, I think that we should also ignore --action_env for repo rules when the flag is enabled. Otherwise users migrating to the flag may accidentally hurt their cache hit ratio and require another migration down the line.
Stale environment issue split to #24433 (this PR is dependent on the fix for tests to pass, so I'll leave it here as well and rebase once merged).
Since --strict_repo_env will require users to allowlist env variables, I think that we should also ignore --action_env for repo rules when the flag is enabled. Otherwise users migrating to the flag may accidentally hurt their cache hit ratio and require another migration down the line.
My initial thoughts were to keep this as a project preference, since currently many repository rules are heavily influenced by the host environment (sometimes intentionally). But if this change is being viewed as something to move towards, it may make sense to classify this as an incompatible change like with --incompatible_strict_action_env to signify a potential flip in a later release.
On a related note, there are a few places that use the "old" repo environment in src/main/java/com/google/devtools/build/lib/bazel/BazelRepositoryModule.java.
- The
vendorcommand (used in downloader). - The registry (bzlmod) factory (used in downloader).
And for sake of completion, the client environment is directly used in these places.
- The credential helpers. Used in many places, including remote execution/caching.
- The download manager (one of the instances at least...).
SpawnActionandExtraActioninstances (most likely used for--action_env=NAME). Usage unfortunately allows for some ambiguity.- The volatile input map (date format via
SOURCE_DATE_EPOCH). - The standalone test strategy (for
${inherited}which I'm not familiar with). - The symlink tree strategy helper (for symlink creation via a separate process).
WriteAdbArgsActionaction, for user home directory (HOME).CppCompileActionspawn.JavaCompileActionspawn.- The "run" command (what you'd expect and as part of configuring test environment emulation).
MobileInstallCommandsubprocess. Running themobile-installdeployer.- Passed into
SkyframeExecutor(which likely ultimately is the source for most other things listed here). - The sandboxing spawn strategies (via special env provider).
- The local spawn strategy (as above).
- The worker spawn strategy (as above).
The only spot of concern for me here is the download manager. Seems like in its current state, this PR may introduce some fragmentation (though such fragmentation would have already been possible with --repo_env, the new flag just makes it much more substantial).
Thanks for compiling this list, I'll have to think about this more.
Since some repo rules do depend on the environment quite heavily, have you considered making this behavior configurable on the repository_rule call instead?
Since some repo rules do depend on the environment quite heavily, have you considered making this behavior configurable on the
repository_rulecall instead?
No, but only because I didn't think to.
I'm not sure what such an API would look like. There is repository_rule(..., environ = [...]) (to mark variables that will trigger a refetch) however that is deprecated in favor of repository_ctx.getenv. The same cannot be said for module_extension(..., environ = [...]) strangely.
Some questions (not necessarily directed at anyone, just to help with ideation).
Should this be implemented as a binary switch that limits variables to only those explicitly listed by --repo_env? Should they be able to give a strict list of env inputs? What of the existing dynamic dependency mechanism (.getenv)? Should --strict_repo_env be a part of the design, as a way to change default environment variable inclusion behavior? Should there be any default inclusions (like PATH) and should rules be able to opt into a completely bare environment?
Looking even further beyond this, a case could even be made for sandboxing certain repository rules. The download repos created by npm_translate_lock (Rules JS) come to mind, although such repos could hypothetically be implemented with a special "download action" (to allow cache hits with few/no downloads) or more likely a slightly different design to be compatible with vendoring. This is all well outside beyond the scope here though.
Keenly awaiting this one, thank you for this work 👍
What I'm currently thinking here is;
PATHis always inherited.PATHEXTis always inherited on Windows (affectsPATHlookup).--repo_envcan override the latter and apply to all.environ(argument) andgetenvcan pull in additional env vars from the client env.environ(attribute) no longer works as a way to peak at the environment without making it a tracked dependency.
That represents a fairly significant expansion vs. the original scope, something that ought to go through the proposal process first.
That all said, I do have a need to fully lock-down the repo environment (an extreme that is not suited to all) and would rather not leave this in limbo. So here is my plan;
- Incompatible flag to disable
--[host_]action_env=NAME=VALUEside effect. - Flag to make repo env explicit (not layer over the client env).
- (eventually) A proposal for env management. Maybe repo focused, maybe more.
This looks great! I did have one query:
environ (argument) and getenv can pull in additional env vars from the client env.
With --strict_repo_env enabled, will environ(argument) allow repo rules to pull in additional env vars that have not been 'permitted' via --repo_env?
We are being impacted by this (we have tools that modify the current shell environment, and they are invalidating our builds), so I wondered if you would update on plans?
In the meantime, is there any workaround, other than wrapping all bazel calls in a wrapper that sanitizes the environment?
For those interested in this change, #26300 may also be of interest.
I would still love to see this land pre 9.0
I've rebased, refactored and updated the PR description. Once CI is happy I'll mark this as ready for review.
To make this happen I've decided to exclude behaviors like environ and getenv triggering inclusion of more variables from the client environment (comment), pretty terrible idea on reflection (difficult to implement, difficult to debug).
unless overridden or explicitly removed via --repo_env=-VARNAME
The syntax for that is --repo_env==VARNAME
Thanks, mixed the syntax up with another flag. Updated.
Do we want to get this in Bazel 9?
I'm pretty sure that question is directed at reviewers.
As for myself, I'm fine with leaving this to a 9.x release. The project I'd like to use this in needs to complete a migration off WORKSPACE before it can be upgraded to v9, so backporting to v8 would be necessary to benefit in any reasonable timeframe.
This would be really nice to get into 9.0.
@bazel-io fork 9.0.0
CI seems to be happy, @meteorcloudy can you please review?
Would it be possible to get this in 8.x?
We don't want to delay 8.5 release any further so will mark this one for later 8.x minor release.