rfcs Environment variable sandboxing

Rendered

First draft proposal to add mechanism to precisely control environment available to the env!/option_env! macros.

Implementation at https://github.com/jsgf/rust/tree/env-sandbox

Oct 26 '19 23:10 jsgf

AFAIK FOO=BAR rustc ... in most shells will set the (real) environment variable FOO for just that invocation of rustc. So what problem does --env-set solve? Is there a reason you'd ever want to set an environment variable in the "logical" environment but not the "real" environment of a single rustc invocation?
The other use cases seem to want --env-whitelist. What is would --env-blacklist be used for?
Allowing arbitrary combinations of all three flags and making the combined semantics of conflicting combinations depend on the order they're passed feels very strange to me. Why would anyone pass something like --env-set FOO=BAR --env-blacklist FOO in the first place? This almost feels like it's designed for some automated tool to generate these cli args; is that a use case we have?

Oct 27 '19 00:10 Ixrec

@lxrec:

AFAIK FOO=BAR rustc ... in most shells will set the (real) environment variable FOO for just that invocation of rustc. So what problem does --env-set solve?

A shell-based mechanism only applies if you're invoking rustc from a shell. If its some other build tool doing the invocation, then it may not have a mechanism to control the environment in that way.

Is there a reason you'd ever want to set an environment variable in the "logical" environment but not the "real" environment of a single rustc invocation?

There seems to be a few use-cases of using things like env!("PATH") to capture a fallback path of last resort (if any runtime mechanism fails). In this case you'd want to specify a path independently of the build environment, especially if you're cross-compiling, or otherwise building in an environment different from the deployment environment.

Or you may just want to block anything reading PATH entirely.

The other use cases seem to want --env-whitelist. What is would --env-blacklist be used for?

That's just for completeness because its hard to do negative matching in a regex. One could imagine that you might want to blacklist security sensitive variables, such as SSH_.*, without needing/wanting to manage everything.

Allowing arbitrary combinations of all three flags and making the combined semantics of conflicting combinations depend on the order they're passed feels very strange to me. Why would anyone pass something like --env-set FOO=BAR --env-blacklist FOO in the first place? This almost feels like it's designed for some automated tool to generate these cli args; is that a use case we have?

Yes, it is primarily intended for tooling to generate these options (esp since rustc is rarely invoked directly anyway). I suggest a Cargo-based idea in the RFC, though my primary use-case is a non-Cargo environment.

It doesn't make any real sense to set a variable then blacklist it, but it is important to define semantics for this case. I propose these because they're easy to describe and implement, but I'm open to other proposals if they make more sense.

Oct 27 '19 05:10 jsgf

rust-analyzer will have to do something along this lines, once we get to supporting env!. Unlike rustc, rust-analyzer has to work with many crates at the same time. If we just used std::env::var, we would make env vars the same for all crates, which clearly doesn't work for things like CARGO_OUTPUT_DIR. So, instead we plan to maintain a per-crate logical environment, with, possibly, a fallback to std::env::var.

Oct 27 '19 09:10 matklad

Yes I was thinking about the benefits of setting the env explicitly for persistent/embedded compiler instances.

Oct 27 '19 16:10 jsgf

On Thu, Oct 31, 2019 at 02:13:33PM -0700, Jeremy Fitzhardinge wrote:

jsgf commented on this pull request.

likely needs to be set so that the compiler can execute its various components, but there's no way to override this

so that env!("PATH") returns something else. This would be necessary where the compilation environment differs from the

deployment environment (such as when cross-compiling).

+This RFC proposes a way to precisely control the environment visible to the compile-time macros, while defaulting to the +current behaviour. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Rust implements the env!() and option_env!() macros to access the process environment variables at compilation time. +rustc supports a number of command-line options to control the environment visible to the compiling code. + +By default all environment variables are available with their value taken from the environment. There are several +additional controls to control the logical environment accessed by env!()/option_env!(): +- only allow access to a specific whitelist of variables

Allow/deny is consistent with other rustc terminology.

Sounds good.

Oct 31 '19 21:10 joshtriplett

I've massively simplified this proposal. It removes all regexes from the command-line options, and has much simpler processing. Instead it proposes:

--env-clear - clear the logical environment
--env-remove VAR - remove a specific variable
--env-pass VAR - pass a variable from the process environment to the logical environment
--env-set VAR=VALUE - set a variable in the logical environment

These options are processed in the order given.

This allows for full control of the environment available within the crate being compiled.

https://github.com/jsgf/rust/tree/env-sandbox implements this as proposed.

Dec 05 '22 08:12 jsgf

ping @joshtriplett @matklad since you'd shown interest in this. At this point, I'm not sure there's enough here to need an RFC. Guidance would be appreciated.

Dec 05 '22 21:12 jsgf

Do you think it makes sense to combine --env-pass and --env-set into one argument (e.g., --env)? For example, Docker has docker run --env FOO[=BAR]. If there is no =, then it is interpreted as being passed through.

Dec 05 '22 22:12 jasonwhite

@jasonwhite The main advantage I see is that it's shorter to type, at the cost of being a bit less explicit. Since nobody directly types rustc command lines in practice, I don't think brevity is all that important. I don't feel too strongly about it either way though.

The path extension would presumably make this --env-path VAR[=VALUE].

Dec 05 '22 23:12 jsgf

POC implementation updated to current upstream master https://github.com/jsgf/rust/tree/env-sandbox

Jul 28 '23 02:07 jsgf

cc @JakobDegen

Jul 29 '23 20:07 jsgf

Updated to include:

explicitly discuss the distinction between operational and directive environment fetches
more discussion about proc-macros, esp with respect to the proposed tracked environment API

Sep 10 '23 02:09 jsgf

Thanks for the RFC @jsgf! We reviewed it during this week's compiler team steering meeting and had a few thoughts.

The RFC suggests creating a distinction between the actual env vars for the rustc process (the "process environment") and the ones configured via the cli (the "logical environment"). Our feeling is that mutating the process environment after arguments have been parsed would be better from an implementation perspective as only the very early startup code in rustc will need to be aware of these options and the rest of the compiler does not need to be changed. It also seems that this approach would likely have fewer bugs where various parts of the compiler forgot to use the logical environment instead of the process environment.

This option doesn't seem to be explored in the stated alternatives. Have you considered this approach?

Sep 29 '23 14:09 wesleywiser

@wesleywiser

Have you considered this approach?

I did, but I rejected it pretty early so I didn't even consider it as a plausible alternative. There's a few reasons why that would be far from my favoured approach:

Most importantly, from a semantic point of view, is that it doesn't allow setting the logical variable, as seen by env!() differently from the one needed by the rustc process itself, or its sub-processes. For example it would not be possible to suppress or change environment variables like PATH, HOME, or more domain specific ones containing (for example) API keys in the logical environment.

While these may not actually be required by rustc itself, one of the goals of this proposal is to remove the need to carefully audit and curate the environment before invoking rustc, and instead just explicitly set the logical environment known to be needed by the crate in question. Ideally, every invocation should be rustc --env-clear --env-set FOO=bar ... so that the logical environment is completely defined by the command line. If --env-clear were manipulating the process environment, then it would likely make the run non-viable since, for example, it wouldn't be able to invoke the linker via PATH. Requiring the invoker to add --env-set for every variable that's not only required by the crate but also required at build time by the toolchain/build system undermines the value of the feature almost entirely.

Concretely, Buck2 attempts to limit the number of "ambiently" set environment variables to a "very limited initial environment" which has since grown to 20 environment variables we've found to be required in our environment (and I have no reason to believe it will stop there). Many of these are ones we would like to prevent random code from including with env!(), if only from a reproducable build perspective. I'd like to eliminate this list entirely, and just make the environment variables explicitly defined in the build rule available in the logical environment and not have to worry about the rest affecting the generated .rlib files.

Secondly, setenv/env::set_var is just an awful API, and I wouldn't want to rely on it. The fact that it's unsound in a multi-threaded process means that it introduces a brittleness that could bite at any time in the future. While one would hope we could rely on "very early startup code in rustc" being single-threaded, in practice we can't, in general. For example, jemalloc will create its own threads very early, and I don't know how that would interact with setenv (it uses the environment for its own purposes, so without deeper analysis I'd assume it would). My understanding is that there's effectively no such thing as a single-threaded macOS process either.

I'm also not sure to what extent rustc is usable as a library which can be instantiated multiple times within a process. Even if its not possible today, relying on process wide global state would make it more difficult in future. And other tools which want to reuse the same command line syntax, like rust-analyzer, would not be able to if the command line is filled with toolchain invocation detail.

Furthermore, if we're planning on introducing the notion of a "tracked environment" anyway, then we need to do a careful audit of all the sites which read the environment and determine whether they should be tracked. I think "tracked environment" and the "logical environment" should be synonymous (though I haven't done deep analysis here). We very definitely don't want to be tracking environment variables which are implementation details of running the toolchain which don't have a material effect on the generated artifacts.

Sep 29 '23 16:09 jsgf

The --env-set rustc option was already implemented (tracking issue). Is there anything blocking this RFC from being merged?

Jan 13 '24 13:01 GuillaumeGomez

Clearing of extra variables is an improvement, but I'm worried that it doesn't help catch untracked uses of optional env variables. This could lead to a confusing situation where the program doesn't seem to "see" its configuration variables.

Could it instead fail hard on any attempt to read a variable that isn't explicitly allowed?

Mar 01 '24 21:03 kornelski

@kornelski

So you mean you want option_env to start failing for a variable X if:

you use --env-clear and don't name X in --env-pass or --env-set, or
if you use --env-remove X

?

That seems like a pretty large change in behaviour for option_env. I could imagine situations where you want to leave a variable unset knowing that option_env would return None. In fact I think it completely undercuts the rationale for option_env and the reason you'd use it, since:

The overarching point of this proposal is that the env macros read from a logical environment rather than the process environment
The default of populating the logical environment from the process environment is primarily as a backwards-compatibility measure; ideally it would always start empty, and populating it from the process environment (either entirely or variable by variable) is an explicit choice
So if one is using option_env you're anticipating that the variable won't be set in the logical environment in some cases, but I don't think the specific reason of why it happens not to be set in the logical environment is really relevant. If the code really cares whether its set, they would use env.

Mar 03 '24 05:03 jsgf

The RFC doesn't specifically say how Cargo would use it. What I'd like to see is Cargo crates listing all env vars they use, in some declarative machine-readable way. This would help outer build systems and containers track, configure, and cache the variables accordingly. It would also allow docs.rs to list possible configuration options.

So my suggestion was from that perspective of forcing crates to declare everything they use.

If a crate used an option_env! somewhere with a var name that wasn't listed in the config, I don't see other good options:

if it's allowed to read Some value anyway, then it becomes an untracked configuration option. That's the current problematic state, and the crate maintainer may be unaware of this untracked use.
if it's not allowed to read by default, but it always silently returns None, then it begs the question why the the source code has a defunct always-None code. This looks like a var that has been accidentally left out of the allow-list.
if it's not allowed by default, and it fails loudly, then it's clear it was a forgotten option that needed to be added to the env var allow list (or added to an explicit deny list to acknowledge its existence but keep it as None without removing code).

So what I'm proposing is breaking builds on purpose, and of course would need an opt in, and likely some cargo fix tool that gathers all uses of env vars for the initial allow list.

One problem I see with explicit allow list is names that are computed dynamically, e.g. cross-build vars containing the target triple. Or pkg-config crate looking up env var overrides for specific libraries it only gets at build/run time. That may need support for wildcard patterns.

Mar 04 '24 00:03 kornelski

Yeah, I'm sympathetic to what you're saying, but I was really hoping to avoid changing the semantics of option_env here. Maybe a followup which adds an option to implement the semantics you're proposing?

Mar 07 '24 19:03 jsgf

What remains to be done here?

Aug 10 '24 14:08 GuillaumeGomez