Fix:Introduce an interface to expose the current `Command` captured env var logic

Open schneems opened this issue 1 year ago • 13 comments

Proposal

Context

The Command struct allows developers to execute system commands (such as bundle install through a Rust interface). It includes modifying (set) and reading (get): env, current working directory, args, and program name.

When you execute a command, it will resolve both "explicit" environment variables (set via env, envs, env_remove) and implicit "inherited" environment variables (modifiable by env_clear) via this logic https://github.com/rust-lang/rust/blob/3a5c8e91f094bb1cb1346651fe3512f0b603d826/library/std/src/sys_common/process.rs#L36-L51 and then pass the result to the operating system. I will refer to this code as environment variable "capture" logic.

There are two problems with this process, detailed in the next section.

When a Command runs, the environment variables it uses are affected by:

Explicit mappings set (via Command::envs and Command::env)
Implicit mappings, inherited via the parent process
Explicitly denied-listed single inherited mapping (via Command::remove_env)
Implicit deny-list all inherited mappings (via Command::env_clear)

To reliably determine the exact environment variables a child process used when it ran, we must have those four pieces of information. There is a public API to determine the explicit mappings Command::get_envs. This API also tells us what values were explicitly removed via Command::remove_env. However, it does not include information about inherited environment variables. From the docs:

When Command::env_clear executes, it sets an internal state of clear=true. With this flag, the child process is told to not inherit any environment variable mapping from the parent.

That means we've got four possible states an environment variable can be in:

A) env_clear NOT called, explicit mapping set
B) env_clear called, explicit mapping set
C) env_clear NOT called, explicit mapping NOT set
D) env_clear called, explicit mapping NOT set

Without knowing if env_clear was called, states A & B and B & D are indistinguishable. This lack of information prevents us from determining the exact environment variables a Command used when it ran.

Problem statement(s)

Problem 1: Cannot observe effects of Command::env_clear
Problem 2: The "capture" logic has no exposed single source of truth

Problem 1 is a capability problem (I can't work around this). Problem 2 is a usability and stability problem.

Problem 1: Cannot observe effects of `Command::env_clear`

As documented in my merged PR, calling env_clear will cause the "capture" logic to skip inheriting from the parent process. While a developer can set this value, they cannot programmatically observe if it has been set (though they can manually observe it via "alternative" debug formatting {:#?}).

The end goals of programmatically observing that effect is listed below in "Motivation".

This problem prevents me from reproducing the capture logic in a library where I cannot observe if env_clear has been called. Even if this problem is solved, a related problem is detailed below.

Problem 2: The "capture" logic has no exposed single source of truth

Even if we could directly observe the effects of calling env_clear, every developer who needed this information would need to reproduce this "capture" logic https://github.com/rust-lang/rust/blob/3a5c8e91f094bb1cb1346651fe3512f0b603d826/library/std/src/sys_common/process.rs#L36-L51 and hope that it does not change or that their implementation does not diverge.

If the outcome of this logic is exposed, then it will:

Solve problem 1
Prevent divergent reimplementation of "capture" logic

Out of scope

Env var TOCTOU

Like other system resources (such as files on disk), environment variables may be modified between the time of check and use. This TOCTOU problem is out of the scope of my motivation for opening this issue. If you have that problem, please open a separate issue.

Motivation, use-cases

I want to use a Command struct as the single point of truth for what will run in the future or what already ran, including environment variables.

Motivation: Context

I maintain the Ruby buildpack, written in Rust. The job of a buildpack is to take in a user's application, perform work, and then output an OCI image. A core part of this work is running system commands. Such as bundle install and rake assets:precompile. Other buildpacks target other ecosystems: nodejs, python, PHP, go, and Java.

As part of providing this experience, I need to:

To advertise what command is being run before it runs.
Give a user detailed information when a command unexpectedly fails after it runs.

To do that with Rust, I want to:

Programmatically displaying a command's environment variables before it executes for an end-user.
Programmatically adding system information after a failed command execution for an end-user (e.g., which_problem)

I cannot accomplish these while using & mut Command as a single point of truth today. See below for more details.

Motivation: Programmatically displaying a command's environment variables before it executes for an end-user.

Commands like rake assets:precompile depend on environment variables for configuration (like RAILS_ENV=production). It's important that I can display the command before it executes because:

If it freezes, users need to be able to try to reproduce the freeze
If it fails, users will try to reproduce the failure

To that end, I want my user to see an output in their build log like:

Running: $ RAILS_ENV=production bin/rake assets:precompile

While I can provide this type of output for my users today, I cannot generalize it into a library for other buildpacks while only consuming &mut Command without introducing another point of truth.

Here's an example function:

// This code is invalid as it cannot determine the actual env state.
//
// Either we can assume `env_clear` was called, and only explicit
// values should be used, or assume it wasn't called
// and then implement the env var inheritance logic manually and
// hope there's no mismatch between our logic and Rust's.
//
pub fn command_name_with_env(
    cmd: &mut Command,
    keys: impl IntoIterator<Item = impl Into<OsString>>,
) -> String {
    let env = cmd
        .get_envs()
        .filter_map(|(k, v)| v.map(|value| (k.to_os_string(), value.to_os_string())))
        .collect::<Vec<(OsString, OsString)>>();

    display_with_keys(cmd, env, keys)
}

You could call it like:

command_name_with_env(&command, ["RAILS_ENV"])

Motivation: Programmatically adding system information after a failed command execution for an end-user (e.g. which_problem)

I want to be able to debug PATH issues on a Command. The developer may accidentally call env_clear and fail to realize that it also removes PATH, or they might think they added something to the PATH but had a typo.

To help debug those cases, I wrote a library that performs a which lookup and adds debugging information which_problem crate. I want to use it to debug when a command has failed.

Something like this:

use std::ffi::OsString;
use std::process::Command;
use which_problem::Which;

fn main() {
    println!("Hello, world!");

    let mut command = Command::new("bundle");
    command.arg("install");
    let output = command.output();

    output
        .map_err(|error| {
            eprintln!("Executing command '{command:?}' failed.\nError: {error}");

            match which_problem_from_command(&command).diagnose() {
                Ok(details) => println!("Diagnostic info: {details}"),
                Err(error) => println!("Warning: Internal which_problem error: {error}"),
            }
            error
        })
        .unwrap();
}

// This is not valid as it cannot determine the actual env state.
//
// Either we can assume `env_clear` was called, and only explicit
// values should be used, or we have to assume it wasn't called
// and then implement the env var inheritance logic manually and
// hope there's not a mismatch between our logic and Rust's
//
fn which_problem_from_command(command: &Command) -> which_problem::Which {
    let cwd = command.get_current_dir().map(|path| path.to_path_buf());
    let program = command.get_program().to_owned();
    let path_env = command
        .get_envs()
        .find_map(|(key, maybe_value)| maybe_value.filter(|_| key == &OsString::from("PATH")))
        .map(|value| value.to_owned());

    Which {
        cwd,
        program,
        path_env,
        ..Default::default()
    }
}

However, the code must assume something about the state of Command as it cannot be directly observed. Giving the wrong path information to the person debugging will actively harm the debugging experience.

Solution sketches

A quick note on naming things: I've stubbed in names for methods, but I'm not tied to any of them.

Internally the combination of explicit and implicit env vars are retrieved via a capture or captured method. These are described as the env vars that will be captured (or were previously) by the command.

I've also considered the following:

canonical (like used for paths).
resolved
finalized

They're used interchangeably below.

[sketch] Expose captured env logic via a new `captured_envs` method on `Command`

Update: A sample implementation of this is here

We could add a method that exposes that information directly:

let command = Command::new("ls");

let envs: HashMap<OsString, OsString> = command.captured_envs().collect();

This sketch would solve problems 1 and 2. I'm open to naming suggestions.

[sketch] Expose captured env logic via a pure function that takes `Command` and `VarOs` or an iterator

For example:

/// General idea, not proposing this exact code, I didn't compile this
///
pub fn capture_envs(command: &Command, inherited: impl IntoIter<(OsString, OsString)>) -> CapturedEnvs {
        let mut result = BTreeMap::<EnvKey, OsString>::new();
        if !command.is_env_clear {
            for (k, v) in inherited {
                result.insert(k.into(), v);
            }
        }
        for (k, maybe_v) in &command.vars {
            if let &Some(ref v) = maybe_v {
                result.insert(k.clone(), v.clone());
            } else {
                result.remove(k);
            }
        }
        CapturedEnvs { result.into_iter() }
    }

The benefit of this approach is that it's now a pure function and does not rely on env::vars_os() call to the current environment. For this to work, we would also have to expose Command.is_env_clear internal state.

This sketch would solve problems 1 and 2. I'm open to naming suggestions.

[sketch] Expose the clear boolean state

If I had a flag on whether or not clear had been called on the Command struct, I would have enough information to resolve environment variables manually. Something like:

command.is_env_clear();
command.get_env_clear();

Internally the state is preserved on CommandEnvs, so another option could be to mirror the internal state and expose it on the iterator:

command.get_envs().is_clear()

This sketch would solve problem 1 but not 2. The end programmer must still reimplement the environment variable capture logic, which may diverge. I'm open to naming suggestions.

Even if we choose another solution, we may want to consider exposing this information anyway, as all other explicitly set states on Command can be queried except for this one.

[sketch] Expose captured env logic via method chain method on `Command`

As an alternative to the above Command::captured_envs method, we could make the information chainable:

let command = Command::new("ls");

let envs: HashMap<OsString, OsString> = command.get_envs().capture();

Like the above chained command.get_envs().is_clear(), this maps somewhat to the internal representation.

Benefits: The methods of Command are not changed. Downsides: More OOP than functional style. It removes the ability to use Command::captured_envs in a chain, like calling map on an option.

This sketch solves problems 1 and 2. I'm open to naming suggestions.

[sketch] Something else

Any ideas?

[sketch] Deprecate `get_envs` and introduce a new interface.

Deprecate get_envs and introduce other APIs. Motivation is provided below. An example could be:

command.get_removed_envs()
command.get_explicit_envs()
command.get_captured_envs()

While this idea is a stretch, I wanted to document that it was possible (in brainstorming, quantity leads to quality) and give some additional context on the existing interface.

When I first saw get_envs, I assumed it was referring to getting the state that would be set when it was run rather than the explicit state that must be resolved.

I was not paying close attention when I first encountered get_envs. I guessed what I thought it meant (and was REALLY wrong).

In the case of a None value, it doesn't mean:

"None was given, fallback to a parent's value."

It means:

"Someone explicitly told me to unset this singular env var and not fallback."

That behavior is the opposite of what I assumed.

While it's ultimately my mistake to own, I want to explore how we could improve that experience and provide a new interface for the captured/canonical/resolved envs.

I introduced a documentation PR and added some wording in the header, specifically "explicitly," which I hope will catch people's attention. https://github.com/rust-lang/rust/pull/109272, though there's still behavior inconsistency. Of the get methods, these have no ambiguity:

get_args
get_program

While these have some:

get_envs
get_current_dir

I've talked at length about get_envs. The other, get_current_dir, returns either Some if an explicit value was set or None if it wasn't. I'm not personally surprised by what happens when a None is seen, but I'm not all users. It is confusing that None for get_current_dir and None for get_envs means opposite things.

Links and related work

PR to update Command docs https://github.com/rust-lang/rust/pull/109272/files
Because the majority of my work is open source, you can somewhat follow along with my mental state and how I ultimately found this issue, and also why I think it's important to be able to pull out canonical data from &mut Command.
- Initial OO approach using a surrogate struct for Command (written when I knew much less about Rust, so the design and implementation aren't the greatest) https://github.com/heroku/buildpacks-ruby/blob/a63337c764a816122e7ccaea50005cfa85482e90/commons/src/env_command.rs
- Functional grab-bag approach https://github.com/heroku/buildpacks-ruby/pull/120/commits/cca6c29520cd8fe5b6ec3897c93bfb8f01be67d2. This is closer to a more idiomatic interface, but still not quite what I want.
- Work that lead me to discover this Command issue: A mix of structs to hold state transitions but relying on &mut Command to hold canonical data: https://github.com/heroku/buildpacks-ruby/pull/120/commits/cad093a10656f4a9d2b8adf1dd36b3c5e79fa345 (this commit message may also help you to understand my design goals and needs when working with Command)
- Reddit question on /r/learnrust

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

Mar 23 '23 22:03 schneems

To add to my deprecation section: I feel that the interface with get_current_dir conflicts with get_envs:

Command::get_current_dir - None means "was not set, fallback"
Command::get_envs - None means "was explicitly unset, do not fallback"

I think that get_envs is "less correct". In this case, as the key doesn't just have no value, it has a negative value with an associated impact on inherited env vars.

If I was re-doing this same interface from scratch and wanted to expose "what env var state was explicitly set on this object," I might use a custom enum with two states, Set<OsString> and Unset (or some other names). The bash command to remove an env var is unset. If we were to deprecate that one method, that might be a path forward.

I think naming is hard, though. I think it needs to start with get_ and have envs or env in it. Maybe something to indicate its specific to the internal state:

get_env_state
get_env_internal
get_explicit_envs
get_custom_envs

Split into two methods is an option:

get_set_envs && get_unset_envs
get_set_envs && get_removed_envs
get_export_envs && get_unset_envs

Mar 24 '23 16:03 schneems

For what it's worth, I prefer removed or deleted. As for splitting into two methods, what is the internal representation currently? Is it a single collection for both or multiple collections?

Of course it can be changed if necessary, but if it's already two collections that get chained together, then that's all the more reason to split up the methods.

If we do split them up, then I'd like to expose get_inherited_envs as a separate method as well.

Mar 25 '23 06:03 pitaj

what is the internal representation currently? Is it a single collection for both or multiple collections

This is the internal struct: https://github.com/rust-lang/rust/blob/e10eab5956f91c26dcc5ae29a19cfcd747047e4d/library/std/src/sys_common/process.rs#L14-L18. It's a single collection that stores env state pretty much the same way it's presented in the public API. It's a BTreeMap<EnvKey, Option<OsString>>.

Then right before the command executes, this is how the env vars are resolved: https://github.com/rust-lang/rust/blob/e10eab5956f91c26dcc5ae29a19cfcd747047e4d/library/std/src/sys_common/process.rs#L36-L51.

Mar 25 '23 14:03 schneems

I'm unsure of what happens next in this process so I made a reference implementation for "[sketch] Expose envs via a new method on Command" its at https://github.com/rust-lang/rust/pull/110327. I would like feedback there or here

Ultimately I think I still prefer this interface:

let envs: HashMap<OsString, OsString> = command.get_envs().with_inherited();

Though it will require a bit of refactoring. If it helps, I can code up that implementation too.

Apr 14 '23 17:04 schneems

I think returning an iterator makes a lot more sense. You can always collect it into a map. Plus that way you get to choose the kind of map you want, if you want the results sorted you can use BTreeMap for instance.

Apr 14 '23 18:04 pitaj

Note that talking about inherited values only makes limited sense because those are only snapshotted at process creation time and may be changed before or after via set_env.

If you want to inspect the environment of a process with you can, at least on unix, read /proc/<pid>/environ . I think windows has similar facilities to inspect process environments.

Also note that for debugging purposes the state already is available via the alternate formatting {:#?}, playground demo

Apr 15 '23 22:04 the8472

Note that talking about inherited values only makes limited sense because those are only snapshotted at process creation time and may be changed before or after via set_env.

Conceptually this limitation is no different than any other command that interacts with the system. For example: After a file is read from the disk, it could be mutated or modified.

It's worth noting this caveat in the documentation: the developer is responsible for timing and isolation.

If you want to inspect the environment of a process with you can, at least on unix, read /proc//environ . I think windows has similar facilities to inspect process environments.

We can access that information via https://doc.rust-lang.org/std/env/fn.vars_os.html, though without a stable/consistent way to determine if clear has been called, developers can't resolve the actual values used.

Also note that for debugging purposes the state already is available via the alternate formatting {:#?}, playground demo

I didn't note that here. But I did find that in my original investigation https://www.reddit.com/r/learnrust/comments/11sgsqt/detect_if_clear_env_has_been_called_on_a_command/.

It's good to note that capability here for future developers who stumble on this issue. For my main use case: I need to be able to programmatically determine what PATH the command was run with. I don't want to have to resort to parsing the debug output. I worry that the output format is fragile, and I don't think we want to write documentation that stabilizes the debug output format.

Apr 16 '23 20:04 schneems

I don't want to have to resort to parsing the debug output. I worry that the output format is fragile

I wasn't suggesting and wouldn't encourage that. I just noted it since one of the stated motivations was debugging which can be covered by that.

We can access that information via https://doc.rust-lang.org/std/env/fn.vars_os.html, though without a stable/consistent way to determine if clear has been called, developers can't resolve the actual values used.

I was talking about the environment of the child process.

Conceptually this limitation is no different than any other command that interacts with the system. For example: After a file is read from the disk, it could be mutated or modified.

Well, looking at the environment that a child process was spawned with is less susceptible to this than trying to reassemble that state from different parts within the parent.

What I'm saying is that if what you want to debug a child process then maybe it would be better to actually inspect the child process, not the Command that at some point in the past was used to spawn it.

Apr 16 '23 21:04 the8472

Thanks a ton for the feedback. I misunderstood your original comment. Special thanks to this comment:

trying to reassemble that state from different parts within the parent.

I realized I wasn't giving this context in the issue:

Rust implements Command execution by first capturing env vars logic is here and then explicitly passing them to a spawned process. It doesn't rely on implicit inheritance from the OS (as far as I can follow the code).

So, while I wrote the issue stating I've got one issue, I'm actually talking about two problems:

Cannot observe env_clear effects
No single point of truth available for this capturing env logic

Even if I could observe the env_clear effects, I would still be at risk of a divergent reimplementation of the internal Rust logic. I think it's cleaner and more stable to expose the outcome of that existing logic than to force developers to re-implement their own version.

Based on our conversation, I went back and updated my issue (added the bit about debug formatting, mentioned the TOCTOU issue is out of scope, added a second problem statement, added the Reddit link, updated some wording, and updated the title).

I still think reading from /proc/<pid>/environ is a neat idea that I hadn't considered before. There are some edge cases, though:

/proc/<pid>/environ won't exist for the case where the process fails to boot at all
Our check of /proc/<pid>/environ might happen after the process boots and exits
ABA problem if the process exits before we read it and another with the same PID is booted

Ultimately we don't need to depend on the OS since Rust is already owning the logic of assembling the different env parts and passing them to the child.

Apr 19 '23 19:04 schneems

[sketch] Expose the clear boolean state

Yeah, that seems like a simple thing we can do.

Though note that technically it's not quite sufficient since a Command can have a pre_exec hook which can do arbitrary things in the forked child, such as calling chroot which will confuse your path resolution. If you accept arbitrary Commands from library users and there is another library that hooks up chroot for a command then that could lead to conflicts. In other words std's Command is very much turing-complete.

Accepting a sort of CommandBuilder that only supports things that your library understands would be more robust.

/proc//environ won't exist for the case where the process fails to boot at all

The most reliable way to debug those is to trace the exec syscalls of the forked child. Everything else is only a proxy measure.

Our check of /proc//environ might happen after the process boots and exits ABA problem if the process exits before we read it and another with the same PID is booted

A zombie will stay around until you call wait on the child. I haven't checked if the environ is still available on a zombie, might be worth a try.

Apr 26 '24 23:04 the8472

reading from /proc/<pid>/environ

assuming that's like /proc/<pid>/cmdline (edit: which luckily turns out to be a false assumption), it reads directly from the running program's memory, so if that program modifies that memory, it modifies the value you read back from /proc/<pid>/environ.

Apr 27 '24 00:04 programmerjake

man pages disagree

/proc/pid/environ
              This file contains the initial environment that was set
              when the currently executing program was started via
              [execve(2)](https://man7.org/linux/man-pages/man2/execve.2.html).  The entries are separated by null bytes
              ('\0'), and there may be a null byte at the end.  Thus, to
              print out the environment of process 1, you would do:

                  $ cat /proc/1/environ | tr '\000' '\n'

              If, after an [execve(2)](https://man7.org/linux/man-pages/man2/execve.2.html), the process modifies its
              environment (e.g., by calling functions such as [putenv(3)](https://man7.org/linux/man-pages/man3/putenv.3.html)
              or modifying the [environ(7)](https://man7.org/linux/man-pages/man7/environ.7.html) variable directly), this file
              will not reflect those changes.

Apr 27 '24 01:04 the8472

Yeah, that seems like a simple thing we can do.

I think I got most of the way there with my linked PR, however tests were failing on windows CI and now there’s an issue rebasing that branch with HEAD. If someone wants to take a stab at it, I would be grateful 🙏.

Even if there are edge cases, I would still like to be able to read the config value that was written.

May 01 '24 00:05 schneems

libs-team libs-team copied to clipboard

Fix:Introduce an interface to expose the current `Command` captured env var logic

Proposal

Context

Problem statement(s)

Problem 1: Cannot observe effects of Command::env_clear

Problem 2: The "capture" logic has no exposed single source of truth

Out of scope

Motivation, use-cases

Motivation: Context

Motivation: Programmatically displaying a command's environment variables before it executes for an end-user.

Motivation: Programmatically adding system information after a failed command execution for an end-user (e.g. which_problem)

Solution sketches

[sketch] Expose captured env logic via a new captured_envs method on Command

[sketch] Expose captured env logic via a pure function that takes Command and VarOs or an iterator

[sketch] Expose the clear boolean state

[sketch] Expose captured env logic via method chain method on Command

[sketch] Something else

[sketch] Deprecate get_envs and introduce a new interface.

Links and related work

What happens now?

libs-team
libs-team copied to clipboard

Problem 1: Cannot observe effects of `Command::env_clear`

[sketch] Expose captured env logic via a new `captured_envs` method on `Command`

[sketch] Expose captured env logic via a pure function that takes `Command` and `VarOs` or an iterator

[sketch] Expose captured env logic via method chain method on `Command`

[sketch] Deprecate `get_envs` and introduce a new interface.