rfcs Command improvements for ergonomics and error handling

Jan 03 '23 20:01 ijackson

I agree with the motivation that Command is not very ergonomic, especially around the common "return error containing stderr on error" workflow. But this RFC is a little too opinionated about stderr in my opinion. It strictly treats it as the "error output", which is just often not true in practice. Even cargo logs all its output to stderr. Therefore, considering any output to stderr an error is a bad idea. Also, if I read this correctly, there's no way to get stdout when the command failed using these methods except with the existing .output(), but this plans to deprecate that.

Jan 04 '23 19:01 Noratrieb

I agree with the motivation that Command is not very ergonomic, especially around the common "return error containing stderr on error" workflow. But this RFC is a little too opinionated about stderr in my opinion. It strictly treats it as the "error output",

Um? That's not true. In my RFC, by default, error output is sent to the Rust program's stderr (inherit).

which is just often not true in practice. Even cargo logs all its output to stderr. Therefore, considering any output to stderr an error is a bad idea.

I think this depends what program you're running. I don't think it is wrong to have the option to treat stderr output as an error, as a non-default mode.

Also, if I read this correctly, there's no way to get stdout when the command failed using these methods except with the existing .output(), but this plans to deprecate that.

No, that's not true either. If you have a SubprocessError, you can call the .stdout_bytes() method on it. (It will only be Some if you called one of the .get_output* methods, obviously; otherwise the stdout went to wherever you configured the Command to send it.)

Jan 04 '23 20:01 ijackson

Ah, I seem to have overlooked that, sorry.

Jan 04 '23 20:01 Noratrieb

In a broad sense, I think the approach of this RFC seems mostly reasonable, and I'd like to see something along these lines.

There are two categories of issues I'd like to raise. I think both are addressable, and I'm hoping it'll be possible to address them and merge the resulting RFC.

One family of issues are largely bikeshed-painting, where the functionality seems perfectly fine but I think the names are unintuitive. For those, I want to be clear that I'm not attached to any particular names, and primarily want to express some desired properties of a name that I think the proposed names don't have. I'd like to align on those properties and then I don't especially care what name we pick that meets those properties.

The other family of issues are those of functionality, where I think the proposal makes a common use case more challenging.

First, the bikesheds:

Bikeshed: We currently have programs using a method called output that retrieves both stdout and stderr, merged. I don't think we can switch from that to methods named get_output* that only return stdout without leading to some confusion. I would propose a name that makes it clear that these only return stdout, such as get_stdout. (Arguably that may mean it needs to be called read_stdout or similar, because get_stdout might imply "get the previously set standard output device"; in that case, we could also have stdout_reader for the reader method to avoid read_stdout_reader.)
Bikeshed (minor): Perhaps ProcessError rather than SubprocessError? We currently use the term "process" in various places, and not "subprocess". Plus, this error type seems useful for any kind of process.
Bikeshed: s/ChildOutputRead/ChildOutputReader/, and likewise s/_read/_reader/ for the method returning it. "reader" seems to be the ecosystem convention for "things that implement Read", and Stream is something else entirely. (Regarding the item in the "Drawbacks" section, though, I'm completely in favor of having this function and type.)

Next, the semantics:

Semantic: Please spell out exactly how you're expecting people to retrieve stdout-and-stderr interleaved, as they can currently do with output. If this isn't as simple as "call a single non-deprecated method", I think that needs improving. I think the proposal is that if you route stderr to the same place you route stdout, rather than capturing it in a separate pipe, then you'll get the combined output and stderr won't get treated as a failure? In any case, that needs documenting and needs to be simple. (Common use case: spawning a compiler-like tool.)
Semantic: We should not encourage developers to fail on non-UTF-8 in the common case. I think get_stdout should return bytes, and we should offer a get_stdout_string that returns a UTF-8 String. Similarly, the error type should have methods stdout and stderr rather than stdout_bytes and stderr_bytes.
Semantic (exotic): I have some use cases (involving UNIX sockets in datagram mode) for which I cannot keep reading stdout until EOF, and instead I have to stop reading stdout as soon as 1) the process exits and 2) a non-blocking read returns nothing. Would it be possible (either in-std or outside std) to implement a reader with that semantic atop these APIs?
Semantic: Please document that we need impl Error for ProcessError. Also, that implementation is going to have a little trouble implementing the cause method, unless there's a semantic guarantee that only one of the types of error can happen. This seems like it'd be guaranteed for errors produced by the standard library, but the APIs for building a ProcessError don't statically guarantee that, nor do the accessor methods for ProcessError. If we could maintain that guarantee, such that there's only one chained error, that seems ideal.
Semantic: I think we should show the command-line arguments in the default case, even though they might be verbose. However, sometimes the command-line arguments might be sensitive. For convenience for such applications, perhaps a hide_args() method that avoids including the args in the error. (We could get more exotic with a .masked_arg() method that supplies and masks one argument to the command, but that's much more complex and I don't think that should be in this RFC. hide_args() seems simple enough though.)
Semantic: I can imagine the convenience value of the _line variant, but could you give some examples of code that would benefit from it? I'm also wondering if it'd be reasonably convenient to call read_stdout_string and then have a convenience method on String for "fail if not exactly one line", which seems useful in other contexts too.
Semantic (minor): Please add a note to the "Further necessary APIs for SubprocessError" section that these APIs could be under a separate feature gate and stabilized separately; that'll make it easier to potentially do a partial stabilization.

Jan 12 '23 04:01 joshtriplett

I think get_stdout should return bytes,

Shouldn't it be OsString?

However, sometimes the command-line arguments might be sensitive. For convenience for such applications, perhaps a hide_args() method that avoids including the args in the error.

Sensitive things should not be passed as command line arguments, since those can leave traces in arbitrary places around the OS. For example, they may be included in ps output. We should not encourage half-baked security practices.

Jan 12 '23 12:01 afetisov

Shouldn't it be OsString?

No. On Windows that would be WTF-8, not raw bytes.

Jan 12 '23 23:01 joshtriplett

I agree with the motivation that Command is not very ergonomic, especially around the common "return error containing stderr on error" workflow. But this RFC is a little too opinionated about stderr in my opinion. It strictly treats it as the "error output",

Um? That's not true. In my RFC, by default, error output is sent to the Rust program's stderr (inherit).

But what if you also want to capture stderr?

As @joshtriplett mentioned output currently gives you interleaved stdout/stderr, and if that is deprecated, there isn't a way to get that. Perhaps one way to do that would be to add a new constructor to Stdio that would allow you to redirect stderr to stdout or vice versa?

And should there be functions for getting stdout and stderr as separate Strings or Vec<u8>s? Of course it is possible to implement that using spawn and Child, but that is true of all these other more ergonomic functions as well.

Jan 13 '23 08:01 tmccombs

Thanks for all the comments, and the encouragement. I think in most cases I will be modifying the RFC in line with the suggestions. And I'll add some examples (and add some more stuff to the alternatives section).

I think the mutable/constructable SubprocessError portion of the API needs more work -- Ideally it could be a lot smaller and have fewer moving pieces. Perhaps it's trying to be too much at once?

I agree that it is uncomfortably large, but, sadly, I think it is inevitable. I think what I will do is make it a transparent struct (for it is a struct, not an enum) and add a lot more commentary explaining why it is (has to be) that way.

Jan 13 '23 13:01 ijackson

@joshtriplett Thanks for the review.

First, the bikesheds:

Your suggestions mostly made sense to me, so I have simply accepted them.

I'm still unsure about read_stdout, because it does more than just read the stdout: it also spawns and waits for termination. I have added some text about this to the relevant part of Unresolved Questions.

Please spell out exactly how you're expecting people to retrieve stdout-and-stderr interleaved, as they can currently do with output

I agree that this is an important use case.

But think this is not in fact currently possible. Output contains two Vec<u8>, so stdout and stderr are not interleaved but separated. And it is not possible to reconstruct the order of output.

I think this requirement should be met by a new configuration method on Command. This seems conceptually independent of my RFC, and could be done either before or after. So I have added that to Future Possibilities.

Note also https://github.com/rust-lang/rust/pull/88561 where I propose to provide the ability to redirect the subprocess's stdout to one's own stderr, or vice versa.

We should not encourage developers to fail on non-UTF-8 in the common case. I think get_stdout should return bytes, and we should offer a get_stdout_string

Isn't this just a question of naming ? I think this should have been in your "bikeshed" list :-).

I looked around in fs and concluded that you're right and we usually use the undecorated name for the bytes version. So I've renamed read_stdout to read_stdout_string and read_stdout_bytes to read_stdout.

Semantic (exotic): I have some use cases (involving UNIX sockets in datagram mode) for which I cannot keep reading stdout until EOF, and instead I have to stop reading stdout as soon as 1) the process exits and 2) a non-blocking read returns nothing. Would it be possible (either in-std or outside std) to implement a reader with that semantic atop these APIs?

I think you can do this already with piped() and Child::try_wait and so on. This is not the kind of thing that my new API is likely to help you very much with, although possibly if you find that things go wrong you may find it convenient to make a ProcessError out of your program's experiences.

Semantic: Please document that we need impl Error for ProcessError.

Done, and discussed cause.

Semantic: I think we should show the command-line arguments in the default case

I added some notes about the question of sensitive arguments. I do still think we should show them.

Semantic: I can imagine the convenience value of the _line variant, but could you give some examples of code that would benefit from it? [suggestion of convenience method on String]

Many programs on Unix are expected to print a single line. whoami, git rev-parse, wc.

Its main use is that it trims the newline for you, but you could do that (with slightly less error checking) with .trim_end().

TBH I am not 100% sure this function carries its weight. But OTOH the method function String approach isn't able to include the command information in its error. On the third hard, that is true of basically any parsing one does of the output.

In the interests of making progress, I think it's best to decouple this question from my RFC. So I have moved it to the Future Possibilities section.

"Further necessary APIs for SubprocessError" section

Some of these APIs are separable, but at least .has_problem() isn't. I have added notes to these.

The getters and setters are gone now, which makes this much smaller.

(comment edited to fix many spurious linebreaks, sorry about that)

Jan 13 '23 17:01 ijackson

Nit (emphatically not a blocker): your commit message "Rename to ChildOutputStream" seems inaccurate (specifically the "to"), as it renames ChildOutputStream to ChildOutputReader.

Jan 19 '23 09:01 joshtriplett

I've nominated this for the next libs-api meeting. I anticipate proposing FCP on it, unless folks raise any other issues before then.

Jan 19 '23 10:01 joshtriplett

I had been thinking about this, and in particular the implementation, and felt that communication_error: Option<io:Error> was suboptimal, because it makes handling of subprocess communication errors less correct (the kind() isn't very useful). See the reasoning I have left in the comment there.

I'm not 100% sure whether this is better than my previous proposal. One option woudl be to leave this one way or the other for now, and let the implementor decide.

Jan 19 '23 13:01 ijackson