`bacon` sometimes misses non-zero exit codes
Usually, when a command run through bacon exits with a non-zero exit code, bacon prints `Command error code: <code>` in the top status bar. This does not always happen.
Small reproducer (in an empty cargo project):
src/main.rs:
use std::process::ExitCode;

fn main() -> ExitCode {
    println!("program ran");
    ExitCode::FAILURE
}
There is no `bacon.toml` and no global bacon config file.
Run `bacon run` and keep pressing `r`. Eventually there will be a run where the `Command error code` message is not shown.
Alternatively, this small script runs bacon in a loop until the message is missing:
while true; do
    (sleep 1s; killall bacon) & /usr/bin/bacon run > out 2>&1
    if grep -q "program ran" out && ! grep -q "Command error code: 1" out; then
        break
    fi
done
For me, it took about a minute to reproduce with this script.
For this simple program it is rare (maybe 1% of runs), but in a real project that prints about 120 lines and takes about 200 ms overall, it happens at a noticeable rate, maybe 50% of runs.
Slightly larger reproducer that makes manual reproduction easier:
use std::process::ExitCode;
use std::time::Duration;

fn main() -> ExitCode {
    for i in 0..1000 {
        println!("{}", i.to_string().repeat(i));
    }
    std::thread::sleep(Duration::from_millis(150));
    println!("program ran");
    ExitCode::FAILURE
}
More output and longer execution times seem to make it happen more often.
As a small check that it’s not cargo that’s swallowing the exit code, this loop (which would end only if `cargo run` ever returned exit code 0) does not terminate for me:
while ! cargo run; do true; done
I think I found the problem: when the child process closes its stderr, bacon assumes that the process has ended https://github.com/Canop/bacon/blob/736cd69ca440bdd11ecec2f7cfb37f3400cee66d/src/exec/executor.rs#L178-L183 and tries to read the child’s exit code using `try_wait`: https://github.com/Canop/bacon/blob/736cd69ca440bdd11ecec2f7cfb37f3400cee66d/src/exec/executor.rs#L202
There are two problems here. First, even when stderr is closed because the child has exited, this is racy (this is the problem seen in the initial reproducers): there seems to be a small time window between stderr being closed and the moment the exit code becomes available to `try_wait`. This is easy to fix, because bacon already blocks on `wait` here: https://github.com/Canop/bacon/blob/736cd69ca440bdd11ecec2f7cfb37f3400cee66d/src/exec/executor.rs#L217 Sending the exit code returned from this `wait` call fixes the first problem.
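To illustrate the first fix in isolation, here is a minimal sketch using only the standard library. It is not bacon's code: `run_and_wait` is a hypothetical helper, and the only point it makes is that end-of-file on the stderr pipe does not guarantee that `try_wait` already sees an exit status, whereas a blocking `wait` always returns one.

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, ExitStatus, Stdio};

/// Hypothetical helper (not part of bacon): runs `program`, collects its
/// stderr lines until the pipe reaches EOF, then returns the exit status.
fn run_and_wait(program: &str, args: &[&str]) -> std::io::Result<(Vec<String>, ExitStatus)> {
    let mut child = Command::new(program)
        .args(args)
        .stderr(Stdio::piped())
        .spawn()?;

    let stderr = child.stderr.take().expect("stderr was piped");
    let mut lines = Vec::new();
    for line in BufReader::new(stderr).lines() {
        lines.push(line?);
    }

    // EOF only tells us that the write end of the pipe was closed.
    // Calling `child.try_wait()` at this point may still yield `Ok(None)`
    // if the process has not finished exiting yet, so block on `wait`
    // and report the status it returns.
    let status = child.wait()?;
    Ok((lines, status))
}
```

Calling it as `run_and_wait("sh", &["-c", "echo oops >&2; exit 3"])` should report exit code 3 every time, regardless of how the pipe close and the process exit are ordered.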
The second problem is that a process can close its stderr without exiting. This causes bacon to become unresponsive and ignore all further output on stdout. Small reproducer (it needs the `libc` crate as a dependency):
use std::process::ExitCode;
use std::time::Duration;

fn main() -> ExitCode {
    unsafe {
        libc::close(libc::STDERR_FILENO);
    }
    std::thread::sleep(Duration::from_secs(10));
    println!("program ran");
    ExitCode::FAILURE
}
I’m not sure what the best fix for that would be (or whether it’s even worth fixing).
For the second part of this issue (which is not fixed by #405), I don’t think there’s a particularly easy or good solution: bacon would have to wait in parallel for either a kill signal on the `StopMessage` channel or process exit, and abort the other wait as soon as one of them completes. With async this would be possible, but without async the only option seems to be polling with `try_recv` and `try_wait`?
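For reference, the polling variant could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `StopMessage` type here is a local stand-in rather than bacon's own, and `Outcome`, `wait_or_stop`, and the poll interval are hypothetical names that do not exist in bacon.

```rust
use std::process::{Child, ExitStatus};
use std::sync::mpsc::{Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Local stand-in for the stop message (assumption, not bacon's type).
struct StopMessage;

/// Hypothetical result type: how the wait ended.
enum Outcome {
    Exited(ExitStatus),
    Killed,
}

/// Polls both the child and the stop channel until one of them fires.
fn wait_or_stop(
    child: &mut Child,
    stop_rx: &Receiver<StopMessage>,
    poll: Duration,
) -> std::io::Result<Outcome> {
    loop {
        // Non-blocking check for process exit.
        if let Some(status) = child.try_wait()? {
            return Ok(Outcome::Exited(status));
        }
        // Non-blocking check for a stop request. A disconnected sender
        // is treated like a stop request in this sketch.
        match stop_rx.try_recv() {
            Ok(StopMessage) | Err(TryRecvError::Disconnected) => {
                child.kill()?;
                child.wait()?; // reap the killed process
                return Ok(Outcome::Killed);
            }
            Err(TryRecvError::Empty) => {}
        }
        thread::sleep(poll);
    }
}
```

The cost is the usual one for polling: the interval trades latency against wakeups. Replacing the `try_recv` plus `sleep` pair with `Receiver::recv_timeout` would make stop requests take effect immediately, while process exit would still be noticed only once per interval.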