coreutils
split: make flaky test more verbose
test_round_robin_limited_file_descriptors is flaky and causes real problems.
The test imposes `.limit(Resource::NOFILE, 9, 9)`; that's the point of the test. On my machine, this number can be lowered to 5: it always works with 5 or above, and never works below that. So I would assume that the "real" limit is 5 (plus or minus a bit of wiggle room for version differences).
On CI, it usually works with 9, but sometimes fails in the middle of the run (xaz, the 26th file), so it seems like there is a real issue, like an fd leak. (So we should not just raise the number.)
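As a quick sanity check on the file positions mentioned in this thread, here is a minimal sketch that maps between a zero-based output index and a name of the form `x` plus two lowercase letters. It assumes the default prefix `x` and two-letter alphabetic suffixes, and it deliberately covers only first letters `a` through `y`; it does not model longer suffixes. The helper names `default_name` and `index_of` are hypothetical and are not taken from the `split` source.

```rust
/// Hypothetical helper: name of the output file at zero-based index `idx`,
/// assuming prefix "x" and a two-letter suffix with first letter a..=y.
fn default_name(idx: usize) -> Option<String> {
    if idx >= 26 * 25 {
        return None; // outside the range this sketch models
    }
    let first = (b'a' + (idx / 26) as u8) as char;
    let second = (b'a' + (idx % 26) as u8) as char;
    Some(format!("x{}{}", first, second))
}

/// Hypothetical helper: inverse of `default_name`.
fn index_of(name: &str) -> Option<usize> {
    let b = name.as_bytes();
    if b.len() != 3 || b[0] != b'x' {
        return None;
    }
    let (first, second) = (b[1], b[2]);
    if !(b'a'..=b'y').contains(&first) || !(b'a'..=b'z').contains(&second) {
        return None;
    }
    Some((first - b'a') as usize * 26 + (second - b'a') as usize)
}

fn main() {
    // Index 25 is the 26th file.
    println!("{:?}", default_name(25));
    // Zero-based index of another name that shows up in the CI output.
    println!("{:?}", index_of("xbm"));
}
```

Under these assumptions, `xaz` is indeed the 26th file (index 25), and `xbm` would be the 39th (index 38).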
So let's at least make this test more verbose. This way, the next time it fails, we can see where exactly in <OutFiles as ManageOutFiles>::get_writer it fails. (At least that's where I think it fails.)
I have a bit of a bad feeling that it might be the line `out_file.maybe_writer.as_mut().unwrap().flush()?;`, i.e. flushing old files while there are no free descriptors left.
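To make the general pattern concrete, here is a minimal in-memory model of round-robin writing under a cap on simultaneously open outputs: when the cap is reached, the oldest open output is flushed and closed before another one is opened. This is only a sketch and not the code in `split`; `Slot`, `Pool`, and their methods are hypothetical names, and no real file descriptors are involved, so it cannot reproduce the flake itself.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for one output file (not the real uutils type).
struct Slot {
    name: String,
    open: bool,
    buffered: Vec<u8>, // bytes accepted but not yet flushed
    written: Vec<u8>,  // bytes that reached the simulated file
}

/// Hypothetical pool that keeps at most `max_open` slots open at once.
struct Pool {
    slots: Vec<Slot>,
    open_order: VecDeque<usize>, // oldest open slot at the front
    max_open: usize,
}

impl Pool {
    fn new(names: &[&str], max_open: usize) -> Self {
        assert!(max_open >= 1, "need at least one usable handle");
        let slots = names
            .iter()
            .map(|n| Slot {
                name: n.to_string(),
                open: false,
                buffered: Vec::new(),
                written: Vec::new(),
            })
            .collect();
        Pool { slots, open_order: VecDeque::new(), max_open }
    }

    /// Move buffered bytes to the simulated file and mark the slot closed.
    fn flush_and_close(&mut self, idx: usize) {
        let slot = &mut self.slots[idx];
        let pending = std::mem::take(&mut slot.buffered);
        slot.written.extend(pending);
        slot.open = false;
    }

    /// Append `line` to slot `idx`, evicting the oldest open slot if needed.
    fn write_line(&mut self, idx: usize, line: &[u8]) {
        if !self.slots[idx].open {
            if self.open_order.len() == self.max_open {
                let victim = self.open_order.pop_front().expect("max_open >= 1");
                self.flush_and_close(victim);
            }
            self.slots[idx].open = true;
            self.open_order.push_back(idx);
        }
        self.slots[idx].buffered.extend_from_slice(line);
    }

    fn open_count(&self) -> usize {
        self.open_order.len()
    }

    /// Flush and close everything that is still open.
    fn finish(&mut self) {
        while let Some(idx) = self.open_order.pop_front() {
            self.flush_and_close(idx);
        }
    }
}

fn main() {
    let mut pool = Pool::new(&["xaa", "xab", "xac"], 2);
    for (i, line) in ["1\n", "2\n", "3\n", "4\n", "5\n", "6\n"].iter().enumerate() {
        pool.write_line(i % 3, line.as_bytes());
        assert!(pool.open_count() <= 2);
    }
    pool.finish();
    for slot in &pool.slots {
        println!("{}: {:?}", slot.name, String::from_utf8_lossy(&slot.written));
    }
}
```

In this model the flush-and-close step always succeeds. In the real program that step can return an error, which is exactly the spot the extra verbosity is meant to shed light on.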
I would also love to run lsof at the time of crash, but since I cannot reproduce this issue locally, there's no way for me to do so. (And trying to do it automatically seems extremely difficult.)
Changes since last push: None, I just want a re-run.
Android build flaked, and this time I'm not gonna create a PR to fix it:
[2024-02-25 15:40:47] Compiling memchr v2.7.1
[2024-02-25 15:40:47] error: failed to run custom build command for `proc-macro2 v1.0.78`
[2024-02-25 15:40:47]
[2024-02-25 15:40:47] Caused by:
[2024-02-25 15:40:47] could not execute process `/data/data/com.termux/files/usr/tmp/cargo-install57bj3O/release/build/proc-macro2-ca558865293f126b/build-script-build` (never executed)
[2024-02-25 15:40:47]
[2024-02-25 15:40:47] Caused by:
[2024-02-25 15:40:47] Text file busy (os error 26)
[2024-02-25 15:40:47] warning: build failed, waiting for other jobs to finish...
[2024-02-25 15:40:49] error: failed to compile `cargo-nextest v0.9.67`, intermediate artifacts can be found at `/data/data/com.termux/files/usr/tmp/cargo-install57bj3O`.
GNU testsuite comparison:
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Changes since last push: None, I just want a re-run.
Our copy of the GNU tests flaked. I must have somehow angered the gods of CI flakiness.
Log of `test_uniq::gnu_tests Test 112.stdin`
Test 112.stdin
run: /target/i686-unknown-linux-musl/debug/coreutils uniq -D -c
thread 'test_uniq::gnu_tests' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "failed to write to stdin of child: Broken pipe (os error 32)" }', tests/common/util.rs:2031:18
stack backtrace:
0: rust_begin_unwind
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
1: core::panicking::panic_fmt
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
2: core::result::unwrap_failed
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
3: core::result::Result<T,E>::unwrap
at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
4: tests::common::util::UChild::wait_with_output
at ./tests/common/util.rs:2028:13
5: tests::common::util::UChild::wait
at ./tests/common/util.rs:1975:22
6: tests::common::util::UCommand::run
at ./tests/common/util.rs:1570:9
7: tests::common::util::UCommand::run_piped_stdin
at ./tests/common/util.rs:1578:9
8: tests::test_uniq::gnu_tests
at ./tests/by-util/test_uniq.rs:1058:22
uniq GNU tests flaked again in the same test. I'll ignore it this time.
Good news: The test failed exactly in this CI run.
Bad news: Derp, I'm an idiot. Of course `unable to open 'xbm'; aborting` is a regular error exit, not a panic, so setting RUST_BACKTRACE=1 does absolutely nothing. I'll create a new PR if/when I have a better idea of how to test this.
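The distinction can be shown with a small sketch: an error that is returned as a `Result` and reported with a message never goes through the panic machinery, so there is no backtrace to print; only an actual panic (for example an `unwrap()` on that same `Err`) would produce one. The functions `open_output` and `run` below are hypothetical stand-ins, not code from `split`.

```rust
use std::panic;

/// Hypothetical stand-in for an operation that fails gracefully.
fn open_output(name: &str) -> Result<(), String> {
    Err(format!("unable to open '{}'; aborting", name))
}

/// Hypothetical stand-in for a utility's entry point: report the error
/// and return a non-zero exit code. No panic happens on this path.
fn run(name: &str) -> i32 {
    match open_output(name) {
        Ok(()) => 0,
        Err(e) => {
            eprintln!("split: {}", e);
            1
        }
    }
}

fn main() {
    // Error path: a message and an exit code, nothing for RUST_BACKTRACE to act on.
    let graceful = panic::catch_unwind(|| run("xbm"));
    println!("graceful path panicked: {}", graceful.is_err());

    // Panic path: unwrapping the same Err does panic; only here would
    // RUST_BACKTRACE=1 add a backtrace to the panic message.
    let unwrapped = panic::catch_unwind(|| open_output("xbm").unwrap());
    println!("unwrap path panicked: {}", unwrapped.is_err());
}
```

So to learn where such an error originates, the location has to be carried in the error message or logged explicitly at the failing call.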