split: make flaky test more verbose

Open BenWiederhake opened this issue 1 year ago • 4 comments

test_round_robin_limited_file_descriptors is flaky and causes real problems.

The test imposes .limit(Resource::NOFILE, 9, 9); that's the point of the test. On my machine, this number can be lowered to 5: it always works with 5 or above, and never works below that. So I would assume that the "real" limit is 5 (plus or minus a bit of wiggle room for version differences). On CI, it usually works with 9, but sometimes fails in the middle of the run (at xaz, the 26th file), so it seems like there is a real issue, like an fd leak. (So we should not just raise the number.)

So let's at least make this test more verbose. This way, the next time it fails, we can see where exactly in <OutFiles as ManageOutFiles>::get_writer it fails. (At least that's where I think it fails.)

I have a bit of a bad feeling that it might be the line out_file.maybe_writer.as_mut().unwrap().flush()?;, i.e. flushing old files while there are no free descriptors left.

I would also love to run lsof at the time of the crash, but since I cannot reproduce this issue locally, there's no way for me to do so. (And trying to do it automatically seems extremely difficult.)
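
For illustration only, here is a minimal sketch of what an in-process substitute for lsof might look like. It is not code from split or from the test suite: the helper names list_open_fds and dump_open_fds are hypothetical, and it assumes a Linux-style /proc/self/fd directory, so it would not work on other platforms.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical helper: list the descriptors currently open in this process.
/// Assumption: Linux-style procfs, where /proc/self/fd holds one symlink per fd.
fn list_open_fds() -> io::Result<Vec<(i32, PathBuf)>> {
    let mut fds = Vec::new();
    for entry in fs::read_dir("/proc/self/fd")? {
        let entry = entry?;
        // Each entry name is a descriptor number; skip anything that is not.
        let fd = match entry.file_name().to_str().and_then(|s| s.parse::<i32>().ok()) {
            Some(fd) => fd,
            None => continue,
        };
        // The symlink target names the file, pipe, or socket behind the fd.
        if let Ok(target) = fs::read_link(entry.path()) {
            fds.push((fd, target));
        }
    }
    fds.sort();
    Ok(fds)
}

/// Hypothetical helper: print the descriptor table to stderr for a CI log.
fn dump_open_fds() {
    match list_open_fds() {
        Ok(fds) => {
            for (fd, target) in fds {
                eprintln!("fd {fd} -> {}", target.display());
            }
        }
        Err(e) => eprintln!("could not list open fds: {e}"),
    }
}
```

One caveat with this approach: reading /proc/self/fd needs a free descriptor itself, so if the process has already hit its NOFILE limit, the listing may fail with the same error it is trying to diagnose. The listing also includes the descriptor that read_dir holds while iterating.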

BenWiederhake avatar Feb 25 '24 15:02 BenWiederhake

Changes since last push: None, I just want a re-run.

Android build flaked, and this time I'm not gonna create a PR to fix it:

[2024-02-25 15:40:47]    Compiling memchr v2.7.1
[2024-02-25 15:40:47] error: failed to run custom build command for `proc-macro2 v1.0.78`
[2024-02-25 15:40:47] 
[2024-02-25 15:40:47] Caused by:
[2024-02-25 15:40:47]   could not execute process `/data/data/com.termux/files/usr/tmp/cargo-install57bj3O/release/build/proc-macro2-ca558865293f126b/build-script-build` (never executed)
[2024-02-25 15:40:47] 
[2024-02-25 15:40:47] Caused by:
[2024-02-25 15:40:47]   Text file busy (os error 26)
[2024-02-25 15:40:47] warning: build failed, waiting for other jobs to finish...
[2024-02-25 15:40:49] error: failed to compile `cargo-nextest v0.9.67`, intermediate artifacts can be found at `/data/data/com.termux/files/usr/tmp/cargo-install57bj3O`.

BenWiederhake avatar Feb 25 '24 16:02 BenWiederhake

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

github-actions[bot] avatar Feb 25 '24 16:02 github-actions[bot]

Changes since last push: None, I just want a re-run.

Our copy of the GNU tests flaked. I must have somehow angered the gods of CI flakiness.

Log of `test_uniq::gnu_tests Test 112.stdin`
Test 112.stdin
run: /target/i686-unknown-linux-musl/debug/coreutils uniq -D -c
thread 'test_uniq::gnu_tests' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "failed to write to stdin of child: Broken pipe (os error 32)" }', tests/common/util.rs:2031:18
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: tests::common::util::UChild::wait_with_output
             at ./tests/common/util.rs:2028:13
   5: tests::common::util::UChild::wait
             at ./tests/common/util.rs:1975:22
   6: tests::common::util::UCommand::run
             at ./tests/common/util.rs:1570:9
   7: tests::common::util::UCommand::run_piped_stdin
             at ./tests/common/util.rs:1578:9
   8: tests::test_uniq::gnu_tests
             at ./tests/by-util/test_uniq.rs:1058:22

BenWiederhake avatar Feb 25 '24 16:02 BenWiederhake

uniq GNU tests flaked again in the same test. I'll ignore it this time.

BenWiederhake avatar Feb 25 '24 17:02 BenWiederhake

Good news: The test failed exactly in this CI run.

Bad news: Derp, I'm an idiot: of course "unable to open 'xbm'; aborting" is an ordinary error message and not a panic, so setting RUST_BACKTRACE=1 does absolutely nothing. I'll create a new PR if/when I have a better idea how to test this.
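
One possible direction, sketched here under assumptions rather than taken from split itself: std::backtrace::Backtrace::force_capture() records a backtrace at the point where an error value is created, independent of RUST_BACKTRACE and without any panic. The TracedError type and the open_output function below are hypothetical names used only for this example.

```rust
use std::backtrace::Backtrace;
use std::fmt;
use std::fs::File;
use std::io;

/// Hypothetical wrapper: an I/O error plus the backtrace of where it was seen.
#[derive(Debug)]
struct TracedError {
    source: io::Error,
    backtrace: Backtrace,
}

impl TracedError {
    fn new(source: io::Error) -> Self {
        // force_capture ignores RUST_BACKTRACE, so it also works when the
        // error is later reported as a plain message instead of a panic.
        Self { source, backtrace: Backtrace::force_capture() }
    }
}

impl fmt::Display for TracedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // First line: the original error; following lines: the captured frames.
        write!(f, "{}\n{}", self.source, self.backtrace)
    }
}

/// Hypothetical stand-in for the place where an output file gets opened.
fn open_output(name: &str) -> Result<File, TracedError> {
    File::create(name).map_err(TracedError::new)
}
```

A caller could then print the whole TracedError to stderr before exiting, which would show the call path that led to the failed open. How useful the frames are depends on whether the binary was built with debug info.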

BenWiederhake avatar Feb 28 '24 11:02 BenWiederhake