bkt icon indicating copy to clipboard operation
bkt copied to clipboard

cli::discard_failures_in_background is flaky

Open dimo414 opened this issue 3 years ago • 5 comments

Reported by @tranzystorek-io in https://github.com/dimo414/bkt/issues/12#issuecomment-1080012510 - example failure blaming this line:

---- cli::discard_failures_in_background stdout ----
thread 'cli::discard_failures_in_background' panicked at 'SystemTime { tv_sec: 1648411197, tv_nsec: 770703005 } !> SystemTime { tv_sec: 1648411197, tv_nsec: 770703005 }', tests/cli.rs:227:9

dimo414 avatar Mar 28 '22 05:03 dimo414

This test is known to be brittle (see 58e62b) though I don't totally understand why. Presumably a race with the backgrounded process but IIRC I observed the process completing even though the modtime hadn't changed. Will try to dig into this further, or we can always disable it in the meantime if it's causing trouble.

dimo414 avatar Mar 28 '22 05:03 dimo414

How critical is this test? If not too much, I'd opt for disabling

tranzystorekk avatar Apr 11 '22 12:04 tranzystorekk

Well it does provide coverage for supported behavior so I'd rather fix it, but if it's causing too much trouble I'll disable it for now.

dimo414 avatar Apr 11 '22 22:04 dimo414

I may be misreading this, but it looks like the test is sleeping for a total of 1sec (10x 100ms) rather than the invalidate threshold of 10sec. Is it that simple? Or am I missing something?

zbentley avatar May 01 '22 10:05 zbentley

The sleep duration should only need to be long enough for the command to execute, it isn't related to the --stale threshold because we've already marked the directory as 15s stale. As mentioned above "I observed the process completing even though the modtime hadn't changed" previously. Here's what I'm seeing:

  • Increasing the loop iterations (e.g. 1..100) does not avoid the race condition
  • Commenting out the modtime assert allows the test to pass seemingly consistently
  • The following assert of the file's contents should fail if the subprocess hasn't executed yet, because it's reading the contents of the same file we're checking the mod-time of

In other words, it seems like occasionally file's modtime isn't being updated even though it's been written to since line 217. That doesn't make sense to me but I'm not sure how else to explain it :)

dimo414 avatar May 07 '22 18:05 dimo414

it seems like occasionally file's modtime isn't being updated

Yeah I'm able to reproduce this pretty consistently in the Rust playground. I guess modtime updates simply aren't atomic.

Edit: well, duh. I knew that already (and the existing code handles it). Brain fart 🫤

dimo414 avatar Mar 26 '23 03:03 dimo414