gitoxide
gitoxide copied to clipboard
Nondeterministic macOS `is_symlink` assertion failure in `overwriting_files_and_lone_directories_works`
Current behavior 😯
In gix-worktree-state/tests/state/checkout.rs, the test overwriting_files_and_lone_directories_works fails occasionally on the assertion:
https://github.com/Byron/gitoxide/blob/3c2174101ed35dcb9bdb4585b3245507b15efe59/gix-worktree-state/tests/state/checkout.rs#L238
Usually it passes, but occasionally it fails. This happens both on CI and in manual runs. As far as I know, the failure only happens on macOS. Here's an example from CI in my fork:
FAIL [ 0.422s] gix-worktree-state-tests::worktree state::checkout::overwriting_files_and_lone_directories_works
--- STDOUT: gix-worktree-state-tests::worktree state::checkout::overwriting_files_and_lone_directories_works ---
running 1 test
test state::checkout::overwriting_files_and_lone_directories_works ... FAILED
failures:
failures:
state::checkout::overwriting_files_and_lone_directories_works
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 10 filtered out; finished in 0.42s
--- STDERR: gix-worktree-state-tests::worktree state::checkout::overwriting_files_and_lone_directories_works ---
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.11s
thread 'state::checkout::overwriting_files_and_lone_directories_works' panicked at gix-worktree-state/tests/state/checkout.rs:238:9:
assertion `left == right` failed
left: false
right: true
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I don't know why this happens, but maybe it is somehow related to concurrency? I say this based only on https://github.com/Byron/gitoxide/issues/1354#issuecomment-2107303983, indicating that "the filesystem probe" for symlinks is "still not safe for concurrent operations."
Expected behavior 🤔
The test should pass in all runs, rather than just most.
Git behavior
I'm unsure if there is a corresponding Git behavior, because I am unsure if this is a test bug or a bug in the code under test. Above the line with the assertion, the test has this comment:
https://github.com/Byron/gitoxide/blob/3c2174101ed35dcb9bdb4585b3245507b15efe59/gix-worktree-state/tests/state/checkout.rs#L237
I'm not sure what aspect of Git behavior that's referring to, though. As far as I know, Git determines whether it will attempt to create symlinks on Windows by checking core.symlinks and, unlike on other systems, defaults it to false if it is not set in any scope.
Steps to reproduce 🕹
I don't know a good procedure. I've been running tests a fair amount and occasionally seeing the problem. Running the tests many times in succession would likely trigger it. Also running them many times in succession while only allowing one test to run at a time might shed some light on whether concurrency is involved, but this would of course be even more time-consuming.
Thanks for reporting!
The filesystem probe would be thread-safe (within the worktree-state test-suite) now and I would expect it to be stable - that I have now put to the test as pre-condition.
Otherwise, I don't seem to be able to reproduce the issue locally, neither in single-threaded nor in multi-threaded mode, running it 100 times in short succession.
❯ hyperfine -N -m100 /Users/byron/dev/github.com/Byron/gitoxide/target/debug/deps/worktree-9f25a11ba27ab3a7
Benchmark 1: /Users/byron/dev/github.com/Byron/gitoxide/target/debug/deps/worktree-9f25a11ba27ab3a7
Time (mean ± σ): 212.4 ms ± 11.5 ms [User: 113.8 ms, System: 77.8 ms]
Range (min … max): 204.6 ms … 314.8 ms 100 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
In multi-threaded mode, strangely enough it does appear stable as well.
cargo test -p gix-worktree-state-tests --features gix-features-parallel
❯ hyperfine -N -m100 /Users/byron/dev/github.com/Byron/gitoxide/target/debug/deps/worktree-b64ec3555922d84e
Benchmark 1: /Users/byron/dev/github.com/Byron/gitoxide/target/debug/deps/worktree-b64ec3555922d84e
Time (mean ± σ): 211.2 ms ± 11.9 ms [User: 150.0 ms, System: 106.2 ms]
Range (min … max): 198.2 ms … 322.9 ms 100 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (322.9 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
However, I think the key here is cargo nextest, which runs all tests in parallel, thus making it possible for the 'gitoxide repository probe' to be run in parallel.
This may happen in a few places.
In order to fix this particular issue, I think the probe code should throw a lock down into the repository to use the filesystem for synchronization or, probably better, do synchronization in-process. However, the in-process synchronization is probably not good enough for nextest as it builds all test binaries and then runs them.
An alternative to a lock-file, which I prefer, is to make sure the processes don't step on each others feet, so files they create are uniquely named.
This is certainly the solution we'd have to go for here.