Some tests terminate in an unusual manner when canceled
Discussed in https://github.com/GitoxideLabs/gitoxide/discussions/1414
Note: I've converted the discussion to this issue, as suggested by @Byron, after further investigation suggests it may not be related to
cargo nextestor the use ofserial_test::serialand thus perhaps instead to the behavior of gitoxide or non-test-specific dependencies. —Eliah
Originally posted by EliahKagan June 22, 2024
As Raymond Chen has explained in STATUS_STACK_BUFFER_OVERRUN doesn’t mean that there was a stack buffer overrun, this condition is misleading and often not nearly as serious as it may seem, because this way of terminating a process on Windows often does not designate any stack overflow, buffer overrun, or any kind of memory-related bug or error. Instead, it is widely used to terminate a program in any kind of critical immediate failure.
Nonetheless, I find it odd that I see this when I cancel gitoxide's test suite with Ctrl+C on Windows, since it does not happen when I cancel other Rust projects' tests when running them in the same way.
ek@Glub MINGW64 ~/source/repos/gitoxide (main)
$ cargo nextest run --all --no-fail-fast
warning: function `evaluate_target_dir` is never used
--> gix-prompt\tests\prompt.rs:9:8
|
9 | fn evaluate_target_dir() -> String {
| ^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: `gix-prompt` (test "prompt") generated 1 warning
Compiling gix-filter v0.11.2 (C:\Users\ek\source\repos\gitoxide\gix-filter)
Finished `test` profile [unoptimized + debuginfo] target(s) in 4.67s
Starting 2355 tests across 127 binaries (9 skipped; run ID: 8406b876-001a-41fa-a054-487ee2cf1f3f, nextest profile: default)
PASS [ 0.095s] gitoxide plumbing::main::tests::clap
PASS [ 0.090s] gitoxide shared::value_parser_tests::rename_fraction
PASS [ 0.085s] gix id::tests::size_of_oid
PASS [ 0.080s] gix open::tests::size_of_options
PASS [ 0.037s] gix remote::connection::fetch::refs::tests::update::unborn_remote_branches_can_be_created_locally_if_they_are_new
PASS [ 0.043s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_with_locally_unavailable_target_result_in_valid_peeled_branches
PASS [ 0.050s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_with_locally_unavailable_target_dont_overwrite_valid_local_branches
PASS [ 0.053s] gix remote::connection::fetch::refs::tests::update::unborn_remote_branches_can_update_local_unborn_branches
PASS [ 0.047s] gix remote::connection::fetch::refs::tests::update::unborn_remote_refs_dont_overwrite_valid_local_refs
PASS [ 0.281s] gix remote::connection::fetch::refs::tests::update::checked_out_branches_in_worktrees_are_rejected_with_additional_information
PASS [ 0.409s] gix::gix clone::blocking_io::fetch_and_checkout_specific_non_existing
PASS [ 0.469s] gix::gix clone::blocking_io::fetch_and_checkout_empty_remote_repo
PASS [ 0.809s] gix::gix clone::blocking_io::fetch_only_without_configuration
PASS [ 1.000s] gix::gix clone::blocking_io::fetch_and_checkout_specific_annotated_tag
PASS [ 1.047s] gix::gix clone::blocking_io::fetch_only_with_configuration
PASS [ 1.107s] gix::gix clone::blocking_io::fetch_and_checkout_specific_ref
PASS [ 1.168s] gix::gix clone::blocking_io::fetch_and_checkout
PASS [ 0.053s] gix::gix clone::clone_and_destination_must_be_empty
PASS [ 0.130s] gix::gix clone::clone_and_early_persist_without_receive
PASS [ 0.128s] gix::gix clone::clone_bare_into_empty_directory_and_early_drop
PASS [ 0.248s] gix::gix clone::blocking_io::from_shallow_prohibited_with_option
PASS [ 0.082s] gix::gix clone::clone_into_empty_directory_and_early_drop
PASS [ 0.069s] gix::gix commit::describe::lightweight_tags_are_sorted_lexicographically
PASS [ 0.068s] gix::gix commit::describe::tags_are_sorted_by_date_and_lexicographically
PASS [ 1.002s] gix::gix clone::blocking_io::fetch_shallow_no_checkout_then_unshallow
PASS [ 0.067s] gix::gix commit::describe::tags_are_sorted_by_priority
PASS [ 0.054s] gix::gix commit::describe::with_dirty_suffix::dirty_suffix_does_not_apply_if_not_dirty
PASS [ 0.068s] gix::gix commit::describe::with_dirty_suffix::dirty_suffix_applies_automatically_if_dirty
PASS [ 0.040s] gix::gix config::tree::branch::merge
PASS [ 0.998s] gix::gix clone::blocking_io::from_non_shallow_by_deepen_exclude_then_deepen_to_unshallow
PASS [ 0.041s] gix::gix config::tree::checkout::workers
PASS [ 0.048s] gix::gix config::tree::core::abbrev
PASS [ 0.039s] gix::gix config::tree::core::autocrlf
PASS [ 0.029s] gix::gix config::tree::core::check_round_trip_encoding
PASS [ 0.042s] gix::gix config::tree::core::check_stat
PASS [ 0.034s] gix::gix config::tree::core::delta_base_cache_limit
Canceling due to interrupt: 15 tests still running
PASS [ 0.041s] gix::gix config::tree::core::disambiguate
PASS [ 0.033s] gix::gix config::tree::core::log_all_ref_updates
PASS [ 0.537s] gix::gix clone::blocking_io::from_shallow_allowed_by_default
PASS [ 0.040s] gix::gix config::tree::core::eol
PASS [ 0.030s] gix::gix config::tree::core::safecrlf
PASS [ 0.029s] gix::gix config::tree::diff::driver_binary
PASS [ 0.036s] gix::gix config::tree::diff::algorithm
PASS [ 0.051s] gix::gix config::tree::core::timeouts
PASS [ 0.030s] gix::gix config::tree::diff::renames
PASS [ 0.045s] gix::gix config::tree::extensions::object_format
PASS [ 0.037s] gix::gix config::tree::fetch::algorithm
PASS [ 0.031s] gix::gix config::tree::gitoxide::allow::protocol_from_user
PASS [ 0.039s] gix::gix config::tree::fetch::recurse_submodule
PASS [ 0.049s] gix::gix config::tree::gitoxide::author::name_and_email_fallback
PASS [ 0.042s] gix::gix config::tree::gitoxide::commit::author_and_committer_date
PASS [ 0.045s] gix::gix config::tree::gitoxide::committer::name_and_email_fallback
PASS [ 0.036s] gix::gix config::tree::gitoxide::http::connect_timeout
PASS [ 0.028s] gix::gix config::tree::http::extra_header
PASS [ 0.027s] gix::gix config::tree::http::follow_redirects
PASS [ 0.029s] gix::gix config::tree::http::http_version
ABORT [ 1.810s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_can_always_be_set_as_there_is_no_scenario_where_it_could_be_nonexisting_and_rejected
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 1.765s] gix remote::connection::fetch::refs::tests::update::various_valid_updates
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 0.801s] gix::gix clone::blocking_io::from_non_shallow_then_deepen_then_deepen_since_to_unshallow
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 1.901s] gix remote::connection::fetch::refs::tests::update::remote_symbolic_refs_can_be_written_locally_and_point_to_tracking_branch
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 2.001s] gix remote::connection::fetch::refs::tests::update::non_fast_forward_is_rejected_if_dry_run_is_disabled
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 2.006s] gix remote::connection::fetch::refs::tests::update::remote_refs_cannot_map_to_local_head
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 2.067s] gix remote::connection::fetch::refs::tests::update::fast_forwards_are_called_out_even_if_force_is_given
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 2.062s] gix remote::connection::fetch::refs::tests::update::local_symbolic_refs_can_be_overwritten
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 2.118s] gix remote::connection::fetch::refs::tests::update::local_direct_refs_are_written_with_symbolic_ones
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
ABORT [ 2.215s] gix remote::connection::fetch::refs::tests::update::non_fast_forward_is_rejected_but_appears_to_be_fast_forward_in_dryrun_mode
Message [ ] code 0xc0000409: The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application. (os error 1282)
------------
Summary [ 2.264s] 66/2355 tests run: 56 passed, 10 failed, 0 skipped
error: test run failed
As noted in this Stack Overflow answer and that Microsoft documentation, 0xc0000409 designates a (user mode) STATUS_STACK_BUFFER_OVERRUN.
When this is produced by a call to the __fastfail intrinsic, there is a separate code that provides additional information. I don't know if that's relevant here or how to extract the information. The other error information shown in that test output is os error 1282 which seems to just mean "invalid operation."
For contrast, I have tried running cargo nextest --all --no-fail-fast and pressing Ctrl+C on that same Windows system for three other projects, all of which showed a more intuitive termination condition and none of which showed 0xc0000409.
ripgrep:
ABORT [ 0.047s] globset glob::tests::matchcasei1
Message [ ] code 0xc000013a: {Application Exit by CTRL+C}
The application terminated as a result of a CTRL+C. (os error 572)
rust-bio:
ABORT [ 0.166s] bio data_structures::interval_tree::array_backed_interval_tree::tests::find_arbitrary
Message [ ] code 0xc000013a: {Application Exit by CTRL+C}
The application terminated as a result of a CTRL+C. (os error 572)
(With various other interrupted tests, with the same message.)
crossbeam:
ABORT [ 1.046s] crossbeam-channel::after fairness
Message [ ] code 0xc000013a: {Application Exit by CTRL+C}
The application terminated as a result of a CTRL+C. (os error 572)
(With many other interrupted tests, with the same message.)
"Application Exit by CTRL+C" is an intuitive effect of pressing Ctrl+C to cancel tests, while a request for immediate termination--even if not related to any actual memory errors--is less intuitive.
Full output from all four projects tested can be seen in this gist.
I do not assume this is a bug. But I don't want to assume it isn't, unless this can be explained.
The nextest documentation notes that it uses Windows job objects to terminate process trees "immediately," which I would be taken to think might explain this behavior if it always or usually happened, but does not explain it given that it seems only to happen with gitoxide.
The effect does not appear specific to when tests from any particular crate of gitoxide are running.
Although I believe no signal--in the usual POSIX-related sense--is being used here to terminate the processes. Nonetheless, maybe this has something to do with how gitoxide deals with signals?
I cannot produce this with cargo test commands but I don't know if that means it's related to nextest or instead if just cargo test does not show enough information about termination to reveal it. Likewise, I do not know if this can happen under any non-test circumstances when running applications that use gitoxide crates.
Hey, nextest author here.
The nextest documentation notes that it uses Windows job objects to terminate process trees "immediately," which I would be taken to think might explain this behavior if it always or usually happened, but does not explain it given that it seems only to happen with gitoxide.
In case of a Ctrl-C, Windows sends the signal to all processes running on the console that don't have a disabled handler. See the note next to CTRL_C_EVENT in GenerateConsoleCtrlEvent.
On Unix, nextest puts each process in a process group, and is responsible for forwarding signals to test processes. On Windows, nextest does not put processes in Windows's own notion of process group -- not out of principle, but just because I'd like someone with Windows expertise to enumerate and outline all the issues to look for. Test processes can do many weird things, so you have to be quite careful while making changes.
See nextest's signal handling docs for more info. These details have been the same for a long time, and haven't changed for 0.9.86.
When nextest receives a Ctrl-C, all of the other tests are likely to have received Ctrl-C at the same time as well, and are likely to have immediately exited. Nextest does try to call TerminateJobObject on all tests, but this is generally a no-op. It is possible, though, that with gitoxide the tests don't exit immediately on Ctrl-C -- but rather have a Ctrl-C handler, and are terminated by TerminateJobObject instead. That might explain the strange errors.
And it suggests a possible improvement in nextest -- maybe on a ctrl-c, nextest should wait for processes to exit by themselves (for the grace-period setting, default 10s) before calling TerminateJobObject on them.
Thanks for collecting everything we figured out thus far, @EliahKagan , and great to see @sunshowers chiming in, too!
Although I believe no signal--in the usual POSIX-related sense--is being used here to terminate the processes. Nonetheless, maybe this has something to do with how gitoxide deals with signals?
If that would happen, then it's a bug. For all I know, signal handlers are never installed automatically, nor are they dealt with in any way in plumbing crates. gix (CLI) will call install the provided gix-tempfile signal handlers to assure tempfiles are cleaned up similarly to how Git does it, but that's not under test here.
What's interesting is the gix::gix clone::blocking_io::fetch* family of tests which are a little bit special. The cloning happens from a local path, which means a git process will be spawned which acts as a server speaking the git protocol. When a signal arrives, I assume it would reach the spawned process, and maybe it will respond in a special way or has handlers installed. And it's very likely that Git will have handlers installed as that is necessary for the tempfile/lockfile machinery.
While all that is happening, gix will definitely be reading from a pipe, maybe that's also special while a signal arrives.
The cloning happens from a local path, which means a git process will be spawned which acts as a server speaking the git protocol. When a signal arrives, I assume it would reach the spawned process, and maybe it will respond in a special way or has handlers installed.
This sounds right absent something special. Ctrl-C would go to all processes in the console.
And it's very likely that Git will have handlers installed as that is necessary for the tempfile/lockfile machinery.
So what might be happening, assuming the remote git installs a Ctrl-C handler is:
- Nextest and Git both receive the Ctrl-C.
- The remote git starts running its Ctrl-C handler.
- But meanwhile, nextest immediately calls
TerminateJobObjecton the test and all subprocesses, which means the git process gets killed before the signal handler has a chance to complete.
I found this code in git: https://github.com/git/git/blob/2ccc89b0c16c51561da90d21cfbb4b58cc877bf6/compat/mingw.c#L681
Unfortunately, Git is not really a native Windows application, so internally Ctrl-C gets translated via MinGW into SIGINT. This translation layer might be acting up as well.
~~What I'm going to do is -- I'm going to put up a PR to change nextest, such that it waits for the grace period in case of Ctrl-C. Would love it if you gave that PR a shot!~~ See comment below.
(Like most Rust devtools, nextest is a full-fledged Windows application. There is no attempt to emulate Unix on Windows -- many of the same types are used, but with #[cfg(unix)] and #[cfg(windows)] enum variants as appropriate.)
Oh no, I was wrong. In reality, on receiving Ctrl-C, nextest doesn't do anything at all. It just lets tests exit by themselves. See:
https://github.com/nextest-rs/nextest/blob/208d6e527098cf347f731975f3606249ee1cc791/nextest-runner/src/runner/windows.rs#L86-L90
I actually think this might be wrong -- nextest should probably wait for the grace period and then call TerminateJobObject. But that wouldn't impact this issue.
My guess right now is that this is a git/mingw issue. But in that case, I'm not quite sure why that code would be reflected in the exit status of the test process.
It might be worth installing a Ctrl-C handler in these tests, that just waits a few seconds before terminating or something.
edit: Ctrl-C grace periods are implemented in https://github.com/nextest-rs/nextest/commit/66f4a994acbca6724eff9cf4dd7ac6ac1c711ae1, which is part of nextest main. It wouldn't have any impact on this issue.
It might be worth installing a Ctrl-C handler in these tests, that just waits a few seconds before terminating or something.
Even though I don't know why exactly it happens, I don't think a systemic issue should be patched up on the side of the tests.
I am willing to live with it, knowing that the gix (CLI) program doesn't exhibit this behaviour, despite or because of installing signal handlers. It might be interesting to see what happens if these aren't installed - would it fail just like the nextest runs do while cloning a local resource, or is there yet another difference? If it works better, we might have an indication that nextest could do something differently to fix this, and if it's the same we would know it's just the way of the world.
Even though I don't know why exactly it happens, I don't think a systemic issue should be patched up on the side of the tests.
Ah I meant just for debugging. Agreed that that wouldn't be the right long term fix.