rclcpp
:farmer: Flaky `TestGuardCondition.construction_and_destruction` throwing SEH Exception in Windows repeated
Bug report
Required Info:
- Operating System: Windows
- Installation type: Source
- Version or commit hash: Rolling
- Client library (if applicable): rclcpp
Steps to reproduce issue
- Run a build in nightly_win_rep
- See the `TestGuardCondition.construction_and_destruction` test failure
Expected behavior
Test should pass
Actual behavior
The test fails intermittently with an SEH exception (code 0xc0000005, an access violation).
Additional information
Reference build: https://ci.ros2.org/view/nightly/job/nightly_win_rep/3273/#showFailuresLink
Test regressions:
Log output:
```
[ RUN ] TestGuardCondition.construction_and_destruction
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.
Stack trace:
[ FAILED ] TestGuardCondition.construction_and_destruction (0 ms)
```
Flaky ratio in the last 30 days (Apr 11th):
| job_name | last_fail | first_fail | build_count | failure_count | failure_percentage |
|---|---|---|---|---|---|
| nightly_win_rep | 2024-04-09 | 2024-03-13 | 29 | 11 | 37.93 |
Flaky ratio in the last 30 days (Jun 18th):
| job_name | last_fail | first_fail | build_count | failure_count | failure_percentage |
|---|---|---|---|---|---|
| nightly_win_rep | 2024-06-16 | 2024-05-21 | 25 | 8 | 32.0 |
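The `failure_percentage` column appears to be `failure_count / build_count` expressed as a percentage; both rows are consistent with that reading:

```latex
\text{failure\_percentage} = 100 \times \frac{\text{failure\_count}}{\text{build\_count}},
\qquad 100 \times \frac{11}{29} \approx 37.93,
\qquad 100 \times \frac{8}{25} = 32.0
```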
The failing lines are https://github.com/ros2/rclcpp/blob/rolling/rclcpp/test/rclcpp/test_guard_condition.cpp#L61-L62:
```cpp
auto mock = mocking_utils::inject_on_return(
  "lib:rclcpp", rcl_guard_condition_fini, RCL_RET_ERROR);
```
The segfault happens at this line inside `inject_on_return`:
```cpp
mock_ = mmk_mock(target_c_str(), mock_type);
```
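For context: the test uses `inject_on_return` to make `rcl_guard_condition_fini` report `RCL_RET_ERROR`, presumably so the error path taken when a guard condition is destroyed gets exercised. The C sketch below only illustrates that *effect* with a plain function-pointer indirection. It is not how Mimick works (Mimick resolves and patches functions at the binary level, through helpers such as `plt_get_real_fn`), and every name in it (`fake_ret_t`, `fini_fn`, `set_fini`, `destroy_guard_condition`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for rcl return codes (not the real rcl headers). */
typedef int fake_ret_t;
#define FAKE_RET_OK 0
#define FAKE_RET_ERROR 1

typedef fake_ret_t (*fini_fn_t)(void *guard_condition);

/* Default finalizer: always succeeds. */
static fake_ret_t real_fini(void *gc) { (void)gc; return FAKE_RET_OK; }

/* Stub that ignores its argument and returns the injected error code. */
static fake_ret_t stub_fini_error(void *gc) { (void)gc; return FAKE_RET_ERROR; }

/* Indirection point: the code under test always calls through this pointer. */
static fini_fn_t fini_fn = real_fini;

/* Swap the finalizer; a test installs the stub, then restores the default. */
static void set_fini(fini_fn_t fn) {
  assert(fn != NULL);
  fini_fn = fn;
}

/* Code under test: returns 1 if finalization reported an error, 0 otherwise. */
static int destroy_guard_condition(void *gc) {
  return fini_fn(gc) != FAKE_RET_OK;
}
```

A test would call `set_fini(stub_fini_error)`, check that `destroy_guard_condition` takes the error path, and then call `set_fini(real_fini)`, which mirrors what the scoped `mock` object does when it goes out of scope.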
Thanks for isolating the area where the segfault is coming from!
Is `target_c_str()` or `mock_type` null at this point?
`target_c_str()` has a value (`lib:rclcpp`), but I'm still trying to understand what's inside `mock_type`.
More details: the code is failing here:
```c
mmk_vfprintf_ = (void*) plt_get_real_fn(ctx, vfprintf);
```
In particular, `plt_get_real_fn` calls `CreateToolhelp32Snapshot()`, which returns `INVALID_HANDLE_VALUE`, and the code aborts.
The specific error is `ERROR_BAD_LENGTH`: "The program issued a command but the command length is incorrect."
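I made this patch in `plt-pe.c`:
```c
HANDLE snap;
do {
  // Retry only while the snapshot fails with ERROR_BAD_LENGTH
  snap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, GetCurrentProcessId());
} while (snap == INVALID_HANDLE_VALUE && GetLastError() == ERROR_BAD_LENGTH);
```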
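Microsoft's documentation for `CreateToolhelp32Snapshot` says that if the call fails with `ERROR_BAD_LENGTH`, it should be retried until it succeeds, which is the idea behind the patch above. Below is a minimal, platform-independent sketch of the same retry with an upper bound on attempts. The Win32 call is replaced by a hypothetical callback (`snapshot_fn`), and the names `snap_status`, `take_snapshot_with_retry`, `flaky_snapshot`, and `broken_snapshot` are illustrative assumptions, not Mimick code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of one snapshot attempt; stands in for
   INVALID_HANDLE_VALUE plus GetLastError() on Windows. */
enum snap_status { SNAP_OK, SNAP_BAD_LENGTH, SNAP_OTHER_ERROR };

/* Hypothetical callback replacing CreateToolhelp32Snapshot. */
typedef enum snap_status (*snapshot_fn)(void *ctx);

/* Calls `take` until it returns something other than SNAP_BAD_LENGTH or
   max_attempts is reached. Other errors are returned immediately.
   The number of attempts made is stored in *attempts_out if non-NULL. */
static enum snap_status take_snapshot_with_retry(snapshot_fn take, void *ctx,
                                                 int max_attempts,
                                                 int *attempts_out) {
  assert(take != NULL && max_attempts > 0);
  enum snap_status status = SNAP_BAD_LENGTH;
  int attempts = 0;
  while (attempts < max_attempts) {
    ++attempts;
    status = take(ctx);
    if (status != SNAP_BAD_LENGTH) {
      break; /* success, or an error that retrying will not fix */
    }
  }
  if (attempts_out != NULL) {
    *attempts_out = attempts;
  }
  return status;
}

/* Fake snapshot that fails with SNAP_BAD_LENGTH `failures_left` times. */
struct flaky_ctx { int failures_left; };

static enum snap_status flaky_snapshot(void *ctx) {
  struct flaky_ctx *f = (struct flaky_ctx *)ctx;
  if (f->failures_left > 0) {
    --f->failures_left;
    return SNAP_BAD_LENGTH;
  }
  return SNAP_OK;
}

/* Fake snapshot that fails with a non-retryable error. */
static enum snap_status broken_snapshot(void *ctx) {
  (void)ctx;
  return SNAP_OTHER_ERROR;
}
```

For example, a `flaky_ctx` with three pending failures succeeds on the fourth attempt, while `broken_snapshot` gives up after one. The bound keeps a persistent `ERROR_BAD_LENGTH` from hanging the test process, at the cost of no longer guaranteeing an eventual success the way an unbounded loop does.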
Related fix https://github.com/ros2/Mimick/pull/38
Given what we merged yesterday, I'm going to consider that this one is "fixed", and close this out. If it turns out not to fix the issue, please feel free to reopen.