rosbag2
rosbag2 copied to clipboard
👩🌾 PlaySrvsTest.set_rate_good_values flaky timeout
Description
The set_rate_good_values
test has been flaky on the nightly jobs, see history for test_play_services__rmw_fastrtps_cpp
:
https://ci.ros2.org/job/nightly_linux_repeated/lastSuccessfulBuild/testReport/(root)/projectroot/test_play_services__rmw_fastrtps_cpp/history/
Expected Behavior
The test passes consistently.
Actual Behavior
The test is flaky, failing with a timeout after 60 s:
08:29:58 31: Test timeout computed to be: 60
08:29:58 31: -- run_test.py: extra environment variables:
08:29:58 31: - RMW_IMPLEMENTATION=rmw_fastrtps_cpp
08:29:58 31: -- run_test.py: invoking following command in '/home/jenkins-agent/workspace/nightly_linux_repeated/ws/build/rosbag2_transport':
08:29:58 31: - /home/jenkins-agent/workspace/nightly_linux_repeated/ws/build/rosbag2_transport/test_play_services__rmw_fastrtps_cpp --gtest_output=xml:/home/jenkins-agent/workspace/nightly_linux_repeated/ws/build/rosbag2_transport/test_results/rosbag2_transport/test_play_services__rmw_fastrtps_cpp.gtest.xml
08:29:58 31: Running main() from gmock_main.cc
08:29:58 31: [==========] Running 5 tests from 1 test suite.
08:29:58 31: [----------] Global test environment set-up.
08:29:58 31: [----------] 5 tests from PlaySrvsTest
08:29:58 31: [ RUN ] PlaySrvsTest.pause_resume
08:29:59 31: [ OK ] PlaySrvsTest.pause_resume (566 ms)
08:29:59 31: [ RUN ] PlaySrvsTest.toggle_paused
08:29:59 31: [ OK ] PlaySrvsTest.toggle_paused (499 ms)
08:29:59 31: [ RUN ] PlaySrvsTest.set_rate_good_values
08:30:58 Test #31: test_play_services__rmw_fastrtps_cpp ...............***Timeout 60.01 sec
To Reproduce
Run nightly CI.
System (please complete the following information)
- OS: Ubuntu Focal
- ROS 2 Distro: Rolling
- Version:
master
Additional context
That test makes 4 successful_call
calls:
https://github.com/ros2/rosbag2/blob/44e6422f40d424ebf3e741af058192e0b93b925a/rosbag2_transport/test/rosbag2_transport/test_play_services.cpp#L279-L297
Each should have a 1s timeout:
https://github.com/ros2/rosbag2/blob/44e6422f40d424ebf3e741af058192e0b93b925a/rosbag2_transport/test/rosbag2_transport/test_play_services.cpp#L205
https://github.com/osrf/buildfarmer/issues/224
I'm trying reproduct the bug in local, but I failed, is there any way to produce it in local?
Assigning this to myself and picking up to investigate. Note @llapx these can be hard to reproduce - you can try to use the settings from the repeated
build above
test_args: --event-handlers console_direct+ --executor sequential --retest-until-fail 2 --ctest-args -LE "(linter|xfail)" --pytest-args -m "not linter and not xfail"
Additionally I often try to run the gtests directly with repeat, e.g.
source install/setup.bash
cd build/rosbag2_transport
RMW_IMPLEMENTATION=rmw_fastrtps_cpp ./test_play_services__rmw_fastrtps_cpp --gtest_repeat=-1 --gtest_break_on_failure
That will run the test infinitely until it fails. While this is happening you can try to, perhaps, stress the CPU by using stress
utility to see if it only happens in low-resource scenarios, for example.
Hopefully that helps, I am hoping that will reproduce the problem for me locally - otherwise I'll have to think of a next step.
- Closing as duplicate for the #862 since there more context and preliminary analysis of the issue.