
:farmer: Flaky test `test_graph__rmw_fastrtps_cpp` on ros2 buildfarm

Open • Crola1702 opened this issue 1 year ago • 1 comment

Bug report

  • Operating System:
    • Ubuntu 20.04 and 22.04
  • Installation type:
    • Source
  • Version or commit hash:
    • Rolling
  • DDS implementation:
    • Fast-RTPS

Steps to reproduce issue

  1. Run a build in one of the following jobs
    • Nightly repeated jobs (windows, aarch64, linux, rhel)
    • Humble coverage or debug
    • Linux aarch64 Debug
  2. See it fail (if lucky)
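Because the failure is intermittent, it may be quicker to loop the test binary locally than to wait for a nightly. The following is a minimal sketch of that idea, not the buildfarm's actual mechanism: `run_until_failure` and `FASTRTPS_ENV` are hypothetical names, and the environment variables are copied from the log output further down in this report.

```python
import os
import subprocess
import sys

# Environment copied from the run_test.py log in this report.
FASTRTPS_ENV = {
    "RCL_ASSERT_RMW_ID_MATCHES": "rmw_fastrtps_cpp",
    "RMW_IMPLEMENTATION": "rmw_fastrtps_cpp",
}


def run_until_failure(command, extra_env=None, max_runs=100):
    """Run `command` up to `max_runs` times.

    Returns (run_index, return_code) for the first failing run,
    or None if every run exited with status 0.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    for run_index in range(1, max_runs + 1):
        result = subprocess.run(command, env=env)
        if result.returncode != 0:
            return run_index, result.returncode
    return None


# Hypothetical usage, assuming a locally built workspace:
#   failure = run_until_failure(
#       ["./build/rcl_action/test_graph__rmw_fastrtps_cpp"], FASTRTPS_ENV)
#   print(failure)
```

GoogleTest's own `--gtest_repeat` and `--gtest_filter` flags can be appended to the command to narrow the loop to a single test case, for example the one that was running when the process died.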

Expected behavior

Not failing

Actual behavior

This test has been failing on repeated jobs for the last year. Since those jobs retest everything until something fails, it is expected that a flaky test fails there. However, I find it odd that it is also failing on jobs other than the repeated ones, because when a test fails there it is rerun (unlike in repeated jobs).

This test has a 21% flaky ratio on Humble Coverage (7 of the 33 builds checked), although only 1 of those builds is marked as unstable because of this test failure.
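For reference, the ratio quoted above is simply failing builds divided by builds checked. A small sketch of the arithmetic, where `flaky_ratio` is a hypothetical helper name and the counts are the ones given in this report:

```python
def flaky_ratio(failing_builds, checked_builds):
    """Return the fraction of checked builds in which the test failed."""
    if checked_builds <= 0:
        raise ValueError("checked_builds must be positive")
    return failing_builds / checked_builds


# Counts taken from this report: 7 of 33 builds hit the failure,
# and 1 of 33 was marked unstable because of it.
print(f"failed at least once: {flaky_ratio(7, 33):.0%}")  # 21%
print(f"marked unstable:      {flaky_ratio(1, 33):.0%}")  # 3%
```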

Additional information

Reference build: https://ci.ros2.org/job/nightly_linux_humble_coverage/187/

Test regression: rcl_action.test_graph__rmw_fastrtps_cpp.gtest.missing_result

Log output:

      Start 10: test_graph__rmw_fastrtps_cpp

10: Test command: /home/jenkins-agent/workspace/nightly_linux_humble_coverage/venv/bin/python3.10 "-u" "/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/install/ament_cmake_test/share/ament_cmake_test/cmake/run_test.py" "/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/test_results/rcl_action/test_graph__rmw_fastrtps_cpp.gtest.xml" "--package-name" "rcl_action" "--output-file" "/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/ament_cmake_gtest/test_graph__rmw_fastrtps_cpp.txt" "--env" "RCL_ASSERT_RMW_ID_MATCHES=rmw_fastrtps_cpp" "RMW_IMPLEMENTATION=rmw_fastrtps_cpp" "--command" "/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/test_graph__rmw_fastrtps_cpp" "--gtest_output=xml:/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/test_results/rcl_action/test_graph__rmw_fastrtps_cpp.gtest.xml"
10: Test timeout computed to be: 180
10: -- run_test.py: extra environment variables:
10:  - RCL_ASSERT_RMW_ID_MATCHES=rmw_fastrtps_cpp
10:  - RMW_IMPLEMENTATION=rmw_fastrtps_cpp
10: -- run_test.py: invoking following command in '/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action':
10:  - /home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/test_graph__rmw_fastrtps_cpp --gtest_output=xml:/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/test_results/rcl_action/test_graph__rmw_fastrtps_cpp.gtest.xml
10: Running main() from /home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc
10: [==========] Running 10 tests from 2 test suites.
10: [----------] Global test environment set-up.
10: [----------] 3 tests from TestActionGraphFixture__rmw_fastrtps_cpp
10: [ RUN      ] TestActionGraphFixture__rmw_fastrtps_cpp.test_action_get_client_names_and_types_by_node
10: -- run_test.py: return code -11
10: -- run_test.py: generate result file '/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/test_results/rcl_action/test_graph__rmw_fastrtps_cpp.gtest.xml' with failed test
10: -- run_test.py: verify result file '/home/jenkins-agent/workspace/nightly_linux_humble_coverage/ws/build/rcl_action/test_results/rcl_action/test_graph__rmw_fastrtps_cpp.gtest.xml'
10/22 Test #10: test_graph__rmw_fastrtps_cpp ....................***Failed    0.31 sec
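The `return code -11` line is worth decoding. Assuming `run_test.py` reports the return code in the convention of Python's `subprocess` module, a negative value -N means the child process was terminated by signal N, and signal 11 is SIGSEGV on Linux. That would mean the test binary crashed with a segmentation fault rather than failing an assertion. A minimal sketch (`describe_return_code` is a hypothetical helper name):

```python
import signal


def describe_return_code(return_code):
    """Translate a subprocess-style return code into a readable message."""
    if return_code < 0:
        # Negative codes follow the subprocess convention: -N means
        # the process was terminated by signal N.
        signal_number = -return_code
        try:
            name = signal.Signals(signal_number).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signal_number} ({name})"
    return f"exited with status {return_code}"


print(describe_return_code(-11))
```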

First time happening: Nightly Linux Repeated 2361 (1 year ago)

First 20 builds with this test regression in ros2 nightlies:

image

Last 20 builds with this test regression:

image

Builds with this test regression in jobs other than repeated ones:

image

Crola1702 • Oct 27 '22 20:10