rclpy
rclpy copied to clipboard
:farmer: Humble TestClient.test_service_timestamps failing consistently in Connext
Bug report
Required Info:
- Operating System:
- Ubuntu Jammy
- Installation type:
- Source
- Version or commit hash:
- Humble
- DDS implementation:
- RTI Connext
- Client library (if applicable):
- rclpy
Steps to reproduce issue
- Run a build in Hci__nightly-connext_ubuntu_jammy_amd64
- See rclpy test regression appear
Additional information
Reference build:
- https://build.ros2.org/job/Hci__nightly-connext_ubuntu_jammy_amd64/266/
Test regressions:
Version of the first fail (Jul 26th build#258):
ros2/rclpy:
type: git
url: https://github.com/ros2/rclpy.git
version: 993b2b6d254a923503a049c14034cde07ebc750a
Log output:
../../src/ros2/rclpy/rclpy/test/test_client.py ..F... [100%]
=================================== FAILURES ===================================
______________________ TestClient.test_service_timestamps ______________________
self = <test.test_client.TestClient testMethod=test_service_timestamps>
@unittest.skipIf(
get_rmw_implementation_identifier() == 'rmw_connextdds' and platform.system() == 'Windows',
reason='Source timestamp not implemented for Connext on Windows')
def test_service_timestamps(self):
cli = self.node.create_client(GetParameters, 'get/parameters')
srv = self.node.create_service(
GetParameters, 'get/parameters',
lambda request, response: response)
try:
self.assertTrue(cli.wait_for_service(timeout_sec=20))
cli.call_async(GetParameters.Request())
for i in range(5):
with srv.handle:
result = srv.handle.service_take_request(srv.srv_type.Request)
if result is not None:
request, header = result
> self.assertTrue(header is not None)
E AssertionError: False is not true
../../src/ros2/rclpy/rclpy/test/test_client.py:110: AssertionError
This happened before with Connext https://github.com/ros2/rclpy/pull/818, but i cannot reproduce this problem on my local environment. what is the reproduction rate for this issue with nightly CI?
Sorry i did not use Connext DDS, with Connext i can now reproduce the issue with https://github.com/ros2/ros2/blob/humble/ros2.repos
srv.handle.service_take_request just takes the data from RMW implementation which is Connext in this case.
according the git history, nothing from rclpy, rmw_connextdds seems related to this behavior.
root@tomoyafujita:~/ros2_ws/humble_ws# dpkg -l rti-connext-dds-6.0.1
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=====================-============-============-=============================================
ii rti-connext-dds-6.0.1 6.0.1.25-1 amd64 real-time messaging and integration platform.
@trubio-rti any thoughts?
@fujitatomoya Hi, I've taken a look but I can't make much out of this issue. I'm unfortunately not working on the Connext RMW any more, sorry. I've created an internal issue and notified my colleague who's now working in the project.
@trubio-rti have you had any updates from your colleague? Can we link them to this issue?
Hi, I'm the new person working on Connext RMW. I will look into this as soon as I have some bandwith along with some more training.
I'm not able to reproduce the issue in my local environment in Humble using the Debian RTI Connext package. Can you provide me a reproducer? If you need info about what I'm doing, let me know
i can reproduce with humble source build. (but cannot make it happen with rolling)
note that, reproduction rate is not great, once out of ten in my experience.
- use https://github.com/ros2/ros2/blob/humble/ros2.repos
- source build procedure: https://docs.ros.org/en/humble/Installation/Alternatives/Ubuntu-Development-Setup.html and https://docs.ros.org/en/humble/Installation/DDS-Implementations.html#rti-connext-version-6-0-1-amd64-only
root@tomoyafujita:~/ros2_ws/humble_ws# colcon test --event-handlers console_direct+ --retest-until-fail 100 --packages-select rclpy --ctest-args -R test_client
...
7: ../../src/ros2/rclpy/rclpy/test/test_client.py ..F... [100%]
7:
7: =================================== FAILURES ===================================
7: ______________________ TestClient.test_service_timestamps ______________________
7:
7: self = <test.test_client.TestClient testMethod=test_service_timestamps>
7:
7: @unittest.skipIf(
7: get_rmw_implementation_identifier() == 'rmw_connextdds' and platform.system() == 'Windows',
7: reason='Source timestamp not implemented for Connext on Windows')
7: def test_service_timestamps(self):
7: cli = self.node.create_client(GetParameters, 'get/parameters')
7: srv = self.node.create_service(
7: GetParameters, 'get/parameters',
7: lambda request, response: response)
7: try:
7: self.assertTrue(cli.wait_for_service(timeout_sec=20))
7: cli.call_async(GetParameters.Request())
7: for i in range(5):
7: with srv.handle:
7: result = srv.handle.service_take_request(srv.srv_type.Request)
7: if result is not None:
7: request, header = result
7: > self.assertTrue(header is not None)
7: E AssertionError: False is not true
7:
7: ../../src/ros2/rclpy/rclpy/test/test_client.py:110: AssertionError
7: - generated xml file: /root/ros2_ws/humble_ws/build/rclpy/test_results/rclpy/test_client.xunit.xml -
7: =========================== short test summary info ============================
7: FAILED ../../src/ros2/rclpy/rclpy/test/test_client.py::TestClient::test_service_timestamps
7: ========================= 1 failed, 5 passed in 5.36s ==========================
7: -- run_test.py: return code 1
7: -- run_test.py: verify result file '/root/ros2_ws/humble_ws/build/rclpy/test_results/rclpy/test_client.xunit.xml'
Test #7: test_client ......................***Failed 5.97 sec
0% tests passed, 1 tests failed out of 1
Label Time Summary:
pytest = 5.97 sec*proc (1 test)
Total Test time (real) = 65.84 sec
The following tests FAILED:
7 - test_client (Failed)
Errors while running CTest
Output from these tests are in: /root/ros2_ws/humble_ws/build/rclpy/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
--- stderr: rclpy
Errors while running CTest
Output from these tests are in: /root/ros2_ws/humble_ws/build/rclpy/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
---
Finished <<< rclpy [1min 6s] [ with test failures ]
Summary: 1 package finished [1min 7s]
1 package had stderr output: rclpy
1 package had test failures: rclpy
I was able to reproduce it. Working on it
I think the issue is in the test itself. For instance, we are checking result is not None, but the function Service::service_take_request either returns the corresponding result, or (None, None) if the service take fails. So we are basically checking for a value that result is never going to have.
Unless I'm missing something, the fix would be as simple as changing if result is not None: for if result != (None, None): or something similar. With that change, the test passes consistently again. It's important to note that rolling would benefit from this change too, even though it doesn't seem to have the same issue.
Let me know what you think.
@fgallegosalido good eye, i believe that is correct! test is picking up the result from the failure of taking request data. i will check the patch with 100 loop for this test and make the PR for rclpy. as you suggested, we can have this fix into rolling and then backport all 👍