bond_core
memory safe subscription callback
Manage the lifetime of the bondStatusCB member function by wrapping it in a lambda that captures a weak_ptr (obtained via public inheritance from std::enable_shared_from_this).
NOTE: This is a draft to show how it could be done.
It solves a use-after-free that occurs when using intra-process communication in nav2: https://github.com/ros-navigation/navigation2/issues/4691
Also being discussed as a bug in rclcpp - https://github.com/ros2/rclcpp/issues/2678#issuecomment-2583472585
Something like createSafeSubscriptionMemFuncCallback could make its way into the rclcpp namespace.
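For readers unfamiliar with the pattern, here is a minimal sketch of the idea described above, written without any rclcpp dependency. It is not the code from this PR: `make_safe_member_callback`, `Listener`, `on_status`, and `demo_delivered_count` are hypothetical names chosen for illustration, and a plain `std::function` stands in for a subscription callback.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical helper (not an rclcpp API): wraps a member function so that the
// resulting callback becomes a no-op once the owning object has been destroyed.
template <typename T, typename Msg>
std::function<void(const Msg &)> make_safe_member_callback(
  const std::shared_ptr<T> & owner, void (T::*method)(const Msg &))
{
  std::weak_ptr<T> weak = owner;  // does not extend the owner's lifetime
  return [weak, method](const Msg & msg) {
    // lock() yields a shared_ptr that keeps the owner alive for the duration
    // of the call, or an empty pointer if the owner is already gone.
    if (auto self = weak.lock()) {
      ((*self).*method)(msg);
    }
  };
}

// Illustrative stand-in for a class such as Bond.
class Listener : public std::enable_shared_from_this<Listener>
{
public:
  std::function<void(const std::string &)> make_callback()
  {
    // shared_from_this() requires that *this is already owned by a shared_ptr.
    return make_safe_member_callback(shared_from_this(), &Listener::on_status);
  }

  int received = 0;

private:
  void on_status(const std::string &) { ++received; }
};

// Delivers one message while the owner is alive and one after it is destroyed.
// Returns the number of messages the owner actually received.
int demo_delivered_count()
{
  auto listener = std::make_shared<Listener>();
  auto cb = listener->make_callback();
  cb("alive");                 // owner alive: on_status runs
  int delivered = listener->received;
  listener.reset();            // destroy the owner; cb still exists
  cb("after destruction");     // weak.lock() fails, so nothing is dereferenced
  return delivered;
}
```

The callback holds only a `weak_ptr`, so it never prevents the owner from being destroyed, and a late invocation is skipped instead of touching freed memory.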
Note that beyond the linting problems that the job points out, there's an actual set of test failures in test_callbacks_cpp which probably point to a bug in the implementation / a test that needs a minor update:
02:07:59 1: Test command: /usr/bin/python3 "-u" "/opt/ros/rolling/share/ament_cmake_test/cmake/run_test.py" "/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml" "--package-name" "test_bond" "--output-file" "/tmp/ws/build_isolated/test_bond/ament_cmake_gtest/test_callbacks_cpp.txt" "--command" "/tmp/ws/build_isolated/test_bond/test_callbacks_cpp" "--gtest_output=xml:/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml"
02:07:59 1: Working Directory: /tmp/ws/build_isolated/test_bond
02:07:59 1: Test timeout computed to be: 60
02:07:59 1: -- run_test.py: invoking following command in '/tmp/ws/build_isolated/test_bond':
02:07:59 1: - /tmp/ws/build_isolated/test_bond/test_callbacks_cpp --gtest_output=xml:/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml
02:07:59 1: Running main() from /opt/ros/rolling/src/gtest_vendor/src/gtest_main.cc
02:07:59 1: [==========] Running 2 tests from 1 test suite.
02:07:59 1: [----------] Global test environment set-up.
02:07:59 1: [----------] 2 tests from TestCallbacksCpp
02:07:59 1: [ RUN ] TestCallbacksCpp.dieInLifeCallback
02:07:59 1: unknown file: Failure
02:07:59 1: C++ exception with description "bad_weak_ptr" thrown in the test body.
02:07:59 1:
02:07:59 1: [ FAILED ] TestCallbacksCpp.dieInLifeCallback (14 ms)
02:07:59 1: [ RUN ] TestCallbacksCpp.remoteNeverConnects
02:07:59 1: unknown file: Failure
02:07:59 1: C++ exception with description "bad_weak_ptr" thrown in the test body.
02:07:59 1:
02:07:59 1: [ FAILED ] TestCallbacksCpp.remoteNeverConnects (5 ms)
02:07:59 1: [----------] 2 tests from TestCallbacksCpp (20 ms total)
02:07:59 1:
02:07:59 1: [----------] Global test environment tear-down
02:07:59 1: [==========] 2 tests from 1 test suite ran. (27 ms total)
02:07:59 1: [ PASSED ] 0 tests.
02:07:59 1: [ FAILED ] 2 tests, listed below:
02:07:59 1: [ FAILED ] TestCallbacksCpp.dieInLifeCallback
02:07:59 1: [ FAILED ] TestCallbacksCpp.remoteNeverConnects
02:07:59 1:
02:07:59 1: 2 FAILED TESTS
02:07:59 1: -- run_test.py: return code 1
02:07:59 1: -- run_test.py: inject classname prefix into gtest result file '/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml'
02:07:59 1: -- run_test.py: verify result file '/tmp/ws/test_results/test_bond/test_callbacks_cpp.gtest.xml'
02:07:59 1/7 Test #1: test_callbacks_cpp ...............***Failed 0.21 sec
@SteveMacenski fixed the lint and test. This change requires the Bond to be created as a shared_ptr.
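The "bad_weak_ptr" failures in the log above are consistent with calling `shared_from_this()` on an object that no `shared_ptr` owns. The sketch below reproduces that behaviour in isolation, assuming C++17 or later (before C++17 this case was undefined behaviour in the standard); `Widget` and both function names are hypothetical and unrelated to the Bond or test code.

```cpp
#include <memory>

// Illustrative type; any class deriving from enable_shared_from_this behaves the same.
struct Widget : std::enable_shared_from_this<Widget> {};

// Returns true if shared_from_this() throws std::bad_weak_ptr for an object
// with automatic storage, i.e. one that is not owned by any shared_ptr.
bool throws_when_not_shared()
{
  Widget w;
  try {
    auto p = w.shared_from_this();
    (void)p;
    return false;
  } catch (const std::bad_weak_ptr &) {
    return true;
  }
}

// Returns true if shared_from_this() hands back the same object when the
// instance was created through std::make_shared.
bool works_when_shared()
{
  auto w = std::make_shared<Widget>();
  return w->shared_from_this() == w;
}
```

This is why callers have to construct the object through `std::make_shared` (or otherwise hand it to a `shared_ptr`) before any code path reaches `shared_from_this()`.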
Just a note that these changes are ABI-breaking (adding members and changing inheritance), so they definitely need to land in rolling.
I know that the aim is really to fix rclcpp, not this - so I think this PR is mostly used for an experimental reproduction ground and testing possible application-space solutions for the rclcpp IPC issue.
@ewak : Is this intended to be a candidate for merging as a short-term solution to get Nav2 running with IPC, and/or should it be reviewed by maintainers for inclusion?
Maintainers: This requires creating the object as a shared pointer, similar to rclcpp nodes themselves. Would it be acceptable to impose that restriction in order to get IPC running on nodes that use bond?
I'd like to bridge the gap a bit and figure out next steps here, if there are any outside of rclcpp.
Hi @ewak, will this PR be merged in the near future? I found a similar issue while running the nav2 tests with rmw_zenoh on ROS 2 rolling, and I came up with a fix similar to the one in this PR. I'm not sure whether this is the final solution.
Today is the feature freeze for Kilted. I will merge this PR because it fixes the problem, and we can work on a proper fix afterwards.