rmw_fastrtps
rmw_fastrtps copied to clipboard
Segmentation fault when adding a short-lived node to an executor
Bug report
Description
I am trying to set up an executor "spinner thread" that will spin a specified rclcpp::Executor
until commanded to stop. My goal is to construct the executor and spinner thread in one scope then add nodes to that executor in another scope. However, the rclcpp::Executor
classes cause a segmentation fault under certain conditions. There appears to be a race condition in the RMW layer that manifests with short-lived nodes.
-
Operating System: Ubuntu 22.04 (Docker container); Ubuntu 22.04 host
-
Installation type: Binaries
-
Version or commit hash:
-
ros-humble-rclcpp: 16.0.10
-
ros-humble-rmw-fastrtps-cpp: 6.2.7
-
ros-humble-fastrtps: 2.6.8
-
ros-humble-rmw-cyclonedds-cpp: 0.10.4
-
ros-humble-cyclonedds: 1.3.4
-
-
DDS implementation:
- Fast-DDS (Fast-RTPS)
- Cyclone DDS
-
Client library (if applicable):
rclcpp
-
Compiler:
- Clang 14
- GCC 11.4.0
Steps to reproduce issue
Minimal reproducible example
#include <atomic>
#include <memory>
#include <thread>
#include <rclcpp/executors/multi_threaded_executor.hpp>
#include <rclcpp/node.hpp>
auto function() -> void
{
rclcpp::init(0, nullptr);
rclcpp::executors::MultiThreadedExecutor executor;
std::atomic_bool stop_spinning{false};
auto spinner_thread{std::thread([&] {
while (rclcpp::ok() && !stop_spinning.load()) {
executor.spin_some();
}
})};
{
auto node{std::make_shared<rclcpp::Node>("node")};
executor.add_node(node);
}
stop_spinning.store(true);
spinner_thread.join();
rclcpp::shutdown();
}
auto main() -> int
{
while (true) {
function();
}
return 0;
}
Steps
- Compile and run the above example.
- Wait for a segmentation fault error. It usually happens within a minute.
Expected behavior
The program runs indefinitely.
Actual behavior
The program exits due to a segmentation fault.
Program output when compiled with Clang
Segmentation fault (core dumped)
Program output when compiled with GCC
Segmentation fault (core dumped)
In (only) one of the many runs I did, I got the following output (when compiled with GCC):
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
what(): failed to add guard condition to wait set: guard condition implementation is invalid, at ./src/rcl/guard_condition.c:172, at ./src/rcl/wait.c:460
Aborted (core dumped)
Additional information
Relevant core dump
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f4154232e3f in rmw_fastrtps_shared_cpp::__rmw_wait(char const*, rmw_subscriptions_s*, rmw_guard_conditions_s*, rmw_services_s*, rmw_clients_s*, rmw_events_s*, rmw_wait_set_s*, rmw_time_s const*) ()
from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
[Current thread is 1 (Thread 0x7f41531e8640 (LWP 1202848))]
(gdb) bt
#0 0x00007f4154232e3f in rmw_fastrtps_shared_cpp::__rmw_wait(char const*, rmw_subscriptions_s*, rmw_guard_conditions_s*, rmw_services_s*, rmw_clients_s*, rmw_events_s*, rmw_wait_set_s*, rmw_time_s const*) ()
at /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#1 0x00007f415428a887 in rmw_wait(rmw_subscriptions_t*, rmw_guard_conditions_t*, rmw_services_t*, rmw_clients_t*, rmw_events_t*, rmw_wait_set_t*, rmw_time_t const*)
(subscriptions=<optimized out>, guard_conditions=<optimized out>, services=<optimized out>, clients=<optimized out>, events=<optimized out>, wait_set=<optimized out>, wait_timeout=0x7f41531d5510) at ./src/rmw_wait.cpp:33
#2 0x00007f42759b1848 in rcl_wait () at /opt/ros/humble/lib/librcl.so
#3 0x00007f4274f2470c in rclcpp::Executor::wait_for_work(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) (this=0x7ffdbe823c50, timeout=...) at /usr/include/c++/11/chrono:521
#4 0x00007f4274f28853 in rclcpp::Executor::spin_some_impl(std::chrono::duration<long, std::ratio<1l, 1000000000l> >, bool) (this=0x7ffdbe823c50, max_duration=..., exhaustive=false) at /usr/include/c++/11/chrono:537
#5 0x000063ca4387dbde in function()::$_0::operator()() const (this=0x63ca44fc84e8) at <redacted_executable_path>:17
#6 0x000063ca4387db1d in std::__invoke_impl<void, function()::$_0>(std::__invoke_other, function()::$_0&&) (__f=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#7 0x000063ca4387daad in std::__invoke<function()::$_0>(function()::$_0&&) (__fn=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#8 0x000063ca4387da85 in std::thread::_Invoker<std::tuple<function()::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x63ca44fc84e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:259
#9 0x000063ca4387da55 in std::thread::_Invoker<std::tuple<function()::$_0> >::operator()() (this=0x63ca44fc84e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:266
#10 0x000063ca4387d979 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<function()::$_0> > >::_M_run() (this=0x63ca44fc84e0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211
#11 0x00007f4273705253 in () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007f427338dac3 in () at /usr/lib/x86_64-linux-gnu/libc.so.6
#13 0x00007f427341f850 in () at /usr/lib/x86_64-linux-gnu/libc.so.6
Other notes
-
The issue occurs when using either the
SingleThreadedExecutor
orMultiThreadedExecutor
. -
No segmentation fault occurs* when relocating
node
to the outer scope.auto node{std::make_shared<rclcpp::Node>("node")}; { executor.add_node(node); }
-
No segmentation fault occurs* when using Cyclone DDS
-
No segmentation fault occurs* when sleeping the main thread for a short duration before adding
node
to the executor.{ auto node{std::make_shared<rclcpp::Node>("node")}; std::this_thread::sleep_for(std::chrono::seconds(1)); executor.add_node(node); }
* I concluded a segmentation fault was unlikely after running the program for several minutes.