rmw_fastrtps
Segmentation fault when adding a short-lived node to an executor
Bug report
Description
I am trying to set up an executor "spinner thread" that will spin a specified rclcpp::Executor until commanded to stop. My goal is to construct the executor and spinner thread in one scope then add nodes to that executor in another scope. However, the rclcpp::Executor classes cause a segmentation fault under certain conditions. There appears to be a race condition in the RMW layer that manifests with short-lived nodes.
- Operating System: Ubuntu 22.04 (Docker container); Ubuntu 22.04 host
- Installation type: Binaries
- Version or commit hash:
  - ros-humble-rclcpp: 16.0.10
  - ros-humble-rmw-fastrtps-cpp: 6.2.7
  - ros-humble-fastrtps: 2.6.8
  - ros-humble-rmw-cyclonedds-cpp: 0.10.4
  - ros-humble-cyclonedds: 1.3.4
- DDS implementation:
  - Fast-DDS (Fast-RTPS)
  - Cyclone DDS
- Client library (if applicable): rclcpp
- Compiler:
  - Clang 14
  - GCC 11.4.0
Steps to reproduce issue
Minimal reproducible example
#include <atomic>
#include <memory>
#include <thread>
#include <rclcpp/executors/multi_threaded_executor.hpp>
#include <rclcpp/node.hpp>
auto function() -> void
{
  rclcpp::init(0, nullptr);
  rclcpp::executors::MultiThreadedExecutor executor;
  std::atomic_bool stop_spinning{false};
  auto spinner_thread{std::thread([&] {
    while (rclcpp::ok() && !stop_spinning.load()) {
      executor.spin_some();
    }
  })};
  {
    auto node{std::make_shared<rclcpp::Node>("node")};
    executor.add_node(node);
  }
  stop_spinning.store(true);
  spinner_thread.join();
  rclcpp::shutdown();
}

auto main() -> int
{
  while (true) {
    function();
  }
  return 0;
}
Steps
- Compile and run the above example.
- Wait for a segmentation fault error. It usually happens within a minute.
Expected behavior
The program runs indefinitely.
Actual behavior
The program exits due to a segmentation fault.
Program output when compiled with Clang
Segmentation fault (core dumped)
Program output when compiled with GCC
Segmentation fault (core dumped)
In (only) one of the many runs I did, I got the following output (when compiled with GCC):
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
what(): failed to add guard condition to wait set: guard condition implementation is invalid, at ./src/rcl/guard_condition.c:172, at ./src/rcl/wait.c:460
Aborted (core dumped)
Additional information
Relevant core dump
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f4154232e3f in rmw_fastrtps_shared_cpp::__rmw_wait(char const*, rmw_subscriptions_s*, rmw_guard_conditions_s*, rmw_services_s*, rmw_clients_s*, rmw_events_s*, rmw_wait_set_s*, rmw_time_s const*) ()
from /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
[Current thread is 1 (Thread 0x7f41531e8640 (LWP 1202848))]
(gdb) bt
#0 0x00007f4154232e3f in rmw_fastrtps_shared_cpp::__rmw_wait(char const*, rmw_subscriptions_s*, rmw_guard_conditions_s*, rmw_services_s*, rmw_clients_s*, rmw_events_s*, rmw_wait_set_s*, rmw_time_s const*) ()
at /opt/ros/humble/lib/librmw_fastrtps_shared_cpp.so
#1 0x00007f415428a887 in rmw_wait(rmw_subscriptions_t*, rmw_guard_conditions_t*, rmw_services_t*, rmw_clients_t*, rmw_events_t*, rmw_wait_set_t*, rmw_time_t const*)
(subscriptions=<optimized out>, guard_conditions=<optimized out>, services=<optimized out>, clients=<optimized out>, events=<optimized out>, wait_set=<optimized out>, wait_timeout=0x7f41531d5510) at ./src/rmw_wait.cpp:33
#2 0x00007f42759b1848 in rcl_wait () at /opt/ros/humble/lib/librcl.so
#3 0x00007f4274f2470c in rclcpp::Executor::wait_for_work(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) (this=0x7ffdbe823c50, timeout=...) at /usr/include/c++/11/chrono:521
#4 0x00007f4274f28853 in rclcpp::Executor::spin_some_impl(std::chrono::duration<long, std::ratio<1l, 1000000000l> >, bool) (this=0x7ffdbe823c50, max_duration=..., exhaustive=false) at /usr/include/c++/11/chrono:537
#5 0x000063ca4387dbde in function()::$_0::operator()() const (this=0x63ca44fc84e8) at <redacted_executable_path>:17
#6 0x000063ca4387db1d in std::__invoke_impl<void, function()::$_0>(std::__invoke_other, function()::$_0&&) (__f=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#7 0x000063ca4387daad in std::__invoke<function()::$_0>(function()::$_0&&) (__fn=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
#8 0x000063ca4387da85 in std::thread::_Invoker<std::tuple<function()::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x63ca44fc84e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:259
#9 0x000063ca4387da55 in std::thread::_Invoker<std::tuple<function()::$_0> >::operator()() (this=0x63ca44fc84e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:266
#10 0x000063ca4387d979 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<function()::$_0> > >::_M_run() (this=0x63ca44fc84e0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_thread.h:211
#11 0x00007f4273705253 in () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#12 0x00007f427338dac3 in () at /usr/lib/x86_64-linux-gnu/libc.so.6
#13 0x00007f427341f850 in () at /usr/lib/x86_64-linux-gnu/libc.so.6
Other notes
- The issue occurs when using either the SingleThreadedExecutor or the MultiThreadedExecutor.
- No segmentation fault occurs* when relocating node to the outer scope:
  auto node{std::make_shared<rclcpp::Node>("node")};
  {
    executor.add_node(node);
  }
- No segmentation fault occurs* when using Cyclone DDS.
- No segmentation fault occurs* when sleeping the main thread for a short duration before adding node to the executor:
  {
    auto node{std::make_shared<rclcpp::Node>("node")};
    std::this_thread::sleep_for(std::chrono::seconds(1));
    executor.add_node(node);
  }
* I concluded a segmentation fault was unlikely after running the program for several minutes.
I am not sure if this is fixed by a specific PR, but this problem cannot be observed with Rolling.
Can you try removing the node from the executor before it goes out of scope?
I am not sure if this is fixed by a specific PR, but this problem cannot be observed with Rolling.
@fujitatomoya I ran the example code in the latest Rolling Docker image (rolling-ros-base, image ID/digest f875aa41a6f0) and got a segfault there too.
Container start command:
docker run --rm -it ros:rolling-ros-base
The only changes I made inside the container were:
- updating apt's package list (not upgrading the packages themselves)
- installing Vim
- creating a workspace at /workspace/src
- creating a C++ package with ros2 pkg create
- adding the example code from above
Is there anything else I can provide to help pinpoint where the issue might be?
Can you try removing the node from the executor before it goes out of scope?
@christophebedard I made the following change to the example.
{
  auto node{std::make_shared<rclcpp::Node>("node")};
  executor.add_node(node);
  executor.remove_node(node);
}
The program ran fine for over 10 minutes on Humble with Fast DDS.
I think it's reasonable to expect the user to remove the node from the executor before it goes out of scope/gets destroyed. See this test which does something similar to your reproducer: https://github.com/ros2/rclcpp/blob/2f71d6e249f626772da3f8a1bb7c8d141d9d0d52/rclcpp/test/rclcpp/test_executor.cpp#L75-L102
I'm not sure if this would cover your full use-case, so please let us know.
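If it helps for the spinner-thread use case, here is a rough sketch of how the "remove before destruction" requirement could be made automatic with a small RAII wrapper. ScopedNode is a hypothetical helper written for illustration; it is not something provided by rclcpp:

#include <memory>
#include <utility>

#include <rclcpp/executor.hpp>
#include <rclcpp/node.hpp>

// Hypothetical helper (not part of rclcpp): adds a node to an executor on
// construction and removes it in the destructor, so the node is always
// detached from the executor before the node itself is destroyed.
class ScopedNode
{
public:
  ScopedNode(rclcpp::Executor & executor, std::shared_ptr<rclcpp::Node> node)
  : executor_(executor), node_(std::move(node))
  {
    executor_.add_node(node_);
  }

  ~ScopedNode()
  {
    executor_.remove_node(node_);
  }

  rclcpp::Node & operator*() {return *node_;}

private:
  rclcpp::Executor & executor_;
  std::shared_ptr<rclcpp::Node> node_;
};

The inner scope of the original reproducer would then look like this, with remove_node() running before the node is destroyed:

{
  ScopedNode node{executor, std::make_shared<rclcpp::Node>("node")};
}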
@adamlm thanks for checking this! Actually, what I tried was a source build of Rolling, so maybe the latest patches in the Rolling source (not yet available as packages) fix this issue.
We're having a similar issue when creating a short-lived subscriber. I think it may be the same since:
- We're creating and attaching the subscriber within the scope of a method (specifically an action callback)
- The gdb backtrace is very similar (crashes within the same place in rclcpp / rmw)
- The method call is very quick (and adding a sleep within it seems to prevent the crash)
- We occasionally get the same rclcpp::exceptions::RCLError
We were originally upgrading from Noetic to Humble. I noticed https://github.com/ros2/rmw_fastrtps/issues/728, which seemed to be resolved by https://github.com/ros2/rclcpp/pull/2142. That PR appears to be in Jazzy; however, upgrading to Jazzy has not fixed the issue for us.
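For illustration, the pattern reduced to a standalone example looks roughly like the sketch below. The topic name, message type (std_msgs/msg/String), and overall structure are simplified assumptions for this sketch rather than our actual code, which creates the subscription inside an action callback:

#include <atomic>
#include <memory>
#include <thread>
#include <rclcpp/executors/multi_threaded_executor.hpp>
#include <rclcpp/node.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::executors::MultiThreadedExecutor executor;
  auto node = std::make_shared<rclcpp::Node>("node");
  executor.add_node(node);

  std::atomic_bool stop_spinning{false};
  std::thread spinner_thread([&] {
    while (rclcpp::ok() && !stop_spinning.load()) {
      executor.spin_some();
    }
  });

  // The short-lived part: a subscription is created inside a narrow scope,
  // used briefly, and destroyed while the executor may still be waiting on it.
  {
    auto subscription = node->create_subscription<std_msgs::msg::String>(
      "topic", 10, [](std_msgs::msg::String::ConstSharedPtr /*msg*/) {});
    // ... briefly wait for data here, then return ...
  }

  stop_spinning.store(true);
  spinner_thread.join();
  rclcpp::shutdown();
  return 0;
}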