rclcpp
rclcpp copied to clipboard
Segment fault occurred when node was added to executor if the callback group variable created was local
ROS2
Bug report
Required Info:
- Operating System:
- Ubuntu 22.04
- Installation type:
- from source
- Version or commit hash:
- humble release patch 5
- DDS implementation:
- Fast-RTPS
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
Create a node and call the create_callback_group interface to create a callback group local variable, then add the node to the executor and spin the executor, segment fault occuered sometimes.
#include <memory>
#include <thread>
#include <string>
#include <vector>
#include "rclcpp/executors.hpp"
#include "rclcpp/node.hpp"
#include "rclcpp/timer.hpp"
#include "std_msgs/msg/string.hpp"
int main(int argc, char **argv) {
rclcpp::init(argc, argv);
auto node = std::make_shared<rclcpp::Node>("test_callback_group");
auto sub = node->create_subscription<std_msgs::msg::String>(
"/test_callback_group", 10,
[](const std_msgs::msg::String::ConstSharedPtr) {});
rclcpp::executors::MultiThreadedExecutor executor(rclcpp::ExecutorOptions(),
3u);
std::thread thr;
{
auto group = node->create_callback_group(
rclcpp::CallbackGroupType::MutuallyExclusive);
executor.add_node(node);
thr = std::thread([&executor]() { executor.spin(); });
std::cout << "wait for executor running..." << std::endl;
} // executor wait on dds and then destory the callback group
while (rclcpp::ok()) {
std::this_thread::sleep_for(std::chrono::seconds(1));
}
rclcpp::shutdown();
if (thr.joinable()) {
thr.join();
}
rclcpp::shutdown();
return 0;
}
#### Expected behavior
process runs normally
#### Actual behavior
segment fault
#### Additional information
It seems to be related to the lifetime of the condition in the callback group.
Hello @Chaochao0215, maybe we could build a debug type of this source code and use gdb to backtrace when it crashed
I tried your code, but I cannot find segment fault.
My test environment is based on ros2 Debian package (ros-humble-rclcpp 16.0.8-1jammy.20240217.065743).
Hello @Chaochao0215, maybe we could build a debug type of this source code and use
gdbto backtrace when it crashed Glad to receive your reply, this issue reoccuered and i pasted the backtrace information
I tried your code and I can't reproduce the segment fault. However, in gdb backtrace it seems like a multi-thread problem, rebuild your program with
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=thread -g")
and check if there's data race in ThreadSanitizer report.
I tried your code and I can't reproduce the segment fault. However, in gdb backtrace it seems like a multi-thread problem, rebuild your program with
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=thread -g")and check if there's data race in
ThreadSanitizerreport.
hi, I found a quick way to reproduce the problem
- set breakpoint 1 at line 21 of the test file
- set breakpoint 2 at line 25 of the test file
- set breakpoint 3 at " rcl_ret_t status = rcl_wait(&wait_set_, std::chrono::duration_caststd::chrono::nanoseconds(timeout).count());" in executor.cpp
- set gdb to not stop when hit breakpoints , then run the test
- after breakpoints 1 && 3 both hit , continue the main thread and wait to hit breakpoint 2
- after breakpoint 2 hit, send the message subscribed in the new shell terminal, such as 'ros2 topic pub /test_callback_group std_msgs/msg/String "data: hello"' and continue the thread interrupted at breakpoint 3 , segment fault should occurred
6. send the message subscribed in the new shell terminal, such as 'ros2 topic pub /test_callback_group std_msgs/msg/String "data: hello"
I missed this step.
- send the message subscribed in the new shell terminal, such as 'ros2 topic pub /test_callback_group std_msgs/msg/String "data: hello"
I missed this step.
I'm sorry i didn't specify the need to publish message at the beginning, hope you reproduce it successfully.
I tried your code, but I cannot find segment fault. My test environment is based on ros2 Debian package (ros-humble-rclcpp 16.0.8-1jammy.20240217.065743). Thank you for your attention, it occurs with low probability and I built ROS2 from source so that it can be controlled through gdb
@Chaochao0215 it would be helpful if you could provide self-contained reproducible test.
@Chaochao0215 Did you find the cause of this problem in the end? How did you solve it? I also encountered the same problem when using MultiThreadedExecutor, The program stack is as follows
Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000007f874cdac4 in ?? () from /opt/ros/humble/lib/librmw_cyclonedds_cpp.so [Current thread is 1 (Thread 0x7f5f7ee8e0 (LWP 402))] (gdb) bt #0 0x0000007f874cdac4 in ?? () from /opt/ros/humble/lib/librmw_cyclonedds_cpp.so #1 0x0000007f874cfd50 in rmw_wait () from /opt/ros/humble/lib/librmw_cyclonedds_cpp.so #2 0x0000007f87f16c38 in rcl_wait () from /opt/ros/humble/lib/librcl.so #3 0x0000007f88266850 in rclcpp::Executor::wait_for_work(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /opt/ros/humble/lib/librclcpp.so #4 0x0000007f8826989c in rclcpp::Executor::get_next_executable(rclcpp::AnyExecutable&, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) () from /opt/ros/humble/lib/librclcpp.so #5 0x0000007f88270d38 in rclcpp::executors::MultiThreadedExecutor::run(unsigned long) () from /opt/ros/humble/lib/librclcpp.so #6 0x0000007f87d531fc in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6 #7 0x0000007f87b1d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442 #8 0x0000007f87b85d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79
@Chaochao0215 Did you find the cause of this problem in the end? How did you solve it? I also encountered the same problem when using MultiThreadedExecutor, The program stack is as follows
Program terminated with signal SIGSEGV, Segmentation fault.
could you give a minimum demo code to regenerate this error ? I tried codes above, but it didn't crash.
@Chaochao0215 Did you find the cause of this problem in the end? How did you solve it? I also encountered the same problem when using MultiThreadedExecutor, The program stack is as follows Program terminated with signal SIGSEGV, Segmentation fault.
could you give a minimum demo code to regenerate this error ? I tried codes above, but it didn't crash.
My code is mixed with a lot of logic, I don't know which call caused it, this crash is quite sporadic, The program only appears once every few days.
This issue looks similar to https://github.com/ros2/rmw_fastrtps/issues/777.