events-executor segfault on fast-dds `RMWSubscriptionEvent::update_data_available()`
**Bug report**

**Required Info:**
- Operating System:
  - Ubuntu 22.04
- Installation type:
  - binaries
- Version or commit hash:
  - ROS 2 Iron 0.10.0-3
- DDS implementation:
  - Fast-DDS
- Client library (if applicable):
  - rclcpp
#### Steps to reproduce issue
This issue happens as part of a larger application, so I don't have a minimal reproducible example. I have 2 nodes in the same process. Each node has its own events-executor, spinning in a dedicated thread.
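For reference, the executor setup is roughly the following (a minimal sketch with placeholder node names, not the actual application code; it assumes the Iron `rclcpp::experimental::executors::EventsExecutor`):

```cpp
#include <thread>

#include <rclcpp/rclcpp.hpp>
#include <rclcpp/experimental/executors/events_executor/events_executor.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  auto node_a = std::make_shared<rclcpp::Node>("node_a");  // the publishing node
  auto node_b = std::make_shared<rclcpp::Node>("node_b");  // the subscribing node

  // One events-executor per node, each spinning in its own thread.
  rclcpp::experimental::executors::EventsExecutor executor_a;
  rclcpp::experimental::executors::EventsExecutor executor_b;
  executor_a.add_node(node_a);
  executor_b.add_node(node_b);

  std::thread thread_a([&executor_a]() {executor_a.spin();});
  std::thread thread_b([&executor_b]() {executor_b.spin();});

  // ... application / test code runs here ...

  executor_a.cancel();
  executor_b.cancel();
  thread_a.join();
  thread_b.join();
  rclcpp::shutdown();
  return 0;
}
```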
Node A periodically publishes an occupancy grid using this publisher:

```cpp
rclcpp::PublisherOptions pub_options;
pub_options.use_intra_process_comm = rclcpp::IntraProcessSetting::Disable;

m_map_pub = parent_node->create_publisher<nav_msgs::msg::OccupancyGrid>(
  map_topic_name,
  rclcpp::QoS(rclcpp::KeepLast(1)).transient_local().reliable(),
  pub_options);
```
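The publishing side is driven by a timer on Node A, roughly like this (the period and the `build_current_grid()` helper are illustrative placeholders, not the real code):

```cpp
// Illustrative only: republish the current grid once per second.
// build_current_grid() stands in for the application code that fills the message.
m_map_timer = parent_node->create_wall_timer(
  std::chrono::seconds(1),
  [this]() {
    nav_msgs::msg::OccupancyGrid grid = build_current_grid();
    m_map_pub->publish(grid);
  });
```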
Node B from time to time creates a subscription to that topic, waits for a message, and then lets the subscription go out of scope:

```cpp
nav_msgs::msg::OccupancyGrid::ConstSharedPtr
get_occupancy_grid(
  const std::string & topic,
  std::chrono::seconds timeout)
{
  nav_msgs::msg::OccupancyGrid::ConstSharedPtr map;

  rclcpp::SubscriptionOptions sub_options;
  sub_options.use_intra_process_comm = rclcpp::IntraProcessSetting::Disable;

  auto map_sub = node->create_subscription<nav_msgs::msg::OccupancyGrid>(
    topic,
    rclcpp::QoS(rclcpp::KeepLast(1)).transient_local().reliable(),
    [&map](nav_msgs::msg::OccupancyGrid::ConstSharedPtr msg) {
      map = msg;
    },
    sub_options);

  auto start_time = std::chrono::steady_clock::now();
  while ((std::chrono::steady_clock::now() - start_time < timeout) && rclcpp::ok()) {
    if (map) {break;}
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
  }
  return map;
}
```
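The unit test described below drives this helper in a loop, roughly like this (illustrative only; the topic name and timeout are placeholders):

```cpp
// Illustrative: create and drop the subscription a handful of times
// while Node A keeps publishing from its own executor thread.
for (int i = 0; i < 5; ++i) {
  auto map = get_occupancy_grid("/map", std::chrono::seconds(10));
  if (!map) {
    // test failure handling omitted
  }
}
```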
#### Actual behavior
I have a unit test that uses this setup; `get_occupancy_grid` is called 4-5 times over the course of the test. The test fails sporadically, maybe 1 run in 30, with the following backtrace:
```
Thread 27 "test_positioner" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffc3ffe100 (LWP 23807)]
0x0000fffff7c65900 in ?? () from /opt/ros/iron/lib/librclcpp.so
(gdb) bt
#0 0x0000fffff7c65900 in ?? () from /opt/ros/iron/lib/librclcpp.so
#1 0x0000fffff7c62468 in ?? () from /opt/ros/iron/lib/librclcpp.so
#2 0x0000fffff7c661e4 in void rclcpp::detail::cpp_callback_trampoline<std::function<void (unsigned long)>, void const*, unsigned long, void>(void const*, unsigned long) () from /opt/ros/iron/lib/librclcpp.so
#3 0x0000fffff1d62a74 in RMWSubscriptionEvent::update_data_available() ()
from /opt/ros/iron/lib/librmw_fastrtps_shared_cpp.so
#4 0x0000fffff1757428 in eprosima::fastdds::dds::DataReaderImpl::InnerDataReaderListener::on_data_available(eprosima::fastrtps::rtps::RTPSReader*, eprosima::fastrtps::rtps::GUID_t const&, eprosima::fastrtps::rtps::SequenceNumber_t const&, eprosima::fastrtps::rtps::SequenceNumber_t const&, bool&) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#5 0x0000fffff16da1b8 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges(eprosima::fastrtps::rtps::WriterProxy*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#6 0x0000fffff16dcd30 in eprosima::fastrtps::rtps::StatefulReader::change_received(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::WriterProxy*, unsigned long) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#7 0x0000fffff16dd28c in eprosima::fastrtps::rtps::StatefulReader::processDataMsg(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#8 0x0000fffff16b6f4c in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_to_intraprocesses(eprosima::fastrtps::rtps::CacheChange_t*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#9 0x0000fffff16b7e5c in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_nts(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::RTPSMessageGroup&, eprosima::fastrtps::rtps::LocatorSelectorSender&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#10 0x0000fffff188c204 in ?? () from /opt/ros/iron/lib/libfastrtps.so.2.10
#11 0x0000fffff16b08d4 in eprosima::fastrtps::rtps::StatefulWriter::unsent_change_added_to_history(eprosima::fastrtps::rtps::CacheChange_t*, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#12 0x0000fffff1735d8c in eprosima::fastdds::dds::DataWriterImpl::perform_create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&, eprosima::fastrtps::rtps::InstanceHandle_t const&) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#13 0x0000fffff1736694 in eprosima::fastdds::dds::DataWriterImpl::create_new_change_with_params(eprosima::fastrtps::rtps::ChangeKind_t, void*, eprosima::fastrtps::rtps::WriteParams&) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#14 0x0000fffff1736760 in eprosima::fastdds::dds::DataWriterImpl::create_new_change(eprosima::fastrtps::rtps::ChangeKind_t, void*) () from /opt/ros/iron/lib/libfastrtps.so.2.10
#15 0x0000fffff17283ac in eprosima::fastdds::dds::DataWriterImpl::write(void*) ()
from /opt/ros/iron/lib/libfastrtps.so.2.10
#16 0x0000fffff1d6c5d8 in rmw_fastrtps_shared_cpp::__rmw_publish(char const*, rmw_publisher_s const*, void const*, rmw_publisher_allocation_s*) () from /opt/ros/iron/lib/librmw_fastrtps_shared_cpp.so
#17 0x0000fffff7e0806c in ?? () from /opt/ros/iron/lib/librcl.so
#18 0x0000ffffe45e2260 in rclcpp::Publisher<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::do_inter_process_publish (msg=..., this=0xaaaaab63dfe0)
at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#19 rclcpp::Publisher<nav_msgs::msg::OccupancyGrid_<std::allocator<void> >, std::allocator<void> >::publish<nav_msgs::msg::OccupancyGrid_<std::allocator<void> > > (msg=..., this=0xaaaaab63dfe0)
  at /opt/ros/iron/include/rclcpp/rclcpp/publisher.hpp:273
```
#### Additional information
I think the problem might be a race condition between the subscription going out of scope and the publisher trying to push an event to it: judging from the backtrace, the data-available callback registered with the RMW layer is invoked after the subscription (and whatever the callback's user data points at) has already been destroyed.
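To make the hypothesis concrete, here is a deliberately racy, self-contained sketch (plain C++, not rclcpp/rmw internals) of the pattern I suspect: a lower layer keeps invoking a stored callback whose captured raw pointer refers to an object that another thread has already destroyed, which is the kind of use-after-free the backtrace points at.

```cpp
#include <chrono>
#include <functional>
#include <memory>
#include <thread>

// Stand-in for the executor-side entity that the data-available callback points at.
struct EventEntity
{
  void on_data_available(size_t /*count*/) {/* e.g. push an event onto a queue */}
};

int main()
{
  auto entity = std::make_unique<EventEntity>();

  // Stand-in for the callback stored by the lower layer: it captures a raw
  // pointer to the entity, not an owning reference.
  std::function<void(size_t)> stored_callback =
    [raw = entity.get()](size_t count) {raw->on_data_available(count);};

  // "Writer" thread: keeps delivering data-available notifications.
  std::thread writer_thread(
    [&stored_callback]() {
      for (int i = 0; i < 100000; ++i) {
        stored_callback(1);  // may run after 'entity' has been destroyed
      }
    });

  // "Test" thread: destroys the entity while notifications may still be in
  // flight, analogous to the subscription going out of scope above.
  std::this_thread::sleep_for(std::chrono::microseconds(100));
  entity.reset();  // use-after-free window from here on

  writer_thread.join();
  return 0;
}
```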