
Exceptions cannot be caught in the multi-threaded executor

Open tpreclik opened this issue 2 years ago • 5 comments

Bug report

Required Info:

  • Operating System:
    • Debian Bullseye (aarch64)
  • Installation type:
    • From source
  • Version or commit hash:
    • humble
  • DDS implementation:
    • rmw_cyclonedds_cpp
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

We have a node that provides services, which other instances of the same node call regularly over wireless interfaces. When these services do not respond fast enough, pending requests are removed. Sporadically, we then see such nodes terminate with an rclcpp::exceptions::RCLError whose message is "failed to send response: error not set". As far as we can tell, we cannot catch that exception.

Expected behavior

take_and_do_error_handling catches RCLError when handling the service.

Actual behavior

The corresponding executor thread terminates with the following backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000fffff6485aa0 in __GI_abort () at abort.c:79
#2  0x0000fffff669e238 in __gnu_cxx::__verbose_terminate_handler() () from /lib/aarch64-linux-gnu/libstdc++.so.6
#3  0x0000fffff669bd4c in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
#4  0x0000fffff669bdb0 in std::terminate() () from /lib/aarch64-linux-gnu/libstdc++.so.6
#5  0x0000fffff669bd38 in std::rethrow_exception(std::__exception_ptr::exception_ptr) () from /lib/aarch64-linux-gnu/libstdc++.so.6
#6  0x0000fffff71a6058 in rclcpp::exceptions::throw_from_rcl_error(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rcutils_error_state_s const*, void (*)()) () from /opt/ros/humble/lib/librclcpp.so
#7  0x0000fffff7e67d44 in rclcpp::Service<foobar_msgs::srv::AgentMapSrv>::send_response(rmw_request_id_s&, foobar_msgs::srv::AgentMapSrv_Response_<std::allocator<void> >&) () from /home/foobar/ros2_devel/lib/libfoobar_component.so
#8  0x0000fffff7e600e4 in rclcpp::Service<foobar_msgs::srv::AgentMapSrv>::handle_request(std::shared_ptr<rmw_request_id_s>, std::shared_ptr<void>) () from /home/foobar/ros2_devel/lib/libfoobar_component.so
#9  0x0000fffff71af514 in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>)::{lambda()#2}::operator()() const
    () from /opt/ros/humble/lib/librclcpp.so
#10 0x0000fffff71b47b8 in void std::__invoke_impl<void, rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>)::{lambda()#2}&>(std::__invoke_other, rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>)::{lambda()#2}&) ()
   from /opt/ros/humble/lib/librclcpp.so
#11 0x0000fffff71b37a0 in std::enable_if<is_invocable_r_v<void, rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>)::{lambda()#2}&>, void>::type std::__invoke_r<void, rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>)::{lambda()#2}&>(rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>)::{lambda()#2}&) ()
   from /opt/ros/humble/lib/librclcpp.so
#12 0x0000fffff71b2130 in std::_Function_handler<void (), rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>)::{lambda()#2}>::_M_invoke(std::_Any_data const&) () from /opt/ros/humble/lib/librclcpp.so
#13 0x0000fffff7190638 in std::function<void ()>::operator()() const () from /opt/ros/humble/lib/librclcpp.so
#14 0x0000fffff71ae63c in take_and_do_error_handling(char const*, char const*, std::function<bool ()>, std::function<void ()>) ()
   from /opt/ros/humble/lib/librclcpp.so
#15 0x0000fffff71af648 in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>) ()
   from /opt/ros/humble/lib/librclcpp.so
#16 0x0000fffff71ae3b0 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) () from /opt/ros/humble/lib/librclcpp.so
#17 0x0000fffff71bc554 in rclcpp::executors::MultiThreadedExecutor::run(unsigned long) () from /opt/ros/humble/lib/librclcpp.so
#18 0x0000fffff71be314 in void std::__invoke_impl<void, void (rclcpp::executors::MultiThreadedExecutor::*&)(unsigned long), rclcpp::executors::MultiThreadedExecutor*&, unsigned long&>(std::__invoke_memfun_deref, void (rclcpp::executors::MultiThreadedExecutor::*&)(unsigned long), rclcpp::executors::MultiThreadedExecutor*&, unsigned long&) () from /opt/ros/humble/lib/librclcpp.so
#19 0x0000fffff71be20c in std::__invoke_result<void (rclcpp::executors::MultiThreadedExecutor::*&)(unsigned long), rclcpp::executors::MultiThreadedExecutor*&, unsigned long&>::type std::__invoke<void (rclcpp::executors::MultiThreadedExecutor::*&)(unsigned long), rclcpp::executors::MultiThreadedExecutor*&, unsigned long&>(void (rclcpp::executors::MultiThreadedExecutor::*&)(unsigned long), rclcpp::executors::MultiThreadedExecutor*&, unsigned long&) () from /opt/ros/humble/lib/librclcpp.so
#20 0x0000fffff71be118 in void std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) ()
   from /opt/ros/humble/lib/librclcpp.so
#21 0x0000fffff71be07c in void std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>::operator()<, void>() () from /opt/ros/humble/lib/librclcpp.so
#22 0x0000fffff71be02c in void std::__invoke_impl<void, std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>>(std::__invoke_other, std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>&&) () from /opt/ros/humble/lib/librclcpp.so
#23 0x0000fffff71bdfbc in std::__invoke_result<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>>::type std::__invoke<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>>(std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)>&&) () from /opt/ros/humble/lib/librclcpp.so
#24 0x0000fffff71bdf58 in void std::thread::_Invoker<std::tuple<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)> > >::_M_invoke<0ul>(std::_Index_tuple<0ul>) ()
   from /opt/ros/humble/lib/librclcpp.so
#25 0x0000fffff71bdf2c in std::thread::_Invoker<std::tuple<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)> > >::operator()() () from /opt/ros/humble/lib/librclcpp.so
#26 0x0000fffff71bdf0c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::_Bind<void (rclcpp::executors::MultiThreadedExecutor::*(rclcpp::executors::MultiThreadedExecutor*, unsigned long))(unsigned long)> > > >::_M_run() ()
   from /opt/ros/humble/lib/librclcpp.so
#27 0x0000fffff66c6cac in ?? () from /lib/aarch64-linux-gnu/libstdc++.so.6
#28 0x0000fffff5bec648 in start_thread (arg=0xffffd174d440) at pthread_create.c:477
#29 0x0000fffff6536fdc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

tpreclik avatar May 19 '23 13:05 tpreclik

Running without the multi-threaded executor we can catch these exceptions. This suggests an alternative solution: allowing exceptions thrown in the threads of the multi-threaded executor to be caught.

tpreclik avatar May 19 '23 15:05 tpreclik

@tpreclik thanks for creating the issue. Can you share a reproducible environment or a snippet for it?

I was expecting that this could be related to https://github.com/ros2/ros2/issues/1253, but

Running without the multi-threaded executor we can catch these exceptions.

this makes me think this is another issue.

fujitatomoya avatar May 19 '23 16:05 fujitatomoya

I am not sure that this needs a reproducer. The general issue is that these exceptions cannot be caught in the multi-threaded executor. The same applies when using composable nodes, if I recall correctly. The stack trace above is proof that this can happen. We observed these RCLErrors when transmitting large messages (occupancy grid maps of 1500x1500 cells, which results in messages larger than 2 MB) over wireless links. We used CycloneDDS in our setup. These large messages are subject to IP fragmentation, which leads to communication errors when the setup is stressed a bit (for example, by heavy external traffic on the wireless band, or by sending several of those messages across the wireless network at the same time). DDS tuning as described in https://docs.ros.org/en/humble/How-To-Guides/DDS-tuning.html can improve the situation but cannot prevent communication errors from happening. Nodes need to be able to survive communication errors without crashing. This also does not seem to be limited to service calls.

tpreclik avatar Jul 27 '23 10:07 tpreclik
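
As a rough check of the message size mentioned in the comment above, the arithmetic can be sketched as follows. It relies on nav_msgs/msg/OccupancyGrid storing one int8 per cell. The helper names and the 1480-byte fragment payload (a 1500-byte MTU minus a 20-byte IPv4 header) are assumptions for illustration only; the estimate ignores UDP and RTPS headers and any fragmentation done by the middleware itself, so it is a lower bound at best.

```cpp
#include <cstddef>
#include <cstdint>

// Size of the cell data of an occupancy grid, assuming one int8 per cell
// (as in the `data` field of nav_msgs/msg/OccupancyGrid).
constexpr std::size_t occupancy_grid_payload_bytes(std::size_t width, std::size_t height)
{
  return width * height * sizeof(std::int8_t);
}

// Ceiling division: how many fragments of `fragment_payload` bytes are
// needed to carry `payload` bytes. `fragment_payload` must be non-zero.
constexpr std::size_t fragments_needed(std::size_t payload, std::size_t fragment_payload)
{
  return (payload + fragment_payload - 1) / fragment_payload;
}

// A 1500x1500 grid carries 2,250,000 bytes of cell data.
static_assert(occupancy_grid_payload_bytes(1500, 1500) == 2250000, "size check");
```

Under these assumptions, a single map needs on the order of 1500 fragments, and losing any one of them on a congested wireless link is enough to lose the datagram it belongs to.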

The general issue is that these exceptions cannot be caught in the multi-threaded executor.

True. If an exception is thrown in a worker thread, it cannot be caught in the main thread.

see https://github.com/fujitatomoya/ros2_test_prover/commit/e9eea0eae84b2a4f5fa56015fe09cb8812fe5899

i think we need the rclcpp counterpart of https://github.com/ros2/rclpy/pull/1073 (related issue: https://github.com/ros2/rclpy/issues/983)

By the way, I changed the issue title to "Exceptions cannot be caught in the multi-threaded executor".

fujitatomoya avatar Jul 28 '23 22:07 fujitatomoya
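
The limitation discussed above is general C++ behavior rather than something specific to rclcpp: an exception that escapes the function run by a std::thread calls std::terminate, so a try/catch on the thread that started it never sees the exception. The sketch below uses only the standard library to show one common way around this, capturing the exception in the worker with std::current_exception and rethrowing it on the caller's side. The names failing_callback, run_in_worker and describe are hypothetical, and this is not how rclcpp is implemented or how a fix would necessarily look.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for a callback that fails inside a worker thread.
void failing_callback()
{
  throw std::runtime_error("failed to send response");
}

// Runs `work` on a worker thread and returns the exception it threw,
// or a null std::exception_ptr if it finished normally.
template <typename Work>
std::exception_ptr run_in_worker(Work work)
{
  std::exception_ptr captured;
  std::thread worker([&captured, &work]() {
    try {
      work();
    } catch (...) {
      // Without this catch, the exception would leave the thread function
      // and std::terminate would be called.
      captured = std::current_exception();
    }
  });
  worker.join();  // join() synchronizes, so reading `captured` afterwards is safe
  return captured;
}

// Rethrows a captured exception on the calling thread and returns its message.
std::string describe(std::exception_ptr error)
{
  if (!error) {
    return "no error";
  }
  try {
    std::rethrow_exception(error);
  } catch (const std::exception & e) {
    return e.what();
  } catch (...) {
    return "unknown error";
  }
}
```

Calling describe(run_in_worker(failing_callback)) from the main thread returns the error message instead of aborting the process, which is the behavior the issue asks the multi-threaded executor to make possible.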

Can you please share some information about how to get a backtrace with the ros2 CLI? Thanks!

alexleel avatar Nov 14 '23 06:11 alexleel