rclcpp icon indicating copy to clipboard operation
rclcpp copied to clipboard

No callbacks if client/server died

Open jmachowinski opened this issue 2 years ago • 8 comments
trafficstars

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • apt
  • Version or commit hash:
    • iron
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

Start an action server

  • action should be accepted, but not terminate

Start an action client that connects to the action server

Kill either the server or the client

Expected behavior

If the client is terminated, I would expect a CancelCallback.

If the server dies, I would expect ResultCallback for the goal with the status set to UNKNOWN

Actual behavior

No callbacks at all.

Additional information

I checked the API and could not find any way to check if a goal is still valid. On the server side the goal is still reported as active and executing if the client died.

jmachowinski avatar Jul 20 '23 09:07 jmachowinski

I want to add that killing the clients while the server is running can also result in the following crash on the server-side

    terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
       what():  failed to send response: client will not receive response,

alsora avatar Jul 20 '23 10:07 alsora

I want to add that killing the clients while the server is running can also result in the following crash on the server-side

This one in particular should be solved by https://github.com/ros2/rcl/pull/1048 (and linked PRs).

clalancette avatar Jul 20 '23 19:07 clalancette

If the client is terminated, I would expect a CancelCallback.

action server does not have connectivity map, so it does not have any idea which action client is disconnected. So this should be requested by client during dtor? probably we can add the process to send the cancel requests for all GoalStatus::STATUS_ACCEPTED goal handles during action client dtor? since this cancel is the service request, and there will be no client available to respond to, https://github.com/ros2/rcl/pull/1048 can avoid the exception on the server side.

If the server dies, I would expect ResultCallback for the goal with the status set to UNKNOWN

action client cannot send the result request at the moment that server dies, so service could not be used. but server can publish the status for each goal handles in process on the server side during dtor after setting the status UNKNOWN and canceling w/o user callback?

any thoughts?

CC: @ros2/team

fujitatomoya avatar Jul 24 '23 05:07 fujitatomoya

So this should be requested by client during dtor?

By terminated I mean it crashes, hard. So the dtor would not be called any more.

jmachowinski avatar Jul 24 '23 17:07 jmachowinski

By terminated I mean it crashes, hard. So the dtor would not be called any more.

I do not really understand this means. what leads to the crash?

fujitatomoya avatar Jul 26 '23 17:07 fujitatomoya

@fujitatomoya any programming error, e.g. access of a nullptr followed by a segfault.

Our scenario is that we have a server executing actions on our robot, and various clients that command actions. So getting informed that a client just vanished, and we need to stop the robot is safety critical for our use case. For now we worked around this issue by using a second heartbeat topic in both ways, but this should be a function of the actions themself.

jmachowinski avatar Jul 27 '23 08:07 jmachowinski

I want to add that killing the clients while the server is running can also result in the following crash on the server-side

This one in particular should be solved by ros2/rcl#1048 (and linked PRs).

Will this also be merged into iron ?

jmachowinski avatar Jul 27 '23 08:07 jmachowinski

@jmachowinski thanks for sharing your thoughts, appreciate it.

any programming error, e.g. access of a nullptr followed by a segfault.

there will be a signal or exception to catch this, and this seems to be application or system bug. i understand there are always possibility like these, so i will set up the signal handler for this situation to avoid the core crash to shutdown the application. that said i think this is basically program responsibility.

i think your concern is more like, what if the device where action server is running is powered off accidentally? that is something software cannot handle, and action server will be gone w/o any shutdown process. and this could happen. in this case, there is nothing we can do on action server side to provide the result to client. at least, clients can check if the server is available, and the keep the application running.

https://github.com/ros2/rclcpp/issues/2241#issuecomment-1647222954 is not perfect solution above case (i think there is no solution against the issue) but could be better than current behavior, i think.

Will this also be merged into iron ?

i think so, we can backport this to iron once it is merged to rolling. technically https://github.com/ros2/rcl/pull/1048 is for service, probably we can track this one https://github.com/ros2/ros2/issues/1462 for action.

fujitatomoya avatar Aug 02 '23 21:08 fujitatomoya