rclcpp
rclcpp copied to clipboard
No callbacks if client/server died
Bug report
Required Info:
- Operating System:
- Ubuntu 22.04
- Installation type:
- apt
- Version or commit hash:
- iron
- DDS implementation:
- Fast-RTPS
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
Start an action server
- action should be accepted, but not terminate
Start an action client that connects to the action server
Kill either the server or the client
Expected behavior
If the client is terminated, I would expect a CancelCallback.
If the server dies, I would expect ResultCallback for the goal with the status set to UNKNOWN
Actual behavior
No callbacks at all.
Additional information
I checked the API and could not find any way to check if a goal is still valid. On the server side the goal is still reported as active and executing if the client died.
I want to add that killing the clients while the server is running can also result in the following crash on the server-side
terminate called after throwing an instance of 'rclcpp::exceptions::RCLError'
what(): failed to send response: client will not receive response,
I want to add that killing the clients while the server is running can also result in the following crash on the server-side
This one in particular should be solved by https://github.com/ros2/rcl/pull/1048 (and linked PRs).
If the client is terminated, I would expect a CancelCallback.
action server does not have connectivity map, so it does not have any idea which action client is disconnected. So this should be requested by client during dtor? probably we can add the process to send the cancel requests for all GoalStatus::STATUS_ACCEPTED goal handles during action client dtor? since this cancel is the service request, and there will be no client available to respond to, https://github.com/ros2/rcl/pull/1048 can avoid the exception on the server side.
If the server dies, I would expect ResultCallback for the goal with the status set to UNKNOWN
action client cannot send the result request at the moment that server dies, so service could not be used. but server can publish the status for each goal handles in process on the server side during dtor after setting the status UNKNOWN and canceling w/o user callback?
any thoughts?
CC: @ros2/team
So this should be requested by client during dtor?
By terminated I mean it crashes, hard. So the dtor would not be called any more.
By terminated I mean it crashes, hard. So the dtor would not be called any more.
I do not really understand this means. what leads to the crash?
@fujitatomoya any programming error, e.g. access of a nullptr followed by a segfault.
Our scenario is that we have a server executing actions on our robot, and various clients that command actions. So getting informed that a client just vanished, and we need to stop the robot is safety critical for our use case. For now we worked around this issue by using a second heartbeat topic in both ways, but this should be a function of the actions themself.
I want to add that killing the clients while the server is running can also result in the following crash on the server-side
This one in particular should be solved by ros2/rcl#1048 (and linked PRs).
Will this also be merged into iron ?
@jmachowinski thanks for sharing your thoughts, appreciate it.
any programming error, e.g. access of a nullptr followed by a segfault.
there will be a signal or exception to catch this, and this seems to be application or system bug. i understand there are always possibility like these, so i will set up the signal handler for this situation to avoid the core crash to shutdown the application. that said i think this is basically program responsibility.
i think your concern is more like, what if the device where action server is running is powered off accidentally? that is something software cannot handle, and action server will be gone w/o any shutdown process. and this could happen.
in this case, there is nothing we can do on action server side to provide the result to client. at least, clients can check if the server is available, and the keep the application running.
https://github.com/ros2/rclcpp/issues/2241#issuecomment-1647222954 is not perfect solution above case (i think there is no solution against the issue) but could be better than current behavior, i think.
Will this also be merged into iron ?
i think so, we can backport this to iron once it is merged to rolling. technically https://github.com/ros2/rcl/pull/1048 is for service, probably we can track this one https://github.com/ros2/ros2/issues/1462 for action.