Disabled result caching: Result gets erased before result callback is registered
Bug report
Required Info:
- Operating System:
- Ubuntu 20.04
- Ubuntu 22.04
- Installation type:
- binaries
- Version or commit hash:
- galactic, foxy, humble
- DDS implementation:
- CycloneDDS
- Client library (if applicable):
- rclcpp_action
Steps to reproduce issue
I used the Fibonacci ActionServer (C++) and ActionClient (Python) and extended them as follows to reproduce the bug.
In the server I passed in options that disable the result caching and reduced the execute() function to return quicker.
#include <functional>
#include <memory>
#include <thread>
#include "action_tutorials_interfaces/action/fibonacci.hpp"
#include "rclcpp/rclcpp.hpp"
#include "rclcpp_action/rclcpp_action.hpp"
#include "rclcpp_components/register_node_macro.hpp"
#include "action_tutorials_cpp/visibility_control.h"
namespace action_tutorials_cpp
{
class FibonacciActionServer : public rclcpp::Node
{
public:
using Fibonacci = action_tutorials_interfaces::action::Fibonacci;
using GoalHandleFibonacci = rclcpp_action::ServerGoalHandle<Fibonacci>;
ACTION_TUTORIALS_CPP_PUBLIC
explicit FibonacciActionServer(const rclcpp::NodeOptions & options = rclcpp::NodeOptions())
: Node("fibonacci_action_server", options)
{
using namespace std::placeholders;
auto serverOptions = rcl_action_server_get_default_options();
serverOptions.result_timeout.nanoseconds = 0;
this->action_server_ = rclcpp_action::create_server<Fibonacci>(
this,
"fibonacci",
std::bind(&FibonacciActionServer::handle_goal, this, _1, _2),
std::bind(&FibonacciActionServer::handle_cancel, this, _1),
std::bind(&FibonacciActionServer::handle_accepted, this, _1),
serverOptions);
}
private:
rclcpp_action::Server<Fibonacci>::SharedPtr action_server_;
rclcpp_action::GoalResponse handle_goal(
const rclcpp_action::GoalUUID & uuid,
std::shared_ptr<const Fibonacci::Goal> goal)
{
RCLCPP_INFO(this->get_logger(), "Received goal request with order %d", goal->order);
(void)uuid;
return rclcpp_action::GoalResponse::ACCEPT_AND_EXECUTE;
}
rclcpp_action::CancelResponse handle_cancel(
const std::shared_ptr<GoalHandleFibonacci> goal_handle)
{
RCLCPP_INFO(this->get_logger(), "Received request to cancel goal");
(void)goal_handle;
return rclcpp_action::CancelResponse::ACCEPT;
}
void handle_accepted(const std::shared_ptr<GoalHandleFibonacci> goal_handle)
{
using namespace std::placeholders;
// this needs to return quickly to avoid blocking the executor, so spin up a new thread
std::thread{std::bind(&FibonacciActionServer::execute, this, _1), goal_handle}.detach();
}
void execute(const std::shared_ptr<GoalHandleFibonacci> goal_handle)
{
RCLCPP_INFO(this->get_logger(), "Executing goal");
const auto goal = goal_handle->get_goal();
auto result = std::make_shared<Fibonacci::Result>();
result->sequence.push_back(42);
goal_handle->succeed(result);
RCLCPP_INFO(this->get_logger(), "Executing done");
}
}; // class FibonacciActionServer
} // namespace action_tutorials_cpp
RCLCPP_COMPONENTS_REGISTER_NODE(action_tutorials_cpp::FibonacciActionServer)
In the client I added some more logging to both goal_response_callback and get_result_callback.
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node
from action_tutorials_interfaces.action import Fibonacci
class FibonacciActionClient(Node):
def __init__(self):
super().__init__('fibonacci_action_client')
self._action_client = ActionClient(self, Fibonacci, 'fibonacci')
def send_goal(self, order):
goal_msg = Fibonacci.Goal()
goal_msg.order = order
self._action_client.wait_for_server()
self._send_goal_future = self._action_client.send_goal_async(goal_msg)
self._send_goal_future.add_done_callback(self.goal_response_callback)
def goal_response_callback(self, future):
goal_handle = future.result()
if not goal_handle.accepted:
self.get_logger().info('Goal rejected :(')
return
self.get_logger().info('Goal accepted :)')
self.get_logger().info("goal_response_callback: get_result_async")
self._get_result_future = goal_handle.get_result_async()
self.get_logger().info("goal_response_callback: add_done_callback(result_callback)")
self._get_result_future.add_done_callback(self.get_result_callback)
self.get_logger().info("goal_response_callback: done")
def get_result_callback(self, future):
result = future.result()
self.get_logger().info('Result: status {}'.format(result.status))
self.get_logger().info('Result: {0}'.format(result.result.sequence))
rclpy.shutdown()
def main(args=None):
rclpy.init(args=args)
action_client = FibonacciActionClient()
action_client.send_goal(10)
rclpy.spin(action_client)
if __name__ == '__main__':
main()
Expected behavior
According to the ROS2 ActionServer design article I'd expect the server to discard the results "after responding to any pending result requests".
The client should log "result_callback: status 4", which is STATUS_SUCCEEDED.
Actual behavior
The server discards the result as soon as it detects it as expired (see server.cpp L500). In some cases the discarding happens before the client has registered the result callback and thus the result is lost and at the time the result_callback is executed the client logs "result_callback: status 0", which is STATUS_UNKNOWN.
Additional information
I also put the rclcpp_action code in my overlay and added and changed some debug messages to log_level WARN so that I could see when the result gets discarded and when STATUS_UNKNOWN is returned by the server. Here is the log in which one can see the failure.
------------------------------------------------------------------------------------------------------------------------
Client
------------------------------------------------------------------------------------------------------------------------
[INFO] [1676025199.686814735] [fibonacci_action_client]: Goal accepted :)
[INFO] [1676025199.687044809] [fibonacci_action_client]: goal_response_callback: get_result_async
[INFO] [1676025199.687553497] [fibonacci_action_client]: goal_response_callback: add_done_callback(result_callback)
[INFO] [1676025199.687765941] [fibonacci_action_client]: goal_response_callback: done
[INFO] [1676025199.688158300] [fibonacci_action_client]: Result: status 0
[INFO] [1676025199.693995159] [fibonacci_action_client]: Result: array('i')
------------------------------------------------------------------------------------------------------------------------
Server
------------------------------------------------------------------------------------------------------------------------
[INFO] [1676025199.677291824] [fibonacci_action_server]: Received goal request with order 10
[DEBUG] [1676025199.677356341] [rcl]: Sending service response
[WARN] [1676025199.677414186] [fibonacci_action_server.rclcpp_action]: Accepted goal 912881d7a81d4a17b8ac8d6e89d14b0
[DEBUG] [1676025199.677736839] [rcl]: Waiting without timeout
[DEBUG] [1676025199.677742997] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1676025199.677747903] [rcl]: Guard condition in wait set is ready
[DEBUG] [1676025199.677753235] [rcl]: Waiting without timeout
[DEBUG] [1676025199.677755049] [rcl]: Timeout calculated based on next scheduled timer: false
[INFO] [1676025199.677770234] [fibonacci_action_server]: Executing goal
[DEBUG] [1676025199.677945233] [rcl]: Timer successfully reset
[DEBUG] [1676025199.677950962] [rcl]: Updated timer period from '0ns' to '0ns'
[DEBUG] [1676025199.677958279] [rcl]: Waiting with timeout: 0s + 0ns
[DEBUG] [1676025199.677961209] [rcl]: Timeout calculated based on next scheduled timer: true
[INFO] [1676025199.678002260] [fibonacci_action_server]: Executing done
[DEBUG] [1676025199.679048075] [rcl]: Timer in wait set is ready
[DEBUG] [1676025199.679071412] [rcl]: Timer canceled
[WARN] [1676025199.679082202] [fibonacci_action_server.rclcpp_action]: Expired goal 912881d7a81d4a17b8ac8d6e89d14b0
[DEBUG] [1676025199.679193269] [rcl]: Timer canceled
[DEBUG] [1676025199.679202156] [rcl]: Waiting without timeout
[DEBUG] [1676025199.679204135] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1676025199.679207484] [rcl]: Guard condition in wait set is ready
[DEBUG] [1676025199.679211155] [rcl]: Waiting without timeout
[DEBUG] [1676025199.679212952] [rcl]: Timeout calculated based on next scheduled timer: false
[DEBUG] [1676025199.687154346] [rcl]: Service in wait set is ready
[DEBUG] [1676025199.687177283] [rcl]: Service server taking service request
[DEBUG] [1676025199.687183095] [rcl]: Service take request succeeded: true
[WARN] [1676025199.687225064] [fibonacci_action_server.rclcpp_action]: Goal does not exist 912881d7a81d4a17b8ac8d6e89d14b0
[WARN] [1676025199.687228270] [fibonacci_action_server.rclcpp_action]: Creating result response with STATUS_UNKNOWN
I think this problem only applies to the Python API, because in C++ the result callback is part of the SendGoalOptions? I understand that the server cannot predict that some Python ActionClient is going to request the result soon. And I am not sure whether this an actual bug or rather a design decision and known issue the client has to deal with. As far as I understand, the Python API does not offer the ActionClient a way to register interest in the result at any earlier point in time. If there should be a way to do so, I am happy to try it out.
We only saw the issue in our Python based tests were the server was executing a rather short execution callback.
@benthie thanks for creating issue.
we have the same problem with https://github.com/ros2/ros2/commit/1f5bd8ed43beea199dabe48bc8023af3aba9806c rolling.
i think this is not expected behavior as designed.
@Barry-Xu-2018 can we also close this aligned with https://github.com/ros2/rcl/issues/1103?
can we also close this aligned with https://github.com/ros2/rcl/issues/1103?
Yes. I tried reproduced codes with the latest rolling code.
The desired result can be gotten.
[INFO] [1722307939.302519491] [fibonacci_action_client]: Goal accepted :)
[INFO] [1722307939.302758793] [fibonacci_action_client]: goal_response_callback: get_result_async
[INFO] [1722307939.303069126] [fibonacci_action_client]: goal_response_callback: add_done_callback(result_callback)
[INFO] [1722307939.303270590] [fibonacci_action_client]: goal_response_callback: done
[INFO] [1722307939.303829785] [fibonacci_action_client]: Result: status 0
[INFO] [1722307939.304037665] [fibonacci_action_client]: Result: array('i')
@Barry-Xu-2018
Yes. I tried reproduced codes with the latest rolling code. The desired result can be gotten.
are you sure about this?
I really do not think so... i think that client should print Result: status 4, STATUS_SUCCEEDED https://docs.ros2.org/foxy/api/action_msgs/msg/GoalStatus.html according to the https://design.ros2.org/articles/actions.html#result-caching
IMO this problem still stays after https://github.com/ros2/rcl/issues/1103
I really do not think so... i think that client should print Result: status 4, STATUS_SUCCEEDED
Yes, my mistake. The expected status value is 4, not 0.
IMO this problem still stays after https://github.com/ros2/rcl/issues/1103
Yes. I will further check this.
When the goal is completed, ServerGoalHandle::succeed() will be called. https://github.com/ros2/rclcpp/blob/45adf6565f99042a5fdcad7b8830e69f5efb03e5/rclcpp_action/include/rclcpp_action/server_goal_handle.hpp#L207-L215
Then call https://github.com/ros2/rclcpp/blob/45adf6565f99042a5fdcad7b8830e69f5efb03e5/rclcpp_action/include/rclcpp_action/server.hpp#L477-L493
shared_this->publish_status(); If there are requests for the goal result, it will be sent here. If the goal is completed quickly, the action client may not have received the request for the result yet.
notify_goal_terminal_state()
-> rcl_action_notify_goal_done()
-> _recalculate_expire_timer()
In _recalculate_expire_timer(), expire_timer will be set with new period 0 (timeout is set to 0). Executor will call ServerBase::execute_check_expired_goals() to remove all expired goal.
So when the timeout is set to 0, the goal will be quickly removed once it is completed.
Currently, after receiving goal acceptance message, action client sending the request for the goal result. Considering the time needed for message processing and transmission (especially for Python, which has a longer processing time), when the timeout is set to 0 in action server, it is very likely that the action client will not receive the result.
If the timeout is configured to have value 0, then goal results are discarded immediately (after responding to any pending result requests).
The current implementation follows the design description. The design does not specify that setting the timeout to 0 guarantees that the action client will receive a result at least once.
If the goal is accepted, but the goal result cannot be obtained (like this issue), the user needs to consider whether the timeout value set for the action server is reasonable. So I tend to think this is not a bug.
According to the ROS2 ActionServer design article I'd expect the server to discard the results "after responding to any pending result requests".
so there is no pending result when the goal succeeds. that means that result request comes back with UNKNOWN goal.
this is juset because by the time the server receives the result request, the goal already succeeds and expires.
I think this problem only applies to the Python API, because in C++ the result callback is part of the SendGoalOptions? I
i do not think so. you are right that rclcpp provides the result callback function with async_send_goal, but what it does is almost same with your example with rclpy.
https://github.com/ros2/rclcpp/blob/2739327ee9af0256c7edcbccb810513af0018851/rclcpp_action/include/rclcpp_action/client.hpp#L469-L471
that sends the result request on the goal response callback, technically this is the same situation with the rclpy sample in the report header.
this does not guarantee the result is cached at least once the server sends the result to the client, or any timing advantage either.
probably using rclcpp can mitigate this problem situation, but the same problem could occur.
i believe that currently user needs to set up the timeout to make sure there is enough time to request the results.
if we want to support this issue, i think we need to sends the result request flag with goal request at the same time. and when the goal is accepted on the server side, the server caches that result request internally until the server responses at least one result. but this generates another problem, what if nobody, no clients request the result at all? the server will keep the result forever...
i think current design and behavior works for user application, any thoughts?