Asynchronous action acceptance and cancellation

Open nwn opened this issue 2 years ago • 1 comments

Asynchronous action acceptance and cancellation

Feature description

Provide asynchronous alternative signatures for the rclcpp_action::Server's GoalCallback and CancelCallback. That is, allow callbacks to not immediately return the accept/reject response, instead delaying this to a later time. Doing so would enable actions to be more flexible and used in more situations.

Motivation

The current design of rclcpp_action doesn't allow for easy composition (in the functional sense of the word) of actions. That is, actions can't be transparently implemented via other actions. As an example use case, we're working on a project that significantly changes the format of data in an action goal. For the transition period, we would like to support both goal types, allowing us to invoke the action using either format. An elegant solution to this would be to have the new action simply wrap calls to the existing action, acting as a thin translation layer between the two formats. However, this isn't possible without easy composition.

Why composition doesn't work today

The rclcpp_action::Client class offers only asynchronous functions to send goals and cancel goals. This is reasonable since we don't always want to block the executing thread while we await a result, and these can always be made synchronous by just calling .get() on the returned future.

The rclcpp_action::Server class' GoalCallback and CancelCallback, which are invoked when the above functions are called by the client, are fundamentally synchronous – by returning an immediate value rather than a future, these callbacks are forced to block on any operation performed within before returning. These callbacks are also invoked by the executor, meaning they cannot be delegated to an outside thread and so must be run on an executor thread.

In combination, these two designs mean that when using the rclcpp_action interface, one cannot implement a proxy or wrapper action server that calls another without either:

Blocking an executor thread, or
Imperfectly proxying the underlying action's state machine (explained below).

Specifically, by having the proxy server's GoalCallback call async_send_goal() to the underlying action, it must synchronously block on the result to determine which GoalResponse to return. This blocks the executor thread until a response is received. Similarly, the proxy server's CancelCallback cannot call async_cancel_goal() to determine whether the underlying cancellation was accepted without blocking the executor.

Possible solutions

One solution to this, which we ended up using for the above "translation layer" example, is to unconditionally accept goals in the proxy server's GoalCallback and delegate calling async_send_goal() to a separate thread, either from the GoalCallback itself or in the corresponding AcceptedCallback. Doing so avoids blocking the executor but means that the state of the proxy goal no longer matches that of the underlying goal, introducing additional complexity. In this case, the proxy server must then handle eventually rejected underlying goals by abort()ing the proxy goal. This additional failure mode must also be reflected in the client code for the proxy action. Similar inconsistent states are introduced by unconditionally accepting cancellation requests in the proxy server. Ideally, this should not be the case and we should be able to transparently proxy actions without impacting the client's view of the state machine.

Another solution to this, which we used in a different project for transparently forwarding actions between domains, is to forgo the Client and Server abstractions offered by the rclcpp_action library and instead break down actions into their constituent topics. (That is, hooking directly into the topics for the status, feedback, send-goal request/reply, get-result request/reply, etc.) Each topic's subscription callback can then perform whatever relevant transformation or operation it needs before re-publishing on the other side. This works well and avoids both of the above issues, but exposes to developers the implementation details of actions. Moreover, applying this method in more complex scenarios requires reimplementing substantial parts of rclcpp_action.

The remaining solution suggested here is to investigate providing an alternative asynchronous signature for the Server's GoalCallback and CancelCallback that doesn't require an immediate response. This would enable the callbacks to perform arbitrary non-blocking operations before sending a response to the client. In doing so, we can then implement transparent action proxies that don't block the executor or expose action internals.

Implementation considerations

By using alternative function signatures for the asynchronous version of the callbacks, the synchronous and asynchronous callbacks could be specified in the same way by invoking different overloads of the Server constructor.

The are a few different options for what these callback signatures could look like:

Simply wrap the current "immediate" return type in a std::future, such as:
std::future<CancelResponse> (std::shared_ptr<ServerGoalHandle<ActionT>>)
This is the simplest solution for the user, and would potentially make integration with other asynchronous libraries easier. However, it doesn't provide a clear means for the executor to be notified of this future's readiness outside of regular polling or blocking a dedicated thread. (std::future currently doesn't support a continuation function like the proposed std::future::then.)
Passing a std::promise type as an argument to the callback, such as:
void (std::shared_ptr<ServerGoalHandle<ActionT>>, std::promise<CancelResponse>)
This would give the library more control over the future value, but seems to have the same issue with notifying the executor.
Passing a custom type as an argument to the callback, such as:
void (std::shared_ptr<ServerGoalHandle<ActionT>>, GoalCancelHandle)
This new GoalCancelHandle type could expose methods to either accept or reject the request, acting similarly to the ServerGoalHandle's succeed() and abort() methods. As rclcpp_action controls the type, it could easily implement the "continuation" logic of notifying the executor when the cancellation has a response.

I'm not too familiar with the internal implications of such a change. I suspect it would involve addition of a new guard condition to each action at the rcl_action level, which would be notified whenever one of the responses becomes ready. It may also require additional handling of "pending" actions as actions may now spend an extended amount of time in this state.

I would appreciate people's thoughts on this issue and whether there is interest in the proposed solution.

Nov 07 '23 21:11 nwn

I put together a prototype of this feature here (based on 16.0.3 since I'm currently running Humble). The implementation wasn't too complex, and mainly involved splitting the logic from the ServerBase::execute_goal_request_received() and ServerBase::execute_cancel_request_received() functions into the work that's needed before the user callback is invoked, and that which should occur once a response is provided. The logic for the latter is encoded as "continuation" functions stored in new ServerGoalRequestHandle and ServerCancelRequestHandle types, which are passed to the user callbacks. Doing so required the addition of new signatures for the "asynchronous" versions of these callbacks:

using GoalCallback = std::function<GoalResponse(
      const GoalUUID &, std::shared_ptr<const typename ActionT::Goal>)>;
using GoalAsyncCallback = std::function<void(
      const GoalUUID &, std::shared_ptr<const typename ActionT::Goal>,
      std::shared_ptr<ServerGoalRequestHandle>)>;

using CancelCallback = std::function<CancelResponse(
      std::shared_ptr<ServerGoalHandle<ActionT>>)>;
using CancelAsyncCallback = std::function<void(
      std::shared_ptr<ServerGoalHandle<ActionT>>,
      std::shared_ptr<ServerCancelRequestHandle>)>;

These new request handles are designed to work similarly to the existing ServerGoalHandle type, exposing various member functions to express the server's response. Their API is essentially:

struct ServerGoalRequestHandle {
  void accept_and_execute();  // Equivalent to respond(GoalResponse::ACCEPT_AND_EXECUTE)
  void accept_and_defer();    // Equivalent to respond(GoalResponse::ACCEPT_AND_DEFER)
  void reject();              // Equivalent to respond(GoalResponse::REJECT)

  /// Responds to the goal request with the given response, as if returned from the synchronous callback.
  void respond(GoalResponse response);
};

struct ServerCancelRequestHandle {
  void accept();  // Equivalent to respond(CancelResponse::ACCEPT)
  void reject();  // Equivalent to respond(CancelResponse::REJECT)

  /// Responds to the cancellation request with the given response, as if returned from the synchronous callback.
  void respond(CancelResponse response);
};

The aim is to have the respond() functions (and their convenience wrappers) behave exactly as if the user had returned the given response from the synchronous versions of these callbacks. In fact, in the current implementation, synchronous callbacks provided by the user are converted into an equivalent asynchronous callback using something like:

template<typename ActionT>
auto to_goal_async_callback(GoalCallback sync_cb) -> GoalAsyncCallback
{
  return [sync_cb](const GoalUUID & uuid, std::shared_ptr<const typename ActionT::Goal> goal,
           std::shared_ptr<ServerGoalRequestHandle> goal_request_handle) {
           auto response = sync_cb(uuid, goal);
           goal_request_handle->respond(response);
         };
}

This allows us to only handle the single type of callback internally.

It was originally expected that the different signatures of the new async callbacks would allow for easy backward compatibility with existing code that uses synchronous callbacks. Ideally, users would also be able mix and match the two, choosing between sync and async versions independently for both of the callbacks. My first attempt at implementing this used simple function overloads of create_server() and the Server constructor. However, during testing I discovered that the return value of std::bind() is callable with a variadic number of arguments, where only those which were named as placeholder values are meaningful. This means that casting this result to either a synchronous or asynchronous callback type is ambiguous and would break any existing code that uses std::bind for its synchronous callbacks. I therefore had to implement some template traits that distinguish the two signatures and cast them accordingly before passing them on to the Server constructor.

Hopefully this prototype can serve as a starting point for exploring this feature request. I'll continue testing it and using it in my own work to seek out any bugs or shortcomings.

Nov 24 '23 18:11 nwn