rclcpp icon indicating copy to clipboard operation
rclcpp copied to clipboard

Change of Waitable waitset interface

Open jmachowinski opened this issue 1 year ago • 3 comments

Feature request

Feature description

Currently the rclcpp::Waitable operates directly on the waitset.

  add_to_wait_set(rcl_wait_set_t * wait_set) = 0;

  virtual
  bool
  is_ready(rcl_wait_set_t * wait_set) = 0;

The problem here is, that the executor is not directly aware of the changes to the waitset done bye the Waitable. Therefore the executor can not do certain optimizations, like use precomputed waitsets. Another drawback of the current implementation is, that we need to hold a shared_ptr to the waitable while we call rcl::wait in order to make sure that the rcl primitives in the waitset do not get deleted.

Therefore I would propose to change the interface of the rclcpp::Waitable to something like this:

struct WaitableSetHandles
{
std::vector<std::shared_ptr<rcl_subscription_t> subscription_handles;
std::vector<std::shared_ptr<rcl_client_t> client_handles;
.
.
.
// If this triggers, the executer must request a new WaitableSetHandles from the waitable
std::optional<std::shared_ptr<rcl_guard_condition_t>> waitable_changed;
};

struct WaitableSetHandleState
{
std::std::vector<bool> subscription_is_ready;
std::std::vector<bool> client_is_ready;
.
.
.
};

class Waitable
{

/**
* Will be called before rcl::wait will be called. This may
* be used for rearming guard_conditions, if there is still
* data to be processed.
*/
virtual void before_wait_cb();

/**
* Returns all rcl handles used by this waitable
*/
virtual WaitableSetHandles getWaitsetHandles() = 0;

/**
* Returns all rcl handles used by this waitable
*/
virtual bool
  is_ready(WaitableSetHandleState * state) = 0;

};

Implementation considerations

This has an impact on a public interface, and though might/will break downstream implementations. On the pro side, we can optimize a code path that is being heavily used reducing the cpu load and latency of the standard executors. Note, with this change, we could also integrate an optimization into the executor, that should get rid of most of the std::weak_ptr::lock() calls.

@wjwwood and @mjcarroll whats you opinion and this ?

jmachowinski avatar Jan 29 '24 12:01 jmachowinski

Another drawback of the current implementation is, that we need to hold a shared_ptr to the waitable while we call rcl::wait in order to make sure that the rcl primitives in the waitset do not get deleted.

We currently hold a shared_ptr to every entity in the waitset, so I don't think this will be as much of a change. (for example, in the collect_all_ptrs call in wait_for_work: https://github.com/ros2/rclcpp/blob/rolling/rclcpp/src/rclcpp/callback_group.cpp#L69)

Note, with this change, we could also integrate an optimization into the executor, that should get rid of most of the std::weak_ptr::lock() calls.

I'm not sure how this would work, can you elaborate more?

mjcarroll avatar Jan 29 '24 23:01 mjcarroll

We currently hold a shared_ptr to every entity in the waitset, so I don't think this will be as much of a change. (for example, in the collect_all_ptrs call in wait_for_work: https://github.com/ros2/rclcpp/blob/rolling/rclcpp/src/rclcpp/callback_group.cpp#L69)

We acquire the shared_ptr to the rclcpp elements for a short while, while building the rcl waitset, afterwards they are dropped, and only weak_ptr are held. During the call to rcl_wait there should not be any more shared_ptr references. With the action client (which is a rclcpp::waitable) we see bugs, that it is still beeing processed, after 'userspace' dropped the shared_ptr. I suspect the reference inside the executor to be the culprit.

Note, with this change, we could also integrate an optimization into the executor, that should get rid of most of the std::weak_ptr::lock() calls.

I'm not sure how this would work, can you elaborate more?

The idea would be, that whenever the guard_condition of a callback_group triggers, that this is the only place, were we acquire the shared_ptrs of its rclcpp entities for a short while. When triggered, we collect all shared_ptrs to the rcl entries.

After this, we can reuse the collected rcl shared_ptrs to rebuild the waitset, without the need to call lock on the weak_ptrs to the rclcpp elements. So this would get rid of a lot of lock calls in a hot code path.

Another optimization one can do, is to use the ref_count() method on the collected rcl shared_ptrs, to find out, if corresponding rclcpp element was deleted. Note, calling ref_count is ~10x faster than calling lock().

jmachowinski avatar Jan 30 '24 11:01 jmachowinski

After this, we can reuse the collected rcl shared_ptrs to rebuild the waitset, without the need to call lock on the weak_ptrs to the rclcpp elements. So this would get rid of a lot of lock calls in a hot code path.

Okay, this is much closer to how the rclcpp::WaitSet implementation in 2142 works. The shared_ptr is held to construct the waitset, held during the actual wait, and released when WaitResult is destroyed.

mjcarroll avatar Jan 30 '24 13:01 mjcarroll