rclcpp
rclcpp copied to clipboard
Change of Waitable waitset interface
Feature request
Feature description
Currently the rclcpp::Waitable operates directly on the waitset.
add_to_wait_set(rcl_wait_set_t * wait_set) = 0;
virtual
bool
is_ready(rcl_wait_set_t * wait_set) = 0;
The problem here is, that the executor is not directly aware of the changes to the waitset done bye the Waitable. Therefore the executor can not do certain optimizations, like use precomputed waitsets. Another drawback of the current implementation is, that we need to hold a shared_ptr to the waitable while we call rcl::wait in order to make sure that the rcl primitives in the waitset do not get deleted.
Therefore I would propose to change the interface of the rclcpp::Waitable to something like this:
struct WaitableSetHandles
{
std::vector<std::shared_ptr<rcl_subscription_t> subscription_handles;
std::vector<std::shared_ptr<rcl_client_t> client_handles;
.
.
.
// If this triggers, the executer must request a new WaitableSetHandles from the waitable
std::optional<std::shared_ptr<rcl_guard_condition_t>> waitable_changed;
};
struct WaitableSetHandleState
{
std::std::vector<bool> subscription_is_ready;
std::std::vector<bool> client_is_ready;
.
.
.
};
class Waitable
{
/**
* Will be called before rcl::wait will be called. This may
* be used for rearming guard_conditions, if there is still
* data to be processed.
*/
virtual void before_wait_cb();
/**
* Returns all rcl handles used by this waitable
*/
virtual WaitableSetHandles getWaitsetHandles() = 0;
/**
* Returns all rcl handles used by this waitable
*/
virtual bool
is_ready(WaitableSetHandleState * state) = 0;
};
Implementation considerations
This has an impact on a public interface, and though might/will break downstream implementations. On the pro side, we can optimize a code path that is being heavily used reducing the cpu load and latency of the standard executors. Note, with this change, we could also integrate an optimization into the executor, that should get rid of most of the std::weak_ptr::lock() calls.
@wjwwood and @mjcarroll whats you opinion and this ?
Another drawback of the current implementation is, that we need to hold a shared_ptr to the waitable while we call rcl::wait in order to make sure that the rcl primitives in the waitset do not get deleted.
We currently hold a shared_ptr to every entity in the waitset, so I don't think this will be as much of a change. (for example, in the collect_all_ptrs call in wait_for_work: https://github.com/ros2/rclcpp/blob/rolling/rclcpp/src/rclcpp/callback_group.cpp#L69)
Note, with this change, we could also integrate an optimization into the executor, that should get rid of most of the std::weak_ptr::lock() calls.
I'm not sure how this would work, can you elaborate more?
We currently hold a shared_ptr to every entity in the waitset, so I don't think this will be as much of a change. (for example, in the
collect_all_ptrscall inwait_for_work: https://github.com/ros2/rclcpp/blob/rolling/rclcpp/src/rclcpp/callback_group.cpp#L69)
We acquire the shared_ptr to the rclcpp elements for a short while, while building the rcl waitset, afterwards they are dropped, and only weak_ptr are held. During the call to rcl_wait there should not be any more shared_ptr references. With the action client (which is a rclcpp::waitable) we see bugs, that it is still beeing processed, after 'userspace' dropped the shared_ptr. I suspect the reference inside the executor to be the culprit.
Note, with this change, we could also integrate an optimization into the executor, that should get rid of most of the std::weak_ptr::lock() calls.
I'm not sure how this would work, can you elaborate more?
The idea would be, that whenever the guard_condition of a callback_group triggers, that this is the only place, were we acquire the shared_ptrs of its rclcpp entities for a short while. When triggered, we collect all shared_ptrs to the rcl entries.
After this, we can reuse the collected rcl shared_ptrs to rebuild the waitset, without the need to call lock on the weak_ptrs to the rclcpp elements. So this would get rid of a lot of lock calls in a hot code path.
Another optimization one can do, is to use the ref_count() method on the collected rcl shared_ptrs, to find out, if corresponding rclcpp element was deleted. Note, calling ref_count is ~10x faster than calling lock().
After this, we can reuse the collected rcl shared_ptrs to rebuild the waitset, without the need to call lock on the weak_ptrs to the rclcpp elements. So this would get rid of a lot of lock calls in a hot code path.
Okay, this is much closer to how the rclcpp::WaitSet implementation in 2142 works. The shared_ptr is held to construct the waitset, held during the actual wait, and released when WaitResult is destroyed.