rclcpp
rclcpp copied to clipboard
spin_until_future_complete may block forever if nothing wakes the executor after the future completes
Bug report
Required Info:
- Operating System:
- jammy
- Installation type:
- source
- Version or commit hash:
- 85a7046ac39bfc0d787ed2bbe62574424e31b871
- DDS implementation:
- Fast-DDS
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
C++:
#include <iostream>
#include <thread>
#include <chrono>
#include "rclcpp/rclcpp.hpp"
void sleep_a_bit()
{
std::cerr << "Sleeping for 5 seconds\n";
std::this_thread::sleep_for(std::chrono::seconds(5));
std::cerr << "Waking up and exiting function\n";
}
void rescue_callback()
{
std::cerr << "rescue timer ran\n";
}
int main(int argc, char ** argv)
{
rclcpp::init(argc, argv);
auto node = std::make_shared<rclcpp::Node>("rclcpp_spin_until_future_complete_blocked");
// Rescue timer will wake the executor, causing it to look at the future again
auto rescue_timer = rclcpp::create_timer(
node,
node->get_clock(),
rclcpp::Duration(10, 0),
std::bind(rescue_callback));
rclcpp::executors::SingleThreadedExecutor exe;
exe.add_node(node);
std::shared_future<void> script = std::async(std::launch::async, sleep_a_bit);
std::cerr << "About to sleep until future complete\n";
exe.spin_until_future_complete(script);
std::cerr << "Executor realized future is complete\n";
rclcpp::shutdown();
return 0;
}
CMake:
add_executable(rclcpp_spin_until_future_complete_blocked
src/rclcpp_spin_until_future_complete_blocked.cpp)
target_link_libraries(rclcpp_spin_until_future_complete_blocked
rclcpp::rclcpp)
Expected behavior
I would expect the future being completed would cause spin_until_future_complete()
to return, and the program would shutdown without the timer callback running.
$ ./rclcpp_spin_until_future_complete_blocked
About to sleep until future complete
Sleeping for 5 seconds
Waking up and exiting function
Executor realized future is complete
Actual behavior
After the future completes, spin_until_future_complete()
blocks for another 5 seconds until the timer callback is run.
$ ./rclcpp_spin_until_future_complete_blocked
About to sleep until future complete
Sleeping for 5 seconds
Waking up and exiting function
rescue timer ran
Executor realized future is complete
Additional information
A similar limitation exists in rclpy.
I thinkspin_until_future_complete()
only returns immediately after completion if the future is completed in the callback of the same executor.
We should probably document it here.
This limitation is why the demo for lifecycle_service_client
doesn't exit (related issue ros2/demos#504)
Yes.
spin_until_future_complete
is just a fancy way of saying: do a while loop where you spin_some
and check the future in between spins.
It's responsibility of the user to make sure that whatever sets the future does that through the executor. For example: if the future comes from a ROS client request, the executor will wake up when you receive the response (and that's when the future is completed). This should definitely be indicated in the API documentation.
I'm not sure if we should change this behavior as it may also be non-trivial. Even if the API had a way to pair the future with a condition variable or something similar that would wake up the executor, how would it know if this is needed or not? It would be needed in your example, but not if the future comes from a ROS client.
It's responsibility of the user to make sure that whatever sets the future does that through the executor.
That can be tricky. I thought about using GuardCondition
callbacks as a way to get the executor to do work, but there seems to be a bug where when a guard condition has a callback, it doesn't call the rcl
method that would have woken the executor.
https://github.com/ros2/rclcpp/blob/85a7046ac39bfc0d787ed2bbe62574424e31b871/rclcpp/src/rclcpp/guard_condition.cpp#L79-L87
In ros2/demos#558 I made another std::async
call just to wait on the future and wake the executor. I didn't wake the executor at the end of callee_script
because I think it would be a race condition between the executor spinning again and the shared_future
being updated at the end of the function call. Any ideas to make this easier?
rclpy
avoids this by having it's own Future
class with callbacks that get called after the result on the future is set. The executor adds a done callback to the future to wake itself.
Even if the API had a way to pair the future with a condition variable or something similar that would wake up the executor, how would it know if this is needed or not?
I don't have any ideas here. rclpy
wakes the executor unconditionally.
This will be even more complicated in the new API - proposed in #1874 (Accepted, pending for Humble to branch the release out of master). GuardConditon
would need to also wake up any condition that can be invoked - considering that the special Future
class for rclcpp
makes no sense anymore.
Maybe I will try to come up with some idea during this weekend on how this could be solved.
Is it possible to wait on a bunch of conditions at once - sort of like select()
? If that exists maybe a dedicated thread could wait on all known conditions and call executor.cancel()
when any one of them completes.
It's responsibility of the user to make sure that whatever sets the future does that through the executor.
Actually that's not technically a requirement. The future could be set by anything, including a thread created by the user.
Is it possible to wait on a bunch of conditions at once - sort of like select()? If that exists maybe a dedicated thread could wait on all known conditions and call executor.cancel() when any one of them completes.
This seems unnecessarily complicated. We may need to iterate on https://github.com/ros2/rclcpp/pull/1874 more, if this is required too.
It's responsibility of the user to make sure that whatever sets the future does that through the executor.
Actually that's not technically a requirement. The future could be set by anything, including a thread created by the user.
Of course, after writing that, I realize that this is core problem. While what I said is true, it also means that the executor will prevent the function from returning until it times out (if it times out).
But it's still true that something that runs outside of the executor could complete the future, but it would have to also ensure the executor is interrupted.
So, I'd actually be inclined to mark this as "won't fix", because that's expected behavior for this function.
I also left a suggest here to how to better solve the original problem:
https://github.com/ros2/demos/pull/558/files#r871940510
Hi folks.
I'm trying to produce a class to monitor a lifecycle node by reading the get_state service.
When I start the node and trigger the client to get_state, the service never finishes.
I think the problem is related to this issue. Someone has an idea or have some snippet to execute this task?