rclcpp spin_until_future_complete may block forever if nothing wakes the executor after the future completes

Bug report

Required Info:

Operating System:
- jammy
Installation type:
- source
Version or commit hash:
- 85a7046ac39bfc0d787ed2bbe62574424e31b871
DDS implementation:
- Fast-DDS
Client library (if applicable):
- rclcpp

Steps to reproduce issue

C++:

#include <iostream>
#include <thread>
#include <chrono>
#include "rclcpp/rclcpp.hpp"


void sleep_a_bit()
{
  std::cerr << "Sleeping for 5 seconds\n";
  std::this_thread::sleep_for(std::chrono::seconds(5));
  std::cerr << "Waking up and exiting function\n";
}

void rescue_callback()
{
  std::cerr << "rescue timer ran\n";
}

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);

  auto node = std::make_shared<rclcpp::Node>("rclcpp_spin_until_future_complete_blocked");

  // Rescue timer will wake the executor, causing it to look at the future again
  auto rescue_timer = rclcpp::create_timer(
      node,
      node->get_clock(),
      rclcpp::Duration(10, 0),
      std::bind(rescue_callback));

  rclcpp::executors::SingleThreadedExecutor exe;
  exe.add_node(node);

  std::shared_future<void> script = std::async(std::launch::async, sleep_a_bit);
  std::cerr << "About to sleep until future complete\n";
  exe.spin_until_future_complete(script);
  std::cerr << "Executor realized future is complete\n";

  rclcpp::shutdown();

  return 0;
}

CMake:

add_executable(rclcpp_spin_until_future_complete_blocked
  src/rclcpp_spin_until_future_complete_blocked.cpp)
target_link_libraries(rclcpp_spin_until_future_complete_blocked
  rclcpp::rclcpp)

Expected behavior

I would expect the future being completed would cause spin_until_future_complete() to return, and the program would shutdown without the timer callback running.

$ ./rclcpp_spin_until_future_complete_blocked 
About to sleep until future complete
Sleeping for 5 seconds
Waking up and exiting function
Executor realized future is complete

Actual behavior

After the future completes, spin_until_future_complete() blocks for another 5 seconds until the timer callback is run.

$ ./rclcpp_spin_until_future_complete_blocked 
About to sleep until future complete
Sleeping for 5 seconds
Waking up and exiting function
rescue timer ran
Executor realized future is complete

Additional information

A similar limitation exists in rclpy. I thinkspin_until_future_complete() only returns immediately after completion if the future is completed in the callback of the same executor. We should probably document it here. This limitation is why the demo for lifecycle_service_client doesn't exit (related issue ros2/demos#504)

Apr 07 '22 21:04 sloretz

Yes. spin_until_future_complete is just a fancy way of saying: do a while loop where you spin_some and check the future in between spins.

It's responsibility of the user to make sure that whatever sets the future does that through the executor. For example: if the future comes from a ROS client request, the executor will wake up when you receive the response (and that's when the future is completed). This should definitely be indicated in the API documentation.

I'm not sure if we should change this behavior as it may also be non-trivial. Even if the API had a way to pair the future with a condition variable or something similar that would wake up the executor, how would it know if this is needed or not? It would be needed in your example, but not if the future comes from a ROS client.

Apr 07 '22 21:04 alsora

It's responsibility of the user to make sure that whatever sets the future does that through the executor.

That can be tricky. I thought about using GuardCondition callbacks as a way to get the executor to do work, but there seems to be a bug where when a guard condition has a callback, it doesn't call the rcl method that would have woken the executor.

https://github.com/ros2/rclcpp/blob/85a7046ac39bfc0d787ed2bbe62574424e31b871/rclcpp/src/rclcpp/guard_condition.cpp#L79-L87

In ros2/demos#558 I made another std::async call just to wait on the future and wake the executor. I didn't wake the executor at the end of callee_script because I think it would be a race condition between the executor spinning again and the shared_future being updated at the end of the function call. Any ideas to make this easier?

rclpy avoids this by having it's own Future class with callbacks that get called after the result on the future is set. The executor adds a done callback to the future to wake itself.

Even if the API had a way to pair the future with a condition variable or something similar that would wake up the executor, how would it know if this is needed or not?

I don't have any ideas here. rclpy wakes the executor unconditionally.

Apr 07 '22 22:04 sloretz

This will be even more complicated in the new API - proposed in #1874 (Accepted, pending for Humble to branch the release out of master). GuardConditon would need to also wake up any condition that can be invoked - considering that the special Future class for rclcpp makes no sense anymore.

Maybe I will try to come up with some idea during this weekend on how this could be solved.

Apr 08 '22 08:04 hliberacki

Is it possible to wait on a bunch of conditions at once - sort of like select()? If that exists maybe a dedicated thread could wait on all known conditions and call executor.cancel() when any one of them completes.

Apr 28 '22 17:04 sloretz

It's responsibility of the user to make sure that whatever sets the future does that through the executor.

Actually that's not technically a requirement. The future could be set by anything, including a thread created by the user.

Is it possible to wait on a bunch of conditions at once - sort of like select()? If that exists maybe a dedicated thread could wait on all known conditions and call executor.cancel() when any one of them completes.

This seems unnecessarily complicated. We may need to iterate on https://github.com/ros2/rclcpp/pull/1874 more, if this is required too.

May 13 '22 01:05 wjwwood

It's responsibility of the user to make sure that whatever sets the future does that through the executor.

Actually that's not technically a requirement. The future could be set by anything, including a thread created by the user.

Of course, after writing that, I realize that this is core problem. While what I said is true, it also means that the executor will prevent the function from returning until it times out (if it times out).

But it's still true that something that runs outside of the executor could complete the future, but it would have to also ensure the executor is interrupted.

So, I'd actually be inclined to mark this as "won't fix", because that's expected behavior for this function.

May 13 '22 01:05 wjwwood

I also left a suggest here to how to better solve the original problem:

https://github.com/ros2/demos/pull/558/files#r871940510

May 13 '22 02:05 wjwwood

Hi folks.

I'm trying to produce a class to monitor a lifecycle node by reading the get_state service.

When I start the node and trigger the client to get_state, the service never finishes.

I think the problem is related to this issue. Someone has an idea or have some snippet to execute this task?

Apr 03 '23 17:04 lucasamparo

rclcpp rclcpp copied to clipboard

spin_until_future_complete may block forever if nothing wakes the executor after the future completes

Bug report

Steps to reproduce issue

Expected behavior

Actual behavior

Additional information

rclcpp
rclcpp copied to clipboard