rclcpp icon indicating copy to clipboard operation
rclcpp copied to clipboard

ClientBaseImpl actually gets context_, enabling wait_for_action_server to work

Open joncppl opened this issue 2 years ago • 8 comments
trafficstars

ActionClient::wait_for_action_server never actually waits and always returns here, (when waiting is actually necessary). Seems to me that the action client's reference to context never gets set.

corrected link: https://github.com/ros2/rclcpp/blob/fd58bddb05ee430f7124f5feff0add683db6f86f/rclcpp_action/src/client.cpp#L192

joncppl avatar Aug 25 '23 20:08 joncppl

I think you're correct with this one, can you address the DCO signoff check?

CursedRock17 avatar Aug 25 '23 22:08 CursedRock17

ActionClient::wait_for_action_server never actually waits and always returns here

isn't that here? according to your fix.

https://github.com/ros2/rclcpp/blob/ba175922d3008704c67b10fe82a2f7c0cbee2be4/rclcpp_action/src/client.cpp#L192

https://github.com/ros2/rclcpp/blob/ba175922d3008704c67b10fe82a2f7c0cbee2be4/rclcpp/src/rclcpp/client.cpp#L154

seems to be no problem for me, since context_ is stored here?

https://github.com/ros2/rclcpp/blob/ba175922d3008704c67b10fe82a2f7c0cbee2be4/rclcpp/src/rclcpp/client.cpp#L43

fujitatomoya avatar Aug 25 '23 23:08 fujitatomoya

seems to be no problem for me, since context_ is stored here?

Isn't that just ClientBase for the regular client, not ClientBaseImpl for rclcpp_action? To be fair, when I set a basic action client/server system wait_for_action_server still waits like it should, but if I cancel the client before I connect to the server, rcl throws rcl node's context is invalid which I assume is from the client side.

Also there's an uncrustify error that needs to be fixed.

CursedRock17 avatar Aug 26 '23 17:08 CursedRock17

@joncppl The link you provided is to the rclcpp::ClientBase::wait_for_nanoseconds method, which as far as I can tell is never invoked during rclcpp_action::ActionClient::wait_for_action_server_nanoseconds. Or am I missing something?

Also, wait_for_action_server definitely waits at some points; if you run ros2 run action_tutorials_cpp fibonacci_action_client, it will wait around for the server for 10 seconds and quit if the server doesn't show up.

All of that said, it does seem like this patch fixes a problem. But I think we need to understand the problem a bit more first. Can you describe in detail the problem that caused you to look into this and submit the patch? Thanks.

clalancette avatar Aug 28 '23 14:08 clalancette

@joncppl The link you provided is to the rclcpp::ClientBase::wait_for_nanoseconds method, which as far as I can tell is never invoked during rclcpp_action::ActionClient::wait_for_action_server_nanoseconds. Or am I missing something?

Also, wait_for_action_server definitely waits at some points; if you run ros2 run action_tutorials_cpp fibonacci_action_client, it will wait around for the server for 10 seconds and quit if the server doesn't show up.

All of that said, it does seem like this patch fixes a problem. But I think we need to understand the problem a bit more first. Can you describe in detail the problem that caused you to look into this and submit the patch? Thanks.

My apologies, I had intended to link the rclcpp_action client's equivalent. The code is very similar..

environment is ubuntu20.04/foxy w/ connextdds.

test is:

  • spawn action server
  • immediately spawn action client (programatically in another process, leaving little time for discovery)
  • immediately call wait_for_action_server, which immediately returns false, regardless of the timeout value. debugger indicates it is returning from the line I linked.
  • with the patch, wait_for_action_server indeed waits for about 1 second before returning true
  • alternatively, calling action_server_is_ready in an busy loop also takes about 1 second before it also returns true.

It appears to me the root cause has not changed since foxy, but yes I have not tested the problem/patch on rolling.

joncppl avatar Aug 28 '23 21:08 joncppl

I believe the reason the example works is because rclcpp::ok falls back to default context since pimpl->context is null, so this condition is true despite the null context.

However my application is not using the default context.

joncppl avatar Aug 29 '23 00:08 joncppl

For example, it is replicated with this patch to example minimal_action_client

diff --git a/rclcpp/actions/minimal_action_client/member_functions.cpp b/rclcpp/actions/minimal_action_client/member_functions.cpp
index f5c827e..f863095 100644
--- a/rclcpp/actions/minimal_action_client/member_functions.cpp
+++ b/rclcpp/actions/minimal_action_client/member_functions.cpp
@@ -133,13 +133,21 @@ private:
 
 int main(int argc, char ** argv)
 {
-  rclcpp::init(argc, argv);
-  auto action_client = std::make_shared<MinimalActionClient>();
+  auto context = rclcpp::Context::make_shared();
+  context->init(argc, argv);
+  rclcpp::NodeOptions node_options {};
+  node_options.context(context);
+  auto action_client = std::make_shared<MinimalActionClient>(node_options);
+
+  rclcpp::ExecutorOptions executor_options {};
+  executor_options.context = context;
+  rclcpp::executors::SingleThreadedExecutor executor(executor_options);
+  executor.add_node(action_client);
 
   while (!action_client->is_goal_done()) {
-    rclcpp::spin_some(action_client);
+    executor.spin();
   }
 
-  rclcpp::shutdown();
+  context->shutdown("done");
   return 0;
 }

joncppl avatar Aug 29 '23 00:08 joncppl

However my application is not using the default context.

Fantastic, thanks. That is what I suspected, but I wanted to be sure.

Can you please add a test that shows that this change fixes the problem? You can look at https://github.com/ros2/rclcpp/blob/fd58bddb05ee430f7124f5feff0add683db6f86f/rclcpp_action/test/test_client.cpp#L357-L368 for inspiration (though you may not want to reuse that test fixture, as the setup for this test is going to be very different).

clalancette avatar Aug 29 '23 13:08 clalancette