[Bug] Need to adjust querier timeout for ROS 2 Action examples
Describe the bug
ROS 2 action is a combination of several topics and services.
There is a service called "action_name" + /_action/get_result, which replies with the result of the ROS 2 action.
The popular ROS 2 action example fibonacci_action runs for about 10 seconds, but the default querier timeout is 5 seconds.
https://github.com/eclipse-zenoh/zenoh-plugin-ros2dds/blob/de8acf5ca7a608c0b797b6490194c659d454899f/DEFAULT_CONFIG.json5#L136
This causes the action client example to keep waiting and never finish.
I think we can do both:
- Increase the default timeout for querier
- Notify the users that they might need to adjust the timeout.
To reproduce
# terminal 1
./target/debug/zenoh-bridge-ros2dds -d 1 -l tcp/127.0.0.1:7777
# terminal 2
./target/debug/zenoh-bridge-ros2dds -d 2 -e tcp/127.0.0.1:7777
# terminal 3
ROS_DOMAIN_ID=1 ros2 run action_tutorials_cpp fibonacci_action_client
# terminal 4
ROS_DOMAIN_ID=2 ros2 run action_tutorials_cpp fibonacci_action_server
System info
- Ubuntu 22.04
- ROS 2 Humble
The question of the default timeout value for Service and Action calls is a tricky one, as ROS 2 doesn't manage any timeout on such calls, but a timeout is required for any Zenoh query.
I think that 5 seconds is already large for all Service calls. Only Actions might take longer (for instance, a mission to make a robot follow a navigation path can take several minutes or even hours).
The design of an Action is actually asynchronous, with 2 distinct Service calls:
- _action/send_goal to initialise the action - the Action Server shall quickly reply, either accepting or rejecting the goal
- _action/get_result to get the result - if this is called only once the action has succeeded, the reply will be quick. Otherwise the service call is blocking.
The problem with the current implementation of the Action Client in rclcpp is that if a result_callback is passed in when calling async_send_goal(), the GoalHandle is made result_aware and the _action/get_result service is called directly, without waiting for the goal to finish. Hence the timeout on long actions.
Anyway, I think the default timeout for all Service calls must be kept relatively small for a "fail-fast" behaviour, especially at launch time, when detecting an unresponsive Service is important. 5 seconds is appropriate in my opinion.
However, I agree we can increase the default timeout for Actions' get_result. But what's the appropriate timeout for most Actions? 30 seconds? 5 minutes? 1 hour?
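To illustrate the timing problem described above, here is a minimal Python sketch. It does not use ROS 2 or Zenoh: get_result, call_get_result and simulate are hypothetical stand-ins, and the durations are scaled down 100x from the numbers in this issue (10 s action, 5 s timeout) so that it runs quickly. It only shows that a blocking call issued right after the goal is accepted hits the timeout whenever the action outlasts it.

```python
import asyncio

# Hypothetical durations, scaled down 100x from the numbers in this issue
# (about 10 s for the action, 5 s for the querier timeout).
ACTION_DURATION = 0.10
QUERY_TIMEOUT = 0.05


async def get_result(duration: float) -> str:
    """Stand-in for the blocking get_result call: replies only once the goal is done."""
    await asyncio.sleep(duration)
    return "result"


async def call_get_result(duration: float, timeout: float) -> str:
    """Call get_result right after goal acceptance, bounded by the query timeout."""
    try:
        return await asyncio.wait_for(get_result(duration), timeout)
    except asyncio.TimeoutError:
        return "timeout"


def simulate(duration: float, timeout: float) -> str:
    """Run one simulated get_result call and report how it ended."""
    return asyncio.run(call_get_result(duration, timeout))


if __name__ == "__main__":
    # The action outlasts the timeout, as in the fibonacci example.
    print(simulate(ACTION_DURATION, QUERY_TIMEOUT))
    # A timeout longer than the action lets the reply through.
    print(simulate(ACTION_DURATION, 3 * ACTION_DURATION))
```

With these assumed values, the first call ends in a timeout and the second returns the result, which mirrors why a longer timeout for get_result lets the example finish.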
I agree with you that we should only increase the default timeout for get_result. 5 seconds should fit most scenarios for ROS 2 services. Regarding the appropriate timeout, perhaps we can set 30 seconds for the time being. At least this makes the example work and avoids confusing users.
Updated: After thinking twice, I think we should have a longer timeout to fit most scenarios. I picked 5 minutes, since 30 seconds might not fit navigation requirements and 1 hour is too long for most cases.
~~Besides this, a warning on the timeout might be a good plus for users to know what happened.~~ My bad, I missed the warning log...
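The split discussed in this thread (a short default for Service calls, a longer one for an Action's get_result) could be sketched roughly as follows. This is only an illustration in Python, not the plugin's implementation: timeout_for and both constants are hypothetical names, and the values are simply the ones mentioned above.

```python
# Hypothetical constants, taken from the values discussed in this thread.
DEFAULT_TIMEOUT_S = 5.0       # fail-fast default for Service calls
GET_RESULT_TIMEOUT_S = 300.0  # 5 minutes for an Action's get_result


def timeout_for(service_name: str) -> float:
    """Pick a query timeout from the service name (illustrative only)."""
    # An Action's result service is named "<action_name>/_action/get_result".
    if service_name.endswith("/_action/get_result"):
        return GET_RESULT_TIMEOUT_S
    # Everything else, including send_goal, keeps the short default.
    return DEFAULT_TIMEOUT_S


if __name__ == "__main__":
    for name in ("/fibonacci/_action/get_result",
                 "/fibonacci/_action/send_goal",
                 "/add_two_ints"):
        print(name, timeout_for(name))
```

Only get_result gets the long timeout here, so an unresponsive Service or a stuck send_goal would still be detected within the short default.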