rcl icon indicating copy to clipboard operation
rcl copied to clipboard

Reduce result_timeout to 10 seconds.

Open clalancette opened this issue 3 years ago • 4 comments

Signed-off-by: Chris Lalancette [email protected]

I spent a bit of time investigating https://github.com/ros2/ros2/issues/1074 . One problem I found is that if an action client calls an action server repeatedly, the time it takes to complete that action gets slower and slower, eventually ending up taking a good portion of a second.

After doing some debugging, I found out that at least part of the problem is that the number of goal handles (as stored by the rcl_action_server_t structure) continually goes up. Since rclcpp_action goes and copies all of the goal handles during publish_status, this causes the amount of time necessary to call publish_status to go up and up.

But looking at it further, it seems like the main problem is that goal_handles are never being cleared out when they are done being used. Instead, it looks like they are meant to "expire", but this only happens by default every 15 minutes. Quite a lot of goal handles can be accumulated in that time.

This PR is a not-serious attempt to remedy this by reducing the default expiration time to 10 seconds. With this in place, things still end up going slower for 10 seconds when first starting up, but then the expiration logic kicks in and keeps the number of goal handles to a reasonable number. I don't really intend this to go in as-is, but I would like to have a discussion about whether we should remove goals automatically once goal_handle->succeed is called, whether publish_status should really publish status on all goals, and whether we should reduce the default time expiration period for action goal handles.

clalancette avatar Oct 19 '22 15:10 clalancette

After some discussion, this may be a reasonable thing to do. So I'm going to go ahead and open this as a PR for real feedback.

clalancette avatar Nov 03 '22 17:11 clalancette

@ros-pull-request-builder retest this please

clalancette avatar Nov 03 '22 17:11 clalancette

@ros-pull-request-builder retest this please

clalancette avatar Nov 17 '22 14:11 clalancette

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

clalancette avatar Nov 17 '22 14:11 clalancette

https://ci.ros2.org/job/ci_windows/18331/testReport/junit/(root)/projectroot/test_publisher_subscriber_cpp__rmw_fastrtps_cpp__BoundedPlainSequences/ windows failure is unrelated.

fujitatomoya avatar Nov 30 '22 01:11 fujitatomoya

@clalancette do we want to backport this to humble?

fujitatomoya avatar Nov 30 '22 01:11 fujitatomoya

@clalancette do we want to backport this to humble?

I'm worried that this is a big semantic change, and it is really hard to debug. I'd rather let it soak in Rolling for a while before we even consider that.

clalancette avatar Nov 30 '22 13:11 clalancette

By the way, this caused alot of our tests to become flaky that we're not just realizing: https://github.com/ros-planning/navigation2/issues/3765

I believe this should be set to max and/or disabled by default since its very common for actions to be very long running. I think the method for expiring of goals so they don't pile up needs to be done another way -- its throwing out currently running goals now and returning statuses of UNKNOWN back to the application when we have things running > 10 s.

SteveMacenski avatar Aug 24 '23 18:08 SteveMacenski