ros2_controllers icon indicating copy to clipboard operation
ros2_controllers copied to clipboard

High CPU usage by gzserver with JTC after first action server request.

Open arshadlab opened this issue 3 years ago • 0 comments

Describe the bug I came across high CPU usage for gzserver with JTC action server (e.g moveit2). Action server creates a wall timer to execute and keep track on ongoing goal. The issue is that there always be a running action server timer in the background after receiving first action request which causes gzserver CPU usage very high. At times there will two timers running at same time.
I understand that all expired pointers get removed/destroyed in create_wall_timer() call. However the assignment is done to smart pointer after the timer creation call returns and thus remove_if call will miss chance to destroy timer entry since smart pointer not yet expired leading to two timers in callgroup list. The older one (though doing nothing) and newer one for current action task. The other issue is that even the goal is succeeded, the timer still running (but doing nothing) and will get destroyed on next action request. This empty running timer do cause high cpu usage for gzserver. I did try some changes that gave desired results. Change 1: Call a reset on smart pointer before creating wall timer so that old timer entry can be destroyed.

 void JointTrajectoryController::feedback_setup_callback(
 std::shared_ptr<rclcpp_action::ServerGoalHandle<FollowJTrajAction>> goal_handle)
 {
    ...
    goal_handle_timer_.reset();
    // Setup goal status checking timer
    goal_handle_timer_ = node_->create_wall_timer(..)
    ...
 }

Change 2: If goal is no more active then cancel the timer and do a reset to avoid subsequent cancel calls.

  controller_interface::return_type JointTrajectoryController::update()
  {
      ...
      if (active_goal)
           {
	      bool cancel_timer = false;
	      ....
	      setAborted() or setSucceeded() set cancel_timer=true
	      
	      // if goal is succeeded or aborted then cancel the timer too and set pointer to null.
	      if(cancel_timer && goal_handle_timer_) {
	            goal_handle_timer_->cancel();
	            goal_handle_timer_.reset();
	      }
           }
         ...
  }

These changes may require to run under a lock guard to ensure thread safety. I tried them and so far the results are good and gzserver cpu goes back to starting one after the joint trajectory action task finishes.

To Reproduce Steps to reproduce the behavior: Trigger JTC action server request and after that gazebo server cpu usage spikes and never come down.

Expected behavior gzserver cpu usage should go down after action server is completed. Scenario: Two UR5 arms + Turtlebot AMR System: Intel TGL

CPU usage before call = ~160% CPU usage during call = ~280% CPU usage after action server done = ~160%

Actual behavior CPU usage before call = ~160% CPU usage during call = ~280% CPU usage after action server done = ~280%

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Ubuntu 20.04
  • Version [e.g. Foxy]
  • Tested with Foxy
  • Anything that may be unusual about your environment

Additional context Add any other context about the problem here, especially include any modifications to any ros2_controllers that relate to this issue.

arshadlab avatar Aug 17 '22 08:08 arshadlab