actionlib Action client not receiving result when server publishes too quickly

I've made a post of this in the ROS community and have not received any answers so I think this may be a bug in ROS. Here is my original post. For convenience, I have copied it here.

Important Info

Client hangs randomly in a call to waitForResult() (no guarantee that it will happen)
Hanging is influenced by placement of calls to sleep()
Recording the topics indicated that the hanging happens when waitForResult() is called before the result is published.
I am using Ubuntu 14.04 and ROS Indigo.
Inside the call to waitForResult() the client's state is active. I.e. (cur_simple_state_ == SimpleGoalState::Active)

I am debugging a problem where an action client will hang while waiting for a result from the action server. After a lot of experimenting, I've narrowed it down to some sort of synchronization issue. One, which is influenced by placing sleep statements in different locations.

In the node which sends the goal, I have the following code:

      action_client_.sendGoal(goal);
      // ros::Duration(0.5).sleep();  // Enabling this line increases the probability to hang. Though, it is random.
      ROS_INFO("Waiting for result");
      action_client_.waitForResult();
      ROS_INFO("Result received");

In the action server node, I have the following code:

      // ros::Duration(0.001).sleep();  // Enabling this line significantly decreases the likelyhood of hanging
      ROS_INFO("Sending result");
      action_server_.setSucceeded(result_);
      ROS_INFO("Result sent");

Possible Sources of Error

OS scheduling - The OS could be putting the client thread to sleep after (or inside) the call to sendGoal() and then the action server is able to finish before the client thread can call waitForResult()
Bad time stamp - The time stamp for the result message could be incorrectly set. It appears as though the time stamp is evaluated while determining if a result was received.
Resource deadlock - The backtrace from GDB shows that the client is waiting on a mutex. Perhaps a race condition has created a deadlock.

GDB Backtrace in Client when hanging

    #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
    #1  0x00007ffff71fc0e9 in bool boost::condition_variable_any::do_wait_until<boost::unique_lock<boost::mutex> >(boost::unique_lock<boost::mutex>&, timespec const&) ()
       from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
    #2  0x00007ffff71f4403 in bool boost::condition_variable_any::timed_wait<boost::unique_lock<boost::mutex> >(boost::unique_lock<boost::mutex>&, boost::posix_time::ptime const&) ()
       from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
    #3  0x00007ffff71e9090 in bool boost::condition_variable_any::timed_wait<boost::unique_lock<boost::mutex>, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000l> const&) ()
       from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
    #4  0x00007ffff7259ddb in actionlib::SimpleActionClient<avidbots_connection_planner::ConnectionPlannerAction_<std::allocator<void> > >::waitForResult(ros::Duration const&) ()
       from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
    ---Type <return> to continue, or q <return> to quit---

Is this a bug in ROS, or am I doing something wrong?

Apr 06 '17 16:04 cpagravel

Does Issue #66 sound like it is the same issue?

Jul 09 '17 20:07 pbeeson

Yes. That issue looks like it is the same issue.

Jul 10 '17 12:07 cpagravel

actionlib actionlib copied to clipboard

Action client not receiving result when server publishes too quickly

Important Info

Possible Sources of Error

GDB Backtrace in Client when hanging

actionlib
actionlib copied to clipboard