actionlib
actionlib copied to clipboard
Action client not receiving result when server publishes too quickly
I've made a post of this in the ROS community and have not received any answers so I think this may be a bug in ROS. Here is my original post. For convenience, I have copied it here.
Important Info
-
Client hangs randomly in a call to
waitForResult()
(no guarantee that it will happen) -
Hanging is influenced by placement of calls to
sleep()
-
Recording the topics indicated that the hanging happens when
waitForResult()
is called before the result is published. -
I am using Ubuntu 14.04 and ROS Indigo.
-
Inside the call to
waitForResult()
the client's state is active. I.e.(cur_simple_state_ == SimpleGoalState::Active)
I am debugging a problem where an action client will hang while waiting for a result from the action server. After a lot of experimenting, I've narrowed it down to some sort of synchronization issue. One, which is influenced by placing sleep statements in different locations.
In the node which sends the goal, I have the following code:
action_client_.sendGoal(goal);
// ros::Duration(0.5).sleep(); // Enabling this line increases the probability to hang. Though, it is random.
ROS_INFO("Waiting for result");
action_client_.waitForResult();
ROS_INFO("Result received");
In the action server node, I have the following code:
// ros::Duration(0.001).sleep(); // Enabling this line significantly decreases the likelyhood of hanging
ROS_INFO("Sending result");
action_server_.setSucceeded(result_);
ROS_INFO("Result sent");
Possible Sources of Error
-
OS scheduling - The OS could be putting the client thread to sleep after (or inside) the call to
sendGoal()
and then the action server is able to finish before the client thread can callwaitForResult()
-
Bad time stamp - The time stamp for the result message could be incorrectly set. It appears as though the time stamp is evaluated while determining if a result was received.
-
Resource deadlock - The backtrace from GDB shows that the client is waiting on a mutex. Perhaps a race condition has created a deadlock.
GDB Backtrace in Client when hanging
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007ffff71fc0e9 in bool boost::condition_variable_any::do_wait_until<boost::unique_lock<boost::mutex> >(boost::unique_lock<boost::mutex>&, timespec const&) ()
from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
#2 0x00007ffff71f4403 in bool boost::condition_variable_any::timed_wait<boost::unique_lock<boost::mutex> >(boost::unique_lock<boost::mutex>&, boost::posix_time::ptime const&) ()
from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
#3 0x00007ffff71e9090 in bool boost::condition_variable_any::timed_wait<boost::unique_lock<boost::mutex>, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000l> const&) ()
from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
#4 0x00007ffff7259ddb in actionlib::SimpleActionClient<avidbots_connection_planner::ConnectionPlannerAction_<std::allocator<void> > >::waitForResult(ros::Duration const&) ()
from /home/avidbots/Dev/avidbots/ros_indigo/devel/lib/libavidbots_intra_sector_planner.so
---Type <return> to continue, or q <return> to quit---
Is this a bug in ROS, or am I doing something wrong?
Does Issue #66 sound like it is the same issue?
Yes. That issue looks like it is the same issue.