gazebo-yarp-plugins
gazebo-yarp-plugins copied to clipboard
Robot not removed properly when using clock plugin
While performing some checks on the synchronization of Gazebo and Matlab through the ySynchronizer, which arbitrates this process by sending rpc commands to either stop or continue the simulation, I noticed that when removing the robot Gazebo gets stuck while attempting to close GazeboYarpIMU
with the following message:
*** GazeboYarpIMU closing ***
Closing Server Inertial...
This happens not only after I have used the ySynchronizer
block but also by just placing the robot and removing it without further interaction. Totally reproducible.
This error is not related to the IMU, but to any analog sensor. If you disable the IMU in the URDF, you will see that the simulation blocks trying to close jointvelocity sensors or forcetorque sensors (if any). The IMU is only the first one to be closed. I have been investigating this issue for 5 days now, and I managed to reduce the deadlock around the AnalogWrapper::Close() function. I have no idea what's down there, among RateThreads, ThreadImpl, PortCore, RpcPorts and so on, but one of the YARP components is deadlocked for sure. I can also add that even if the simulation is blocked, you can still yarp rpc /yourRobot/anyAnalogSensorRpcInterface and calibrate them, meaning that the communication thread is still running happily. @traversaro can we increase the scope and the gravity of this issue?
Nuno and Prashanth should be investigating this issue as well.
@prashanthr05 @nunoguedelha
Ok, I just found out the single line that produces deadlock: https://github.com/robotology/yarp/blob/master/src/libYARP_dev/src/modules/AnalogWrapper/AnalogWrapper.cpp#L469
If you comment it out, recompile yarp, recompile GazeboYarpPlugin, everything will work (except that you do not get timestamps in the envelope of the analog sensor data). If you need this fast, you can apply this small change and stop reading.
A not so good workaround (but give me more time and it will become good) is:
- turn m_jointsensorwrapper into a polydriver*
- substitute this line (m_jointsensorwrapper.close()): https://github.com/robotology/gazebo-yarp-plugins/blob/master/src/gazebo_plugins/JointSensors.cc#L33
with something like this:
boost::thread* closeThread= new boost::thread(boost::bind(&close_thread,m_jointsensorsWrapper));
where close_thread is a function that takes a polydriver* and calls close.
I am not posting code because I am still writing a merge that does not leave boost::thread* objects around without deleting them.
The changes above should be applied also to IMU and ForceTorque sensors.
@MirkoFerrati which version (checkout) of yarp are you using?
Some of the things that I think could be related:
- Unfortunatly the clock synchronization logic in yarp is undocumented. However, the best way of learning its logic (besides looking at the original issue: https://github.com/robotology/gazebo-yarp-plugins/issues/57 is reading the source code, for example the one of the
NetworkClock
class: https://github.com/robotology/yarp/blob/master/src/libYARP_OS/src/NetworkClock.cpp . Note that code inspection is greatly simpliefied if you use a proper tool (IDE or editor plugin) for following the code execution flow. - @paulfitz did some changes in the network clock synchronization code in yarp to solve a similar bug : https://github.com/robotology/yarp/issues/331 ) perhaps this bug could be related to those changes?
- @barbalberto recently merged some modifications of the analogWrapper, perhaps it can take a quick look if there is anything fishy going on around
analogWrapper::close()
- The original design idea of @paulfitz for clock synchronization was using a topic (persistent), but nobody had the time to properly test the use of a
/clock
topic over a/clock
port, see https://github.com/robotology/gazebo-yarp-plugins/issues/157 (Probably not related, but it has always worried me because I never totally get all the internals of clock synchronization)
cc @nunoguedelha @prashanthr05
My personal impression is that that the gazebo physics thread is calling analogWrapper::close()
during the plugin teardown. During the teardwon, something called by analogWrapper::close()
is calling a yarp::os::Time::delay()
, that however never exits because it waits for at least a clock tick (that triggers a NetworkClock::read()
). Unfortunately the clock is never updated, because it should be updated by the gazebo physics thread, that is the one calling the delay()
!
P.s. while I was writing this reply, I quickly checked all the delay()
calls in yarp codebase (easy if you use an IDE!) and I guess that there are some fishy calls in
the DgramTwoWaysStream
class interrupt
and close
methods:
https://github.com/robotology/yarp/blob/master/src/libYARP_OS/src/DgramTwoWayStream.cpp (however they are relativly old, so if this behavior is a regression they are probably not related).
@traversaro Inside analogWrapper::close()
there is a call to Stamp::Update()
, which in turn enters yarp::os::Time
and deadlocks, I don't know it the reason is delay()
or some different function ( I guess time::now()
directly is the responsible).
I am using an old Yarp version, along with an old GazeboYarpPlugin. I am not able of moving into master. I would appreciate if you can test this:
- comment out line https://github.com/robotology/yarp/blob/master/src/libYARP_dev/src/modules/AnalogWrapper/AnalogWrapper.cpp#L469 and try if everything works on master
Btw, I checked the implementation of ControlBoardWrapper in order to understand why the controlboards close correctly. The timestamp is also updated in the thread, but in a different way: the time is not returned from time::now
but from the encoder sensor structure (search for this timeStamp+=device.subdevices[k].encodersTimes[l];
)
, which is filled inside ControlBoardDriverEncoders in the getEncodersTimed
with the value updated from the ControlBoardDriver m_lastTimestamp.update(_info.simTime.Double());
Basically, ControlBoardWrapper uses simulated timestamp, AnalogWrapper does not.
Sorry, unable to test this at the moment. Which Stamp::update()
are you referring to? Then one in analogWrapper::run()
?
Yes, that one. I am testing my changes in a larger URDF model, there may be problems even with the ControlBoards, but they seem to raise only if another AnalogWrapper is closing. I will let you know.
Ok, thanks. Anyway if someone has some time to allocate for serious debugging, I guess that get a clear idea of the issue we should:
- Make sure that
yarp
andgazebo-yarp-plugins
are compiled with debug simbols (compile inDebug
orRelWithDebInfo
) and preferably debug symbols for gazebo are installed (via thegazebo*-dbg
packages). - Try to replicate this bug with only one AnalogWrapper (to reduce the number of threads)
- Get a
gdb
backtrace for all the threads, so we can understand the deadlock .
@MirkoFerrati I was thinking a bit about this, but are you sure that analogWrapper::run
is called by the thread that is calling analogWrapper::close()
? At which line of which file the run
method is called?
Another fishy difference between ControlBoardWrapper
and AnalogWrapper
is in the close, in the AnalogWrapper the stop
method is basically called two times:
if (RateThread::isRunning())
{
RateThread::stop();
}
RateThread::stop();
No no no, close()
calls join()
on the thread run()
. But if run()
is blocked waiting for a clock do unblock some yarp::os::Time primitives, the close()
is blocked too.
Anyway, my changes to Handler.cc are working with AnalogWrapper, but not with IMU. It looks like ServerInertial.h has a very different structure from AnalogWrapper :(
Edit: the double call to stop is a typo, but nothing important, sinche the first call already deadlocks. In my version, the call is done only once and does not deadlock.
Update: I have the ServerInertial closing fine, but I had to put an hack in its close function in order to tell our IMUDriver to stop reading from Gazebo.
How can I call the close()
method of a DeviceDriver from the ServerInertial that is using it as a IGenericSensor?
Also: can I make some changes to the ServerInertial.h inside Yarp? What's the policy?
Branch issue181 works as expected. I don't think we can merge the hack in master, but as soon as someone knows how to do the fix in the yarp way, it should be done! @traversaro will you take care of this? I for sure will propagate this fix in walk-man computers.
- Ah, I get what you meant. If the problem is relative to
ThreadImpl::join
, it could be related to https://github.com/robotology/yarp/issues/331 . - Yes, the ServerInertial device is quite different from the other wrapper + devices, It is quite old and quite counter intuitive. @barbalberto is refactoring that part of yarp, so I guess he can provide more information, also related to the
IGenericSensor
/IAnalogSensor
duality. Anyway the policy of changes in YARP is that you can propose a pull request, where the change will be discussed and eventually merge. If you want to discuss a change before opening the pull request, feel free to open an issue in https://github.com/robotology/yarp . - Yes, I would avoid merging all this hacks in master. I will be in Paris so I will not directly take care of this, but I guess other people in the lab will make sure to find a proper solution. If someone really need this bug fixed as soon as possible, he can always use the
issue181
.
TheadImpl::joint
gives problem also here: https://github.com/robotology/yarp/issues/469
I could reproduce this issue even without using the port /clock, without running Matlab/Simulink nor the synchronisation through the ySynchronizer
. I just run the YARP server and Gazebo with the iCub model and the verbose option ( gazebo --verbose /Users/nunoguedelha/dev/icub-gazebo/world/icub.world ). When removing the robot, Gazebo gets stuck while attempting to close GazeboYarpIMU
.
If this can be helpful, from the documentation:
void yarp::os::RateThread::stop ( )
Call this to stop the thread, this call blocks until the thread is terminated (and releaseThread() called).
Actually calls join. This will deadlock if called from run(), use askToStop() instead.
Not sure this is the problem you are talking about.
This code:
if (RateThread::isRunning()) { RateThread::stop(); }
RateThread::stop();
seems strange though.
The first stop() is the right one, the second one should be commented out. The deadlock raises anyway. What's the expected behavior of a RateThread when its destructor gets called while running? Should it stop? Should it wait for join()? std::thread calls std::terminate and aborts the application.
If I am not mistaken the destructor calls stop+join.If stops is called outside the thread I would say the problem is that the thread does not exit. Can you add a fprint(stderr) in the releaseThread() function?
I added couts in every destructor/stop/stopMain/join/joinImpl that was involved in the process.
The stop() function basically sets a variable closing
to true, and the thread loop is like while(!closing)
.
The problem is that the thread does never reach the end of the current loop because it is blocked by a call to yarp::Time, which is blocked by gazebo not sending time because the simulation is blocked waiting for the destructor to destroy the thread.
Thus, the thread will never check for the value of closing
, making the stop function useless.
Also, the difference between askToStop and Stop is that after setting closing
, Stop will call Join, which is called anyway by the destructor.
Relating the issue, when I try to simulate the robot in gazebo while running the ySynchronizer, I do not have any issue while removing the robot. The IMU and ForceTorque sensors are closed properly. The versions of codyco-superbuild (latest commit on Apr07) , yarp (latest commit on Apr08) and gazebo_yarp_plugins (latest commit on Mar23) are upto-date.
The output on the terminal looks like, in the image.
Although I have not looked deeply into the code of AnalogWrapper.cpp or the JointSensor.cc, I noticed that the changes suggested by @MirkoFerrati are not done i.e commenting the StateStamp or modifying the m_jointsensorwrapper.close().
@prashanthr05 I guess this is expected if you are not launching gazebo with the clock enabled (i.e. gazebo -slibgazebo_yarp_clock.so
) or you are launching gazebo without the YARP_CLOCK
enviromental variable set.
@nunoguedelha that seems a quite strange behaviour. Which version of yarp and gazebo are you using?
I've retested after updating all my repos to last master:
I don't have the issue if the variable YARP_CLOCK
is unset, whatever option is specified when launching gazebo
(so, with or without -s libgazebo_yarp_clock.so
).
I have the issue if I set YARP_CLOCK=/clock
, whatever option is specified when launching gazebo
(so, with or without -s libgazebo_yarp_clock.so
).
So the issue occurs as soon as you set YARP_CLOCK=/clock
, even if the clock plugin is not enabled (and /clock + /clock/rpc ports created).
@nunoguedelha This is expected, since the deadlock arises when trying to get an external clock which is not published. Notice that when a model is deleted from gazebo, the clock is no longer published, which is the same behavior as not running the libgazebo_yarp_clock plugin.
Just to let you know, we tried as you mentioned in https://github.com/robotology/gazebo-yarp-plugins/issues/181#issuecomment-89553000 commenting out the line containing lastStateStamp.update()
but the problem remains the same. We are using the master
branches at the current versions of both YARP and gazebo-yarp-plugins.
@jeljaik That test is outdated now. It could work on AnalogWrappers, but for sure not on ServerAnalog (which handles the IMU). If you are into testing, you can try commenting out the IMU plugin from your URDF and see what happens with the lastStateStamp.update() commented out. Anyway, since I am using an old version of YARP, there could be some more lines related to time which I missed in my version.
Another test is my branch issue181 which should solve all the problems even for latest YARP (at the cost of introducing some hacks in the code). https://github.com/robotology/gazebo-yarp-plugins/tree/issue181
Give it a try!
Thank you I'm into properly debugging this along with Nuno. Thanks for the hints. We'll keep you posted.
On Mon, Apr 13, 2015, 18:12 MirkoFerrati [email protected] wrote:
@jeljaik https://github.com/jeljaik That test is outdated now. It could work on AnalogWrappers, but for sure not on ServerAnalog (which handles the IMU). If you are into testing, you can try commenting out the IMU plugin from your URDF and see what happens with the lastStateStamp.update() commented out. Anyway, since I am using an old version of YARP, there could be some more lines related to time which I missed in my version.
Another test is my branch issue181 which should solve all the problems even for latest YARP (at the cost of introducing some hacks in the code). https://github.com/robotology/gazebo-yarp-plugins/tree/issue181
Give it a try!
— Reply to this email directly or view it on GitHub https://github.com/robotology/gazebo-yarp-plugins/issues/181#issuecomment-92413248 .