Nodelets are not torn down when the nodelet manager receives SIGINT

Open mikaelarguedas opened this issue 7 years ago • 6 comments

Reposting this as a standalone issue here. (Thanks @dseifert for reporting this.)

To test this, prepare the following:

  1. Add a destructor to nodelet_core/test_nodelet/src/plus.cpp that logs at ROS_INFO level (or higher)
  2. Create this launch file:
<launch>
 <node pkg="nodelet" type="nodelet" name="nodelet_manager"  args="manager" output="screen"/>

 <node pkg="nodelet" type="nodelet" name="n1" output="screen" args="load test_nodelet/Plus nodelet_manager" />
</launch>

Now, perform these tests:

  • Test 1: run the launch file, use ps xa | grep nodelet to find the process ID of the nodelet load command, and kill -SIGINT it. You will see that the destructor is called.
  • Test 2: run the launch file, use ps xa | grep nodelet to find the process ID of the nodelet manager command, and kill -SIGINT it. You will see that the destructor is NOT called.
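
To make the difference between the two tests concrete, here is a minimal standard-library sketch; it is not nodelet_core code. FakeNodelet, on_sigint and run_manager are hypothetical names, and std::raise stands in for kill -SIGINT. It only illustrates one general way a destructor can be skipped: the owning process handles the signal but exits through a path that never destroys the object. Whether this is what happens inside the nodelet manager is not established in this thread.

```cpp
#include <cassert>
#include <csignal>
#include <cstdio>
#include <memory>

// Counts destructor calls so the effect can be observed programmatically.
static int g_destructor_calls = 0;

// Hypothetical stand-in for a nodelet such as test_nodelet/Plus.
struct FakeNodelet {
  ~FakeNodelet() {
    ++g_destructor_calls;
    std::puts("FakeNodelet destructor called");
  }
};

// Written from the signal handler, so it must be a volatile sig_atomic_t.
static volatile std::sig_atomic_t g_shutdown_requested = 0;

extern "C" void on_sigint(int) { g_shutdown_requested = 1; }

// Simulates a manager that owns one nodelet and receives SIGINT.
// Returns the number of destructor calls observed during this run.
int run_manager(bool unload_on_exit) {
  const int calls_before = g_destructor_calls;
  g_shutdown_requested = 0;
  std::signal(SIGINT, on_sigint);
  {
    std::unique_ptr<FakeNodelet> nodelet(new FakeNodelet());
    std::raise(SIGINT);  // stand-in for `kill -SIGINT <pid>`
    while (!g_shutdown_requested) {
      // A real manager would spin here until shutdown is requested.
    }
    if (!unload_on_exit) {
      // Models an exit path that never unloads the nodelet: ownership is
      // dropped without destroying the object (an intentional leak).
      (void)nodelet.release();
    }
  }  // the nodelet, if still owned, is destroyed here
  std::signal(SIGINT, SIG_DFL);
  return g_destructor_calls - calls_before;
}
```

Calling run_manager(true) corresponds to the behaviour seen in Test 1 (the destructor runs), while run_manager(false) corresponds to what Test 2 shows (no destructor output).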

mikaelarguedas avatar Mar 09 '17 13:03 mikaelarguedas

Confirmed. We have a server thread that must be properly shut down in the destructor (which seems to be the recommended practice). By placing a log statement there, we discovered that the destructor does not run when the application is killed with CTRL-C.
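
For readers unfamiliar with the pattern, this is roughly what "shutting a server thread down in the destructor" looks like. It is a sketch using only the C++ standard library; ServerThreadOwner and worker_stops_after_destruction are hypothetical names, not andviane's code or anything in nodelet_core. It also shows why a skipped destructor matters: the stop flag is never set and the thread is never joined.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical owner of a background "server" thread.
class ServerThreadOwner {
 public:
  explicit ServerThreadOwner(std::atomic<int>& iterations)
      : running_(true), worker_([this, &iterations] {
          // Worker loop: runs until the destructor clears running_.
          while (running_.load()) {
            ++iterations;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
          }
        }) {}

  // The destructor is the only place the thread is stopped and joined.
  ~ServerThreadOwner() {
    running_.store(false);
    if (worker_.joinable()) worker_.join();
  }

  ServerThreadOwner(const ServerThreadOwner&) = delete;
  ServerThreadOwner& operator=(const ServerThreadOwner&) = delete;

 private:
  std::atomic<bool> running_;  // declared before worker_ so it is initialized first
  std::thread worker_;
};

// Returns true if the worker makes no further progress after destruction.
bool worker_stops_after_destruction() {
  std::atomic<int> iterations(0);
  {
    ServerThreadOwner owner(iterations);
    // Wait until the worker has run at least once.
    while (iterations.load() == 0) std::this_thread::yield();
  }  // destructor runs here: flag cleared, thread joined
  const int after = iterations.load();
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  return iterations.load() == after;
}
```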

andviane avatar Feb 04 '19 16:02 andviane

I'm experiencing what I believe is a related issue in ros-melodic. With various camera drivers (realsense-ros, libuvc_ros, avt_camera, etc.), when using nodelet managers with many nodelets in them, I am unable to relaunch pipelines on an already started roscore. So this does not work:

  1. roscore
  2. In second terminal session, roslaunch my pipeline which has 10-20 nodelets in the nodelet manager, including a camera nodelet.
  3. ctrl-c
  4. roslaunch the pipeline a second time. The second roslaunch always results in a hang or a crash (depending on the camera/pipeline). This seems to be because things aren't unloading properly. Killing and restarting the roscore allows me to run the pipeline again (once).

On the other hand, if I don't have a separate roscore running, I can kill/relaunch the pipeline without issue.

Is anyone aware of a workaround other than not having a separate roscore? Having to restart roscore can be problematic in larger distributed systems.

jpapon avatar Sep 21 '19 22:09 jpapon

I see a similar issue to what jpapon describes.

I have a launch file that starts 350 nodes or nodelets. If I do not restart the roscore when re-running the launch file, I get problems with nodes (or nodelets) being killed because of name duplication. But the different names should be in different namespaces, so that should not be an issue.

If I let the launch file start the roscore then it works without any problems.

rosnode list

does not show any nodes left after I have killed the launch file, and ps does not show any hanging processes. rosnode cleanup did not help.

tompe17 avatar Dec 02 '19 16:12 tompe17

I have the same issue with realsense2_camera (realsense-ros). After running roslaunch realsense2_camera rs_camera.launch, the command rosnode kill /camera/realsense2_camera triggers the nodelet's destructor, while rosnode kill /camera/realsense2_camera_manager and "Ctrl-C" do not.

doronhi avatar Jan 01 '20 08:01 doronhi

Hi all, in my view this could be happening because, when trying to shut down the nodelet:

  1. We might be breaking the bond with ROS before the components are unloaded
  2. There is no ros::Time available to leave waits like the one for service calls to shutdown: https://github.com/ros/nodelet_core/blob/indigo-devel/nodelet/src/nodelet.cpp#L192

I think the second point would be easy to verify, by adding a ros::Time::shutdown(); to ensure that ros time users stop waiting.
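
The second point can be illustrated with a small analogy built only on the C++ standard library. FakeClock and its methods are hypothetical stand-ins and do not reproduce roscpp's ros::Time implementation. The sketch shows a waiter that blocks on a time source and is only released once that time source is explicitly shut down, which is the effect the suggested ros::Time::shutdown() call is meant to have.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical time source; NOT roscpp's ros::Time.
class FakeClock {
 public:
  // Blocks until the clock becomes valid or is shut down.
  // Returns true if the clock became valid, false if released by shutdown().
  bool waitForValid() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return valid_ || stopped_; });
    return valid_;
  }

  // Releases every thread currently blocked in waitForValid().
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopped_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool valid_ = false;    // never set in this sketch: time never becomes available
  bool stopped_ = false;
};

// Returns true if a blocked waiter is released by shutdown() rather than by valid time.
bool waiter_is_released_by_shutdown() {
  FakeClock clock;
  bool became_valid = true;
  std::thread waiter([&] { became_valid = clock.waitForValid(); });
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  clock.shutdown();  // without this call, join() below would block forever
  waiter.join();
  return !became_valid;
}
```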

Further explanation

Using this as reference: http://docs.ros.org/diamondback/api/roscpp/html/init_8cpp_source.html

Request shutdown sets a global variable inside the node class that should smoothly shut down a node in the next iteration of spin. (Line 140) That request is serviced later by the Poll Manager object of the node, in an asynchronous manner.

Shutdown actually shuts down the queues, time loggers and connections of a node. (Line 519)

However, shutdown might not advance to shut down time, which is required to stop nodes that use ros::Rate and use custom signal handlers and threads. In this case you have to manually shut down the time tracking instance.

In both cases there are recursive mutexes to prevent multiple shutdown calls from crashing the node.
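
The request/shutdown split described above can be sketched as follows. This is a simplified model using only the C++ standard library, not the roscpp source; FakeNode, spinOnce and spin_until_shutdown are hypothetical names. A request only sets a flag, the flag is serviced on the next spin iteration, and a recursive mutex plus a "done" flag makes repeated or nested shutdown calls harmless.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical simplified node; NOT the roscpp implementation.
class FakeNode {
 public:
  // Only records the request; the actual teardown happens later in spinOnce().
  void requestShutdown() { shutdown_requested_.store(true); }

  // One spin iteration: services a pending shutdown request.
  // Returns false once the node has been shut down.
  bool spinOnce() {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    if (shutdown_requested_.load()) {
      shutdown();  // re-locks mutex_ on the same thread, hence recursive_mutex
    }
    return !shut_down_;
  }

  // Idempotent teardown: only the first call does any work.
  void shutdown() {
    std::lock_guard<std::recursive_mutex> lock(mutex_);
    if (shut_down_) return;
    shut_down_ = true;
    ++teardown_count_;  // stands in for closing queues and connections
  }

  int teardownCount() const { return teardown_count_; }

 private:
  std::atomic<bool> shutdown_requested_{false};
  std::recursive_mutex mutex_;
  bool shut_down_ = false;
  int teardown_count_ = 0;
};

// Spins the node and requests shutdown after the given iteration.
// Returns the number of iterations that completed before the node stopped.
int spin_until_shutdown(FakeNode& node, int request_at_iteration) {
  int iterations = 0;
  while (node.spinOnce()) {
    ++iterations;
    if (iterations == request_at_iteration) node.requestShutdown();
  }
  return iterations;
}
```

Note that a request made during iteration N takes effect at the start of iteration N+1, which mirrors the "next iteration of spin" behaviour described above.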

YoshuaNava avatar Jul 29 '20 11:07 YoshuaNava

A possible workaround would be to kill the nodelet manager. For instance, assume the following nodes are running but need to be restarted:

/stereo/stereo_cam_nodelet
/stereo/stereo_nodelet_manager

By running the command:

rosnode kill /stereo/stereo_nodelet_manager

all associated nodelets are killed and can then be restarted. However, killing /stereo/stereo_cam_nodelet first prevents the nodelets from being restarted.

LucasWaelti avatar Feb 15 '21 10:02 LucasWaelti