nodelet_core
Nodelets do not tear down when the nodelet manager receives SIGINT
Reposting this as a standalone issue here. (Thanks @dseifert for reporting this.)
To test this, prepare the following:
- add a destructor to nodelet_core/test_nodelet/src/plus.cpp that has a ROS_INFO (or higher) output
- Create this launch file:
```xml
<launch>
  <node pkg="nodelet" type="nodelet" name="nodelet_manager" args="manager" output="screen"/>
  <node pkg="nodelet" type="nodelet" name="n1" output="screen" args="load test_nodelet/Plus nodelet_manager"/>
</launch>
```
Now, perform these tests:
- Test 1: run the launch file; use `ps xa | grep nodelet` to figure out the process ID of the `nodelet load` command and `kill -SIGINT` it ... you will see that the destructor is called.
- Test 2: run the launch file; use `ps xa | grep nodelet` to figure out the process ID of the `nodelet manager` command and `kill -SIGINT` it ... you will see that the destructor is NOT called.
Confirmed. We have a server thread that must be properly shut down in the destructor (this seems to be the recommended practice). By placing a log statement there, we discovered that the destructor does not run when the application is killed with Ctrl-C.
I'm experiencing what I believe is a related issue on ROS Melodic. With various camera drivers (realsense-ros, libuvc_ros, avt_camera, etc.), when using nodelet managers with many nodelets in them, I am unable to relaunch pipelines against an already running roscore. So this does not work:
- roscore
- In a second terminal session, roslaunch my pipeline, which has 10-20 nodelets in the nodelet manager, including a camera nodelet.
- ctrl-c
- roslaunch the pipeline a second time. The second roslaunch always results in a hang or a crash (depending on the camera/pipeline). This seems to be because things aren't unloading properly. Killing and restarting the roscore allows me to run the pipeline again (once).
On the other hand, if I don't have a separate roscore running, I can kill/relaunch the pipeline without issue.
Is anyone aware of a workaround other than not having a separate roscore? Having to restart roscore can be problematic in larger distributed systems.
I see a similar issue to what jpapon describes.
I have a launch file that starts 350 nodes and nodelets. If I do not restart the roscore when re-running the launch file, I get problems with nodes (or nodelets) being killed because of name duplication. But the supposedly duplicate names should be in different namespaces, so that should not be an issue.
If I let the launch file start the roscore then it works without any problems.
`rosnode list` does not show any nodes left after I have killed the launch file, and `ps` does not show any hanging processes. `rosnode cleanup` did not help.
I have the same issue with realsense2_camera (realsense-ros).
After running `roslaunch realsense2_camera rs_camera.launch`, the command `rosnode kill /camera/realsense2_camera` triggers the nodelet's destructor, while `rosnode kill /camera/realsense2_camera_manager` and Ctrl-C do not.
Hi all. In my view this could be happening because, when trying to shut down the nodelet:
- We might be breaking the bond with ROS before the components are unloaded
- There is no ros::Time available, so waits like the one around the service call in shutdown never return: https://github.com/ros/nodelet_core/blob/indigo-devel/nodelet/src/nodelet.cpp#L192
I think the second point would be easy to verify by adding a `ros::Time::shutdown();` call to ensure that ros::Time users stop waiting.
Further explanation
Using this as reference: http://docs.ros.org/diamondback/api/roscpp/html/init_8cpp_source.html
requestShutdown() sets a global variable inside the node that should smoothly shut the node down in the next iteration of spin (Line 140). That request is serviced later, asynchronously, by the node's PollManager object.
shutdown() actually shuts down the queues, time loggers, and connections of the node (Line 519).
However, shutdown() might not get as far as shutting down time, which is required to stop nodes that use ros::Rate with custom signal handlers and threads. In that case you have to shut down the time-tracking instance manually.
In both cases there are recursive mutexes to prevent multiple shutdown calls from crashing the node.
A possible workaround is to kill the nodelet manager. For instance, assume the following nodes are running but need to be restarted:
/stereo/stereo_cam_nodelet
/stereo/stereo_nodelet_manager
By running the command:
rosnode kill /stereo/stereo_nodelet_manager
all associated nodelets are killed and can then be restarted. Killing /stereo/stereo_cam_nodelet first, however, prevents the nodelets from being restarted.