camera1394
camera1394 copied to clipboard
segmentation fault on nodelet close
I have the following launch file:
<launch>
<node pkg="nodelet" type="nodelet" name="camera_nodelet_manager" args="manager" output="screen">
<param name="num_worker_threads" value="20"/>
</node>
<node pkg="nodelet" type="nodelet" name="left" args="load camera1394/driver camera_nodelet_manager">
<param name="guid" value="00b09d0100cd6e9e"/>
<param name="bayer_pattern" value="rggb"/>
<param name="bayer_method" value="HQ"/> <!-- esto tiene que ser distinto de "" (por ej "HQ") si quiero rectificar por nodelet -->
<param name="use_ros_time" value="true"/>
<param name="reset_on_open" value="true"/>
</node>
<!--<node pkg="nodelet" type="nodelet" name="left_debayer" args="load image_proc/debayer /camera_nodelet_manager">
<remap from="image_raw" to="camera/image_raw"/>
<remap from="image_color" to="camera/image_color"/>
<remap from="image_mono" to="camera/image_mono"/>
</node> -->
<node pkg="nodelet" type="nodelet" name="left_rectify" args="load image_proc/rectify /camera_nodelet_manager">
<remap from="image_mono" to="camera/image_raw"/> <!-- remapear image_mono a image_raw, en caso de que el driver este haciendo el debayer solo -->
<remap from="camera_info" to="camera/camera_info"/>
<remap from="image_rect" to="camera/image_rect"/>
<param name="queue_size" value="50"/>
</node>
</launch>
And, when doing CTRL+C, sometimes I get:
process[left-2]: started with pid [9082]
process[left_rectify-3]: started with pid [9114]
[ WARN] [1404829063.551903876]: Comparing 00b09d0100cd6e9e to 00b09d0100cd6e9e
[ INFO] [1404829063.568476815]: Found camera with GUID 00b09d0100cd6e9e
[ INFO] [1404829063.568623447]: camera model: Point Grey Research Firefly MV FMVU-03MTC
[ WARN] [1404829063.586293968]: [HQ] Bayer decoding in the driver is DEPRECATED; image_proc decoding preferred.
[ INFO] [1404829063.591556576]: [00b09d0100cd6e9e] opened: 640x480_mono8, 30 fps, 400 Mb/s
[ INFO] [1404829063.627000717]: [00b09d0100cd6e9e] has trigger support
[ERROR] [1404829063.628724634]: Unknown trigger source: source_0
[ERROR] [1404829063.629587108]: [00b09d0100cd6e9e] feature initialization failure
[ INFO] [1404829063.629676503]: [00b09d0100cd6e9e] closing device
[ WARN] [1404829063.811004478]: Comparing 00b09d0100cd6e9e to 00b09d0100cd6e9e
[ INFO] [1404829063.828208301]: Found camera with GUID 00b09d0100cd6e9e
[ INFO] [1404829063.828267154]: camera model: Point Grey Research Firefly MV FMVU-03MTC
[ WARN] [1404829063.844065052]: [HQ] Bayer decoding in the driver is DEPRECATED; image_proc decoding preferred.
[ INFO] [1404829063.848904539]: [00b09d0100cd6e9e] opened: 640x480_mono8, 30 fps, 400 Mb/s
[ INFO] [1404829063.889797976]: using default calibration URL
[ INFO] [1404829063.890063598]: camera calibration URL: file:///home/v01d/.ros/camera_info/00b09d0100cd6e9e.yaml
^C[left_rectify-3] killing on exit
[left-2] killing on exit
[camera_nodelet_manager-1] killing on exit
*** Error in `/opt/ros/hydro/lib/nodelet/nodelet': corrupted double-linked list: 0x00007f2394006010 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x73f8e)[0x7f23c5e67f8e]
/usr/lib/libc.so.6(+0x7988e)[0x7f23c5e6d88e]
/usr/lib/libc.so.6(+0x79b5a)[0x7f23c5e6db5a]
/usr/lib/libc.so.6(+0x7a6a8)[0x7f23c5e6e6a8]
/usr/lib/libdc1394.so.22(dc1394_usb_capture_stop+0xf3)[0x7f239fdaa603]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN10camera139410Camera13945closeEv+0x47)[0x7f23a5580dbd]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver11closeCameraEv+0x1c0)[0x7f23a5537ce8]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver8shutdownEv+0x18)[0x7f23a553a00a]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD1Ev+0x365)[0x7f23a55316b1]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD0Ev+0x18)[0x7f23a55317ea]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN12class_loader11ClassLoader16onPluginDeletionIN7nodelet7NodeletEEEvPT_+0x51)[0x7f23c79025a1]
/opt/ros/hydro/lib/nodelet/nodelet(_ZN5boost6detail12shared_countD1Ev+0x39)[0x406889]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN5boost20ptr_container_detail24reversible_ptr_containerINS0_10map_configIN7nodelet14ManagedNodeletESt3mapISsPvSt4lessISsESaISt4pairIKSsS6_EEELb1EEENS_20heap_clone_allocatorEED2Ev+0x4b)[0x7f23c7902f7b]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN7nodelet6LoaderD1Ev+0x19)[0x7f23c78fbf09]
/opt/ros/hydro/lib/nodelet/nodelet(main+0x17b)[0x404ddb]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f23c5e14000]
/opt/ros/hydro/lib/nodelet/nodelet[0x4060f4]
Are you using external triggering?
Not in this case. I am using strobe output, but I don't that should matter.
On Wed, Jul 9, 2014 at 12:58 PM, jack-oquin [email protected] wrote:
Are you using external triggering?
— Reply to this email directly or view it on GitHub https://github.com/ros-drivers/camera1394/issues/39#issuecomment-48494744 .
Would you mind running the same test with the ros-hydro-camera1394
package?
I am using hydro actually. El 10/07/2014 18:17, "jack-oquin" [email protected] escribió:
Would you mind running the same test with the ros-hydro-camera1394 package?
— Reply to this email directly or view it on GitHub https://github.com/ros-drivers/camera1394/issues/39#issuecomment-48665700 .
The backtrace implies you built it from source in the /home/v01d/coding/ros
workspace. Not true? The messages look like they come from the unreleased master
branch.
I wanted to know if the current released version (camera1394-1.9.5) has the same problem. One easy way to do that is:
sudo apt-get install ros-hydro-camera1394
source /opt/ros/hydro/setup.bash
roslaunch /your/camera/launch_file.launch
Or, you can check out the hydro-devel
branch from the source tree and test with that.
You are right, I didn't realize. I just tried the system installed version (1.9.4) and it also happens:
*** Error in `/opt/ros/hydro/lib/nodelet/nodelet': corrupted double-linked list: 0x00007f4910015570 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x73f8e)[0x7f494c369f8e]
/usr/lib/libc.so.6(+0x7988e)[0x7f494c36f88e]
/usr/lib/libc.so.6(+0x79b32)[0x7f494c36fb32]
/usr/lib/libc.so.6(+0x7a6a8)[0x7f494c3706a8]
/usr/lib/libdc1394.so.22(dc1394_usb_capture_stop+0xf3)[0x7f492e294603]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN10camera139410Camera13945closeEv+0x47)[0x7f4944414dbd]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver11closeCameraEv+0x1c0)[0x7f49443cbce8]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver8shutdownEv+0x18)[0x7f49443ce00a]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD1Ev+0x365)[0x7f49443c56b1]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD0Ev+0x18)[0x7f49443c57ea]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN12class_loader11ClassLoader16onPluginDeletionIN7nodelet7NodeletEEEvPT_+0x51)[0x7f494de045a1]
/opt/ros/hydro/lib/nodelet/nodelet(_ZN5boost6detail12shared_countD1Ev+0x39)[0x406889]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN5boost20ptr_container_detail24reversible_ptr_containerINS0_10map_configIN7nodelet14ManagedNodeletESt3mapISsPvSt4lessISsESaISt4pairIKSsS6_EEELb1EEENS_20heap_clone_allocatorEED2Ev+0x4b)[0x7f494de04f7b]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN7nodelet6LoaderD1Ev+0x19)[0x7f494ddfdf09]
/opt/ros/hydro/lib/nodelet/nodelet(main+0x17b)[0x404ddb]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f494c316000]
/opt/ros/hydro/lib/nodelet/nodelet[0x4060f4]
======= Memory map: ========
00400000-0040f000 r-xp 00000000 08:01 134101 /opt/ros/hydro/lib/nodelet/nodelet
0060e000-0060f000 r--p 0000e000 08:01 134101 /opt/ros/hydro/lib/nodelet/nodelet
0060f000-00610000 rw-p 0000f000 08:01 134101 /opt/ros/hydro/lib/nodelet/nodelet
01c03000-01e97000 rw-p 00000000 00:00 0 [heap]
7f490b5e1000-7f490b5e6000 r-xp 00000000 08:01 3548859 /usr/lib/libXdmcp.so.6.0.0
...
You are still running /home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so
from your catkin overlay. It should say /opt/ros/hydro/lib/libcamera1394_nodelet.so
.
Sorry again. I have now removed everything from my workspace and it seems a bit more robust however, after many starts and stops of the launch file I get the same:
[left-2] killing on exit [camera_nodelet_manager-1] killing on exit *** Error in `/opt/ros/hydro/lib/nodelet/nodelet': corrupted double-linked list: 0x00007ff52c005850 *** shutting down processing monitor...
(no backtrace this time)
On Fri, Jul 11, 2014 at 4:06 PM, jack-oquin [email protected] wrote:
You are still running /home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so from your catkin overlay. It should say /opt/ros/hydro/lib/libcamera1394_nodelet.so.
— Reply to this email directly or view it on GitHub https://github.com/ros-drivers/camera1394/issues/39#issuecomment-48770311 .
This happens intermittently? Not every time?
It's always on exit, right?
Yes (to all questions)
On Sat, Jul 12, 2014 at 8:38 PM, jack-oquin [email protected] wrote:
This happens intermittently? Not every time?
It's always on exit, right?
— Reply to this email directly or view it on GitHub https://github.com/ros-drivers/camera1394/issues/39#issuecomment-48827094 .
Hard to solve this without more information.
Ok, maybe when I compile the driver from the git repo to address the offset issue I can compile with debug info and see if I can get a backtrace.
On Sat, Jul 12, 2014 at 9:23 PM, jack-oquin [email protected] wrote:
Hard to solve this without more information.
— Reply to this email directly or view it on GitHub https://github.com/ros-drivers/camera1394/issues/39#issuecomment-48827858 .
Good idea. Thanks for helping.
Hi, using hydro-devel the segmentation fault also happens. I was not able to get a better backtrace since it is difficult to reproduce using gdb. However, if you look at the previous backtrace you have a list of symbols and it appears to indicate that is triggered within libdc1394. I suspect either a bug in the library, or a misuse of its functions. I would suspect something regarding deinitialization and a race condition which may trigger a camera de deinitialized twice or something like that.
Hi, can you verify that the nodelet is correctly calling the destructors as expected?
I been having some difficulties using nodelets and it seems, after recent updates in several ROS core packages, things have improved. I added "std::cout" messages in destructors (ROS_INFO* does not work at such stage) to ensure my destructors are called and it now seems to be working much more robustly.
However, I have a roslaunch file which loads a nodelet manager where both the camera1394 nodelets (left and right) and my nodelets are loaded. My destructors were not being called whenever I CTRL+C'd. I now switched to using camera1394 nodes instead of nodelets and my destructors are being called correctly. It all seems to point that the nodelet manager dies due to camera1394 nodelets.
I see you have some ROS_INFO calls on destructors but as I said these will never be visible. Would you check if switching to std::cout makes the messages appear? Otherwise, I suspect something is not completely correct in how the nodelets are shut-down.
PS: I still get the double free from time to time
PS2: it is not just with nodelets. I switched to regular nodes since this crashing made the manager crash and so did all my own nodelets and they were not deinitialized correctly. Now this problem does not affect my nodelets but it is still present. I get the double free message sometimes at shutdown and some times it hangs there after this message and eventally ROS kills when it realizes it is not responding.
Knowing that it affects nodes, too, is helpful.
Is it possible to get a backtrace of that failure, as well?
Hi, I'm back at it. I've got a backtrace now. I don't have line numbers for some reason (it is hard to re-compile the system package with debug information enabled) but I think it should be useful:
#0 0x00007ffff594de94 in free () from /usr/lib/libc.so.6
#1 0x00000000004d2d06 in _gnu_cxx::new_allocator
PS: I would guess that the driver is being polled after deinitialization has happened, since this occurs when CTRL+C-ing the node.
Maybe this is the offending line?
https://github.com/ros-drivers/camera1394/blob/master/src/nodes/driver1394.cpp#L395
It appears the close is called without locking the mutex and moreover the driver is closed before the CLOSED flag is set, therefore a call to publish() may be issued while the driver is being closed, right?
Sounds plausible, I'll take look at it.
Thanks for doing all this work!
any news on this? I'm still suffering this crash very often