depthai-ros

Segfault on mobile publisher

Open roni-kreinin opened this issue 3 years ago • 13 comments

ROS2 Galactic running on RPI4B. Same issue with both OAK-D-LITE and OAK-D-PRO.

[mobilenet_node-9] Stack trace (most recent call last) in thread 5512:
[mobilenet_node-9] #5    Object "[", at 0, in nil
[mobilenet_node-9] #4    Object "linux-vdso.so.1", at 0xffff8eed45bf, in 
[mobilenet_node-9] #3    Object "/usr/local/lib/libdepthai-core.so", at 0xffff8e01dbf3, in backward::SignalHandling::sig_handler(int, siginfo_t*, void*)
[mobilenet_node-9] #2    Object "/usr/local/lib/libdepthai-core.so", at 0xffff8e01db33, in backward::SignalHandling::handleSignal(int, siginfo_t*, void*)
[mobilenet_node-9] #1    Object "/usr/local/lib/libdepthai-core.so", at 0xffff8e01b387, in backward::StackTraceImpl<backward::system_tag::linux_tag>::load_here(unsigned long, void*, void*)
[mobilenet_node-9] #0    Object "/usr/local/lib/libdepthai-core.so", at 0xffff8e01e077, in unsigned long backward::details::unwind<backward::StackTraceImpl<backward::system_tag::linux_tag>::callback>(backward::StackTraceImpl<backward::system_tag::linux_tag>::callback, unsigned long)
[mobilenet_node-9] Segmentation fault (Address not mapped to object [(nil)])
[ERROR] [mobilenet_node-9]: process has died ...

and

[mobilenet_node-7] Stack trace (most recent call last) in thread 8077:
[mobilenet_node-7] #17   Object "[0xffffffffffffffff]", at 0xffffffffffffffff, in
[mobilenet_node-7] #16   Object "/lib/aarch64-linux-gnu/libc.so.6", at 0xffff8861f67b, in
[mobilenet_node-7] #15   Object "/lib/aarch64-linux-gnu/libpthread.so.0", at 0xffff884d64fb, in
[mobilenet_node-7] #14   Object "/lib/aarch64-linux-gnu/libstdc++.so.6", at 0xffff887aefab, in
[mobilenet_node-7] #13   Object "/usr/local/lib/libdepthai-core.so", at 0xffff88f7fe5b, in
[mobilenet_node-7] #12   Object "/usr/local/lib/libdepthai-core.so", at 0xffff88f7ff7f, in
[mobilenet_node-7] #11   Object "/usr/local/lib/libdepthai-core.so", at 0xffff88f8004b, in
[mobilenet_node-7] #10   Object "/usr/local/lib/libdepthai-core.so", at 0xffff88f80297, in
[mobilenet_node-7] #9    Object "/usr/local/lib/libdepthai-core.so", at 0xffff88f8041b, in
[mobilenet_node-7] #8    Object "/usr/local/lib/libdepthai-core.so", at 0xffff88f7b4ab, in
[mobilenet_node-7] #7    Object "/usr/local/lib/libdepthai-core.so", at 0xffff891872b3, in dai::XLinkStream::write(std::vector<unsigned char, std::allocator<unsigned char> > const&)
[mobilenet_node-7] #6    Object "/usr/local/lib/libdepthai-core.so", at 0xffff8918720f, in dai::XLinkStream::write(unsigned char const*, unsigned long)
[mobilenet_node-7] #5    Object "/usr/local/lib/libdepthai-core.so", at 0xffff89187d2b, in dai::XLinkWriteError::XLinkWriteError(XLinkError_t, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
[mobilenet_node-7] #4    Object "/usr/local/lib/libdepthai-core.so", at 0xffff891e6303, in
[mobilenet_node-7] #3    Object "/usr/local/lib/libdepthai-core.so", at 0xffff89204ab7, in
[mobilenet_node-7] #2    Object "/usr/local/lib/libdepthai-core.so", at 0xffff8920442b, in
[mobilenet_node-7] #1    Object "/usr/local/lib/libdepthai-core.so", at 0xffff891ce6ab, in
[mobilenet_node-7] #0    Object "/lib/aarch64-linux-gnu/libc.so.6", at 0xffff885d31c4, in
[mobilenet_node-7] Segmentation fault (Address not mapped to object [(nil)])
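
For triage, traces like the two above can be reduced to the frames that actually carry a symbol name. The following is a minimal Python sketch, assuming the log lines keep the `#N Object "path", at ADDR, in symbol` layout shown above. The function name `named_frames` and the regular expression are illustrative and are not part of depthai or backward-cpp.

```python
import re

# Matches frame lines of the form:
#   [node] #6    Object "/path/lib.so", at 0xffff8918720f, in some::symbol(args)
FRAME_RE = re.compile(
    r'#(?P<index>\d+)\s+Object "(?P<object>[^"]*)", at (?P<addr>\S+), in\s*(?P<symbol>.*)$'
)


def named_frames(log_text):
    """Return (index, object, symbol) for every frame that has a symbol name."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a stack frame line
        symbol = match.group("symbol").strip()
        if symbol:
            # Frames without a resolved symbol are skipped.
            frames.append((int(match.group("index")), match.group("object"), symbol))
    return frames
```

Applied to the second trace, this would keep only the `dai::XLinkStream::write` and `dai::XLinkWriteError` frames, which point at a failed XLink write shortly before the crash.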

Seems to occur if I launch rviz2 after the mobile publisher, but sometimes it occurs on its own.

Feb 02 '22 20:02 roni-kreinin

Looks like the same error as #64. Trying to debug. Will get back to you on this.

Feb 02 '22 20:02 saching13

@roni-kreinin can you share which version of depthai-core you are using?

cc: @themarpe in case you have an idea on this.

Feb 02 '22 20:02 saching13

For OAK-D-LITE it is the main branch, and for OAK-D-PRO it is the oak-d-pro_develop branch.

Feb 02 '22 21:02 roni-kreinin

Got it. Did the crash occur with both devices? And do the other examples work fine for you?

Feb 02 '22 21:02 saching13

I think I may have installed the oakd-pro drivers incorrectly so I will re-test that. I have tested the stereo node and that works without segfaulting.

Feb 02 '22 21:02 roni-kreinin

Tested again with oak-d-pro drivers installed correctly and using rqt_image_view I didn't have any issues. I have an RPLIDAR A1 also connected to the RPI4 and it seems that there are issues when I try to run both at the same time. Either the RPLIDAR node will fail or the mobile publisher node will segfault.

Feb 02 '22 22:02 roni-kreinin

If we can't reproduce on our end, a capture with the rr debugger would also work (https://docs.google.com/document/d/1YRmwZP3gjcHY3UUO06LAh421Ea6gY4eKBCqJwHvcaIs/edit). Preferably, though, we should compile with RelWithDebInfo to aid later debugging. (I will check how much that option increases the final size and whether it makes sense as the default.)
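
As a rough illustration of the RelWithDebInfo suggestion, the configure and build steps could be scripted as below. `CMAKE_BUILD_TYPE=RelWithDebInfo` is a standard CMake build type; the helper names and the directory arguments are hypothetical placeholders, and any further options depthai-core may need are not covered here.

```python
import subprocess


def relwithdebinfo_commands(source_dir, build_dir):
    """Return the cmake configure and build commands for a RelWithDebInfo build."""
    configure = [
        "cmake",
        "-S", source_dir,   # source tree (placeholder path)
        "-B", build_dir,    # out-of-source build directory (placeholder path)
        "-DCMAKE_BUILD_TYPE=RelWithDebInfo",  # optimized build that keeps debug symbols
    ]
    build = ["cmake", "--build", build_dir]
    return [configure, build]


def run_build(source_dir, build_dir):
    """Run configure, then build, stopping at the first failing command."""
    for command in relwithdebinfo_commands(source_dir, build_dir):
        subprocess.run(command, check=True)
```

In a ROS 2 workspace the same build type is usually passed through colcon as `colcon build --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo`.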

Feb 02 '22 22:02 themarpe

I can try to get a capture but in the meantime here are some of my findings:

If the OAK-D is the only device connected to the RPI then the mobilenet node runs fine for a while, although it will sometimes freeze and stop publishing images without showing any warning or error messages in the output. I notice that during this time the mobilenet node will use up 100% CPU on one core of the RPI.

[image]

When it is working normally the node uses about 25% CPU split among the 4 cores.
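
The difference between one core pinned at 100% and a load spread over four cores can be captured in text form from two snapshots of `/proc/stat`. The following is a minimal sketch, assuming the standard Linux `/proc/stat` field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are illustrative, and the numbers are system-wide per core, not specific to the mobilenet node.

```python
def parse_cpu_times(stat_text):
    """Map 'cpu0', 'cpu1', ... to (idle_ticks, total_ticks) from /proc/stat text."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-core lines start with "cpuN"; the aggregate "cpu" line is skipped.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # Only the first eight counters are used; guest time is already
            # included in user/nice and would otherwise be counted twice.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
            times[fields[0]] = (idle, sum(values))
    return times


def per_core_busy(before_text, after_text):
    """Return the busy fraction (0.0 to 1.0) of each core between two snapshots."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    busy = {}
    for core, (idle_after, total_after) in after.items():
        idle_before, total_before = before[core]
        total_delta = total_after - total_before
        idle_delta = idle_after - idle_before
        busy[core] = 0.0 if total_delta <= 0 else 1.0 - idle_delta / total_delta
    return busy
```

To use it, read `/proc/stat` twice about a second apart and pass both texts to `per_core_busy`; a frozen node spinning on one core should show up as a single entry close to 1.0.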

I am able to reproduce this issue by connecting my RPLIDAR to the PI while the mobilenet node is running (OAK-D is on USB 3.0 port, RPLIDAR on 2.0). I do not get the segfault in this case either. If I restart the mobilenet node, it will run fine for some time but will eventually segfault. If I launch the RPLIDAR node while mobilenet is working, the RPLIDAR node will fail and mobilenet will freeze again with the 100% CPU usage issue.

Running something like rgb_stereo_node.launch.py seems to work without crashing but the RPI struggles to publish the images at a decent rate and there is a significant delay in the images being updated. I also tested stereo.launch.py without the metric converter, point cloud, and rviz nodes. I get pretty good fps (~15-20) but there is still about a 0.5s delay in the image being updated.

I don't have any issues using both the OAK-D and RPLIDAR with the stereo or rgb_stereo nodes, so it seems like something in the mobilenet node is causing issues with the USB devices.

Feb 03 '22 19:02 roni-kreinin

Also, I don't think I am able to use rr on an RPI4.

Feb 03 '22 19:02 roni-kreinin

Is the CPU consumption the same for rgb_stereo_node.launch.py and stereo.launch.py? And is this on the main branch or OAK-D-PRO-galactic?

Feb 03 '22 19:02 saching13

Those both use about 30% CPU. And this is on main branch with OAK-D-LITE.

Feb 03 '22 19:02 roni-kreinin

> Also, I don't think I am able to use rr on an RPI4.

True, I didn't realize you are running on an RPi. rr's aarch64 support isn't very good yet, and a non-64-bit RPi OS runs the chip in ARMv7 mode anyway.
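
Whether a given Pi is running a 64-bit userspace can be checked quickly. This is a minimal sketch using only the Python standard library; the helper names are illustrative, and the result says nothing about whether rr itself works on the device.

```python
import platform
import struct


def userspace_bits():
    """Pointer width of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8


def is_aarch64_userspace(machine, bits):
    """True only when the kernel reports aarch64 and the process is 64-bit."""
    return machine == "aarch64" and bits == 64


def report():
    """Return a one-line summary for the current machine."""
    machine = platform.machine()  # kernel architecture, e.g. "aarch64" or "armv7l"
    bits = userspace_bits()
    return f"machine={machine} userspace={bits}-bit aarch64={is_aarch64_userspace(machine, bits)}"
```

Checking both values matters because a 64-bit kernel can still be paired with a 32-bit userspace, in which case `platform.machine()` alone would be misleading.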

Feb 03 '22 21:02 themarpe

On my host I'm not seeing much CPU usage in Foxy and Noetic, but Galactic seems to take more resources. I will test on a Raspberry Pi over the weekend and get back to you.

Foxy

[image]

Galactic

[image: Screenshot from 2022-02-03 13-35-53]

Feb 03 '22 21:02 saching13