ROS2 node crash in CycloneDDS when connecting with Foxglove Studio
Description My ROS2 package crashes when running with FoxGlove
- Version: v1.9.0
- OS: Ubuntu 20.04
Steps To Reproduce
- Connect 1 Intel RealSense D435 camera
- Run Intel RealSense ROS2 Wrapper
- Open Foxglove studio
Expected Behavior Foxglove can display published images
Actual Behavior Intel RealSense ROS2 Wrapper crashes a few seconds after FoxGlove is started.
Error: [cam-2] python3: /tmp/binarydeb/ros-galactic-cyclonedds-0.8.0/src/core/ddsi/src/ddsi_entity_index.c:411: entidx_lookup_proxy_reader_guid: Assertion `is_reader_entityid (guid->entityid)' failed. [ERROR] [cam-2]: process has died [pid 3848, exit code -6, cmd '/home/sh/ws/flir-wrapper_ws/install/flir_wrapper/lib/flir_wrapper/cam --ros-args -r __node:=cam_727 -r __ns:=/cam_727 --params-file /tmp/launch_params_5fvheszt'].
@BorisKontorovich This looks like an error with your ROS2 node and not an error from Studio. I'd recommend looking into why the node is crashing as this would be specific to the node's logic. Maybe open an issue with the authors of the Intel RealSense ROS2 Wrapper with your above trace.
Can you give me some ideas how I can try foxglove with a camera?
Anything?
Are you able to view raw data with ros2 topic echo /my_camera_topic or does that also fail? Does the issue only happen with Foxglove Studio?
@jhurliman Could you look into this as it appears to be DDS related? Is there a chance our RTPS implementation is generating some kind of id wrong, or doing something that triggers a bug in cyclonedds?
Looks like the assertion is here:
https://github.com/eclipse-cyclonedds/cyclonedds/blob/3c81d2856aa088e48d2e9f2365c53c2591ee6fdb/src/core/ddsi/src/ddsi_entity_index.c#L411
And the is_reader_entityid function:
https://github.com/eclipse-cyclonedds/cyclonedds/blob/3c81d2856aa088e48d2e9f2365c53c2591ee6fdb/src/core/ddsi/src/q_entity.c#L211-L221
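For readers unfamiliar with the check behind this assertion, here is a minimal sketch of how an RTPS entity ID can be classified as a reader or a writer. It assumes the predefined entity kind values from the RTPS specification (kind in the last octet, lower 6 bits). The function names are hypothetical and this is not a copy of the CycloneDDS code linked above.

```python
# Sketch only: entity kinds as listed in the RTPS specification.
# The top two bits of the kind octet distinguish user-defined (0x00),
# vendor-specific (0x40) and built-in (0xc0) entities.
KIND_MASK = 0x3F
WRITER_WITH_KEY = 0x02
WRITER_NO_KEY = 0x03
READER_NO_KEY = 0x04
READER_WITH_KEY = 0x07


def entity_kind(entity_id: bytes) -> int:
    """Return the kind bits of a 4-byte EntityId given in wire order."""
    if len(entity_id) != 4:
        raise ValueError("an RTPS EntityId is exactly 4 bytes")
    return entity_id[3] & KIND_MASK


def is_reader_entity_id(entity_id: bytes) -> bool:
    """True if the kind bits mark this entity as a reader."""
    return entity_kind(entity_id) in (READER_NO_KEY, READER_WITH_KEY)


def is_writer_entity_id(entity_id: bytes) -> bool:
    """True if the kind bits mark this entity as a writer."""
    return entity_kind(entity_id) in (WRITER_NO_KEY, WRITER_WITH_KEY)
```

Under this reading, the assertion would fire whenever a lookup that expects a reader GUID is handed an entity ID whose kind bits say otherwise, for example a writer ID or an all-zero (unknown) ID.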
I can't reproduce this issue yet, but I've found at least one bug related to DATA_FRAG handling (large payloads such as image topics). I'm looking into it now.
Although I haven't been able to reproduce this specific crash, I did find some issues with our protocol handling related to large messages while testing the ros2_realsense_camera node. They have been fixed in https://github.com/foxglove/rtps/pull/18 which will bubble up to a Studio fix soon.
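As background on why image topics exercise a different code path: a payload larger than one datagram has to be split into numbered fragments and reassembled on the other side. The sketch below illustrates the idea only. The 1300-byte fragment size and both function names are assumptions for illustration, not values or APIs taken from foxglove/rtps or CycloneDDS.

```python
from typing import List, Optional, Tuple

Fragment = Tuple[int, bytes]  # (fragment number, fragment bytes)


def split_into_fragments(payload: bytes, fragment_size: int) -> List[Fragment]:
    """Split a payload into fragments numbered from 1, as RTPS fragments are."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [
        (offset // fragment_size + 1, payload[offset:offset + fragment_size])
        for offset in range(0, len(payload), fragment_size)
    ]


def reassemble(fragments: List[Fragment], expected_count: int) -> Optional[bytes]:
    """Return the full payload, or None if any fragment is still missing."""
    by_number = dict(fragments)
    if any(n not in by_number for n in range(1, expected_count + 1)):
        return None  # a real receiver would request the missing fragments
    return b"".join(by_number[n] for n in range(1, expected_count + 1))


# Hypothetical 1 MB image frame split with an assumed fragment size.
frame = bytes(1_000_000)
fragments = split_into_fragments(frame, 1300)
```

A single dropped fragment leaves the whole message incomplete, which is why large messages are more sensitive to packet loss and to bugs in fragment bookkeeping than small ones.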
I tried the settings in foxglove/rtps#18 (https://github.com/foxglove/rtps/pull/18) but it still runs into the same problem.
@BorisKontorovich the fixes made in the linked PR are needed as well. They will be included in the next release (1.10.0) which should go out on Monday May 9th.
@jhurliman I'm still having this issue on Foxglove Studio version 1.12.0
I'm playing a ROS2 bag and trying to use Foxglove to visualize the camera images. The ros2 bag process crashes with the following error:
python3: /tmp/binarydeb/ros-galactic-cyclonedds-0.8.0/src/core/ddsi/src/ddsi_entity_index.c:411: entidx_lookup_proxy_reader_guid: Assertion `is_reader_entityid (guid->entityid)' failed.
Visualizing with RViz2 works with no issue, so I think the issue might be related to Foxglove rtps.
@jhurliman What's the status of this issue? Is it one that impacts anyone trying to view camera images on ROS2 systems? I'm not seeing other reports of problems from others using ROS2 with studio.
@defunctzombie I've been unsuccessful in root causing this issue so far. It seems to affect CycloneDDS under some circumstances that I can't reproduce.
@HemaZ could you provide a detailed set of steps we can run to reproduce this behavior? Including steps like the ubuntu version, the packages we should install, commands to run, and a sample bag file would all be helpful to add.
I get the same problem on Galactic (Ubuntu 20.04) using the Ouster driver: https://github.com/ros-drivers/ros2_ouster_drivers
Steps:
- Start driver and wait for it to connect to lidar
- Open Foxglove if it is not already open and go to the 3D panel. Click on the points topic (PointCloud2) to display it in Foxglove
- The Ouster driver should crash
I managed to create a replicable scenario without the need for specific hardware such as a lidar. It uses the same Ouster lidar driver but it reads from a PCAP file instead of the actual sensor. The Ouster driver has a separate mode to run with recorded PCAP files but I have verified it still crashes and the same behaviour is present.
You will need a metadata file and a PCAP file. I was blocked from uploading both here, so they are available at this link instead: https://drive.google.com/drive/folders/1WXLTntiAHcr_mCFAgR7GjQl3VH5BR5dO?usp=sharing
Note: The recording file is only a bit over 1min long
Here are the steps to replicate:
- Clone the ouster driver repo in a computer with ROS2 (galactic recommended since the problem seems related to cyclonedds): https://github.com/ros-drivers/ros2_ouster_drivers
- Follow instructions at the end of the readme to setup the TinsDriver mode so it will read from the recording and not crash: https://github.com/ros-drivers/ros2_ouster_drivers#usage-with-tins-based-driver
- I have generated a metadata file for the sensor recorded that should be used instead of the default ones in the repo
- You might need to run sudo -s, then source ROS2, and only then launch the node, since libtins requires root permissions and will otherwise throw an error
- After changing the ethernet device, building and sourcing run it
ros2 launch ros2_ouster tins_driver_launch.py metadata_filepath:=/path/to/file/OS1_64_BH_metadata.json
- Start pcap replay. If not installed then
sudo apt-get install tcpreplay
sudo tcpreplay --intf1=[ethernet device] ouster_lidar_rec.pcap
- Verify that it is running and publishing the pointcloud correctly (with rviz2 or rqt or anything else)
- Open foxglove and open a 3D panel. Select the topic /points
- It should crash the ros2_ouster node
Just found that also replaying a rosbag and connecting foxglove with the ROS2 direct connect has the same effect.
Steps:
- Download the mcap file: https://drive.google.com/file/d/1tK8__H6LBs4wNCl-n5ns4sJ0zzV2Rb_Z/view?usp=sharing
- Make sure to install the mcap rosbag2 plugin (also happens for db3 files but the mcap is smaller to share):
sudo apt-get install ros-galactic-rosbag2-storage-mcap
- Start replaying the mcap file:
ros2 bag play -s mcap -l cfs22_mar_track4_fast.mcap
- Open foxglove, connect to native ROS2
- Open 3D panel and subscribe to the topic /ouster/points
- The rosbag2 player should crash
Sorry we've been dormant on this issue. For now we recommend using the rosbridge connection option when trying to connect to ROS2 robots. The rosbridge option tends to work more reliably across the different ros2 variants and also works with custom messages whereas the native ros2 connection in studio does not work with custom messages.
I'm trying to reproduce this issue given the latest instructions, and I don't see the crash but I do see error messages on the ros2 bag play side and Studio fails to subscribe to /ouster/points while other topics appear to work fine. I built ROS2 Galactic from source in a container using https://gist.github.com/jhurliman/d1ad9e2c78bb81adfd26606960551e7f, started playback of the provided MCAP file, connected the latest desktop release version of Studio, and when attempting to subscribe to /ouster/points the ros2 bag play terminal prints a series of errors that look like this:
1663089166.116827 [0] recvUC: malformed packet received from vendor 1.16 state parse:nackfrag <52545053 02010110 63e380f2 69edf702 c8770214 0e010c00 0110f9e5 d365ca5e @0x24 12013c00 00000000 031e0000 00000000 1c150000 00000000 00010000 ffffff7f> (note: maybe partially bswap'd) {{12,1,60},0,1e03,5404,0,256}
1663089166.116837 [0] recvUC: malformed packet received from vendor 1.16 state parse:nackfrag <52545053 02010110 63e380f2 69edf702 c8770214 0e010c00 0110f9e5 d365ca5e @0x24 12013c00 00000000 031e0000 00000000 1c150000 00000000 00010000 ffffff7f> (note: maybe partially bswap'd) {{12,1,60},0,1e03,5404,0,256}
1663089166.116854 [0] recvUC: malformed packet received from vendor 1.16 state parse:nackfrag <52545053 02010110 63e380f2 69edf702 c8770214 0e010c00 0110f9e5 d365ca5e @0x24 12013c00 00000000 031e0000 00000000 1c150000 00000000 00010000 ffffff7f> (note: maybe partially bswap'd) {{12,1,60},0,1e03,5404,0,256}
1663089166.116863 [0] recvUC: malformed packet received from vendor 1.16 state parse:nackfrag <52545053 02010110 63e380f2 69edf702 c8770214 0e010c00 0110f9e5 d365ca5e @0x24 12013c00 00000000 031e0000 00000000 1c150000 00000000 00010000 ffffff7f> (note: maybe partially bswap'd) {{12,1,60},0,1e03,5404,0,256}
While it's not a crash, this should be enough to go off for now.
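To make the dump easier to read, here is a small sketch that decodes the leading words of the logged packet according to the RTPS wire layout: a 20-byte header (magic, protocol version, vendor ID, GUID prefix) followed by 4-byte submessage headers. It assumes the first hex words in the log are in wire order (the log itself warns the dump may be partially byte-swapped), and the function names are hypothetical.

```python
import struct

# Submessage IDs from the RTPS specification (subset).
SUBMESSAGE_NAMES = {0x0E: "INFO_DST", 0x12: "NACK_FRAG", 0x16: "DATA_FRAG"}


def parse_rtps_header(data: bytes) -> dict:
    """Decode the fixed 20-byte RTPS message header."""
    if len(data) < 20 or data[:4] != b"RTPS":
        raise ValueError("not an RTPS message")
    return {
        "version": (data[4], data[5]),
        "vendor": (data[6], data[7]),
        "guid_prefix": data[8:20].hex(),
    }


def parse_submessage_header(data: bytes) -> dict:
    """Decode a 4-byte submessage header; flag bit 0 selects little-endian."""
    submessage_id, flags = data[0], data[1]
    fmt = "<H" if flags & 0x01 else ">H"
    (length,) = struct.unpack(fmt, data[2:4])
    return {
        "id": submessage_id,
        "name": SUBMESSAGE_NAMES.get(submessage_id, "UNKNOWN"),
        "flags": flags,
        "length": length,
    }


# First 24 bytes of the logged packet, and the 4 bytes logged at offset 0x24.
dump = "52545053 02010110 63e380f2 69edf702 c8770214 0e010c00"
packet = bytes.fromhex(dump.replace(" ", ""))
header = parse_rtps_header(packet)
first_submessage = parse_submessage_header(packet[20:24])
nack_frag = parse_submessage_header(bytes.fromhex("12013c00"))
```

Decoded this way, the vendor bytes read as 1.16, matching the "vendor 1.16" in the log line, and the submessage at offset 0x24 is a NACK_FRAG with flags 1 and length 60, matching the {12,1,60} tuple that CycloneDDS prints.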
Any Linux users running into this issue or other issues with the ROS2 native connection, please try following the sysctl tuning steps at https://github.com/foxglove/rtps#notes and report back if this improves the behavior.
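One way to check whether such tuning took effect is to ask the kernel for a large UDP receive buffer and read back what was actually granted. This is a minimal sketch, assuming the tuning concerns socket receive buffer limits (on Linux, net.core.rmem_max caps what SO_RCVBUF can be raised to). The function name and the 4 MiB request are illustrative and not taken from the linked notes.

```python
import socket


def granted_receive_buffer(requested_bytes: int) -> int:
    """Request a UDP receive buffer size and return what the OS granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        # Linux reports double the stored value and silently caps the
        # request at net.core.rmem_max, so compare rather than assume.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


requested = 4 * 1024 * 1024  # illustrative request, not a recommended value
granted = granted_receive_buffer(requested)
print(f"requested {requested} bytes, kernel granted {granted} bytes")
```

If the granted size is well below the request, the system-wide limit is probably still at its default and the sysctl changes were not applied.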
Tried the steps in the link, still getting crash on ROS2 Galactic ubuntu 20.04 from ros2 bag play side.
[INFO] [1663148281.258157400] [rosbag2_storage]: Opened database 'kitti_2011_10_03_drive_0027_synced2_readme/kitti_2011_10_03_drive_0027_synced2_readme.db3' for READ_ONLY.
[INFO] [1663148281.259853570] [rosbag2_storage]: Opened database 'kitti_2011_10_03_drive_0027_synced2_readme/kitti_2011_10_03_drive_0027_synced2_readme.db3' for READ_ONLY.
python3: /tmp/binarydeb/ros-galactic-cyclonedds-0.8.0/src/core/ddsi/src/ddsi_entity_index.c:411: entidx_lookup_proxy_reader_guid: Assertion `is_reader_entityid (guid->entityid)' failed.
Aborted (core dumped)
The latest studio main branch includes a new version of our RTPS library (PR at https://github.com/foxglove/studio/pull/4438). Unfortunately, I haven't been able to consistently repro the crash in my environment so I can't confirm if it fixes the reported issue. Still, it does improve protocol compliance in a few places, so fingers crossed. If someone seeing this crash can test out a desktop build of main and report back it would be much appreciated.
Can you point me to the latest main build? or how can I build it?
@HemaZ we don't publish continuous/nightly builds of the desktop app, only the web version. For desktop, you can follow the instructions at https://github.com/foxglove/studio#contributing to clone the repository and build from source.
@HemaZ Here's a dev build of latest main: https://drive.google.com/file/d/1mLkw_T3OfeZiX7E2XhRPvcKWmDHp7Sys/view?usp=sharing You can give this a try or wait until we make a new release next week and try that.
Tried this release. Still crashing.
Using the latest version of Studio 1.26.0.
I ran the above steps for ros2 bag play with the referenced cfs22_mar_track4_fast.mcap file on a Focal VM with the latest ros2 galactic ros2-testing packages (http://packages.ros.org/ros2-testing/ubuntu focal main). I was able to subscribe to /ouster/points and receive data. The ros2 bag play process did not crash.
@bertaveira Are you able to try again with the latest packages from ros2-testing? So far I've not been able to reproduce any crash using my VM.
@BorisKontorovich Are you still able to reproduce this issue with the latest release of Studio? Are you able to reproduce if you update your ROS galactic packages to the latest release from ros2-testing?
I was able to reproduce on an Ubuntu Linux machine.
I've moved this ticket to the @foxglove/rtps library where the bug exists, see https://github.com/foxglove/rtps/issues/22. We'll keep this Studio ticket open as well until the issue is resolved or the ROS 2 native connector is deprecated.
@jhurliman now that you can reproduce this are you able to create a fix?