The nav2_amcl version present in the humble container crashes replaying the perfect odometry bagfile

Open glpuga opened this issue 1 year ago • 7 comments

Bug description

Running the perfect odometry bagfile against nav2 amcl in the Humble development container causes amcl to crash early in the bagfile.

This is a problem for us because it prevents performance measurements from being captured for this node, which leaves us without a meaningful comparison baseline for beluga.

Building nav2_amcl from source, using the current HEAD of the navigation2 repository, runs just fine. The problem is therefore probably within amcl itself, but the specific change that fixed it has not been identified.

Platform (please complete the following information):

  • OS: Humble development container.
  • Beluga version: c0e2de1

How to reproduce

  1. Go to the beluga working copy
  2. ./docker/run --build
  3. Once in the container: colcon build && . install/setup.bash && ros2 launch beluga_example perfect_odometry.launch.xml localization_package:=nav2_amcl localization_node:=amcl

Expected behavior

AMCL should be able to produce localization data from the bagfile replay.

Actual behavior

AMCL crashes. When capturing performance data, no data is captured at all.

Additional context

This issue has been present since early June or so, but it was initially thought to be caused by #232 and other problems in the bagfiles.

The stack dump of the crash can be obtained with:

sudo apt update && sudo apt install xterm -y
colcon build
. install/setup.bash
ros2 launch beluga_example perfect_odometry.launch.xml localization_package:=nav2_amcl localization_node:=amcl localization_prefix:="xterm -e gdb --args "

Once in xterm, start the amcl process with the run command and wait until it crashes. Then use the backtrace (bt) command to print the stack.

Screenshot from 2023-09-02 18-36-45

glpuga avatar Sep 02 '23 21:09 glpuga

This is somehow related to the launch files being broken and loading the wrong param file (see #254). When the right param file is loaded, nav2_amcl works fine.

The failure may then be related to some value in the default param yaml file.

glpuga avatar Sep 03 '23 17:09 glpuga

This might be related to https://github.com/Ekumen-OS/beluga/pull/238#issuecomment-1629892401. I think we agreed not to use Humble to benchmark.

Most likely, adaptive recovery is enabled in the default param yaml file, which causes the crash. We do need to fix the launch files.
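
If that hypothesis holds, a parameter override along these lines should keep recovery off until the fix reaches Humble. This is only a sketch and is not taken from the beluga repository: it assumes the node is named amcl, as in the launch command above, and relies on the nav2_amcl parameters recovery_alpha_fast and recovery_alpha_slow, where 0.0 means recovery is disabled.

```yaml
# Hypothetical override for illustration; not the param file shipped with beluga.
amcl:
  ros__parameters:
    # With both decay rates at 0.0, AMCL never injects random recovery poses.
    recovery_alpha_fast: 0.0
    recovery_alpha_slow: 0.0
```

Whether this actually avoids the crash has not been verified here; it only follows from the assumption that adaptive recovery is the trigger.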

nahueespinosa avatar Sep 04 '23 13:09 nahueespinosa

> This might be related to https://github.com/Ekumen-OS/beluga/pull/238#issuecomment-1629892401. I think we agreed not to use Humble to benchmark.

Thanks for the pointer. I had totally forgotten about that. Fixing the launch file should prevent this from happening, at least for the benchmark runs.

glpuga avatar Sep 04 '23 14:09 glpuga

For easier future reference, this is the root issue that needs fixing in Humble:

  • This is the original issue: https://github.com/ros-planning/navigation2/issues/3311
  • PR fixing this in main: https://github.com/ros-planning/navigation2/pull/3315
  • Rejected PR targeting Foxy with the same fix: https://github.com/ros-planning/navigation2/pull/3314

glpuga avatar Oct 19 '23 17:10 glpuga

Pinged the maintainer to backport: https://github.com/ros-planning/navigation2/pull/3315#issuecomment-1790573075. We'll see when and if we get a response.

hidmic avatar Nov 02 '23 11:11 hidmic

Backported in https://github.com/ros-planning/navigation2/pull/3938! We should be in the clear after the next package sync.

hidmic avatar Nov 04 '23 15:11 hidmic

Well, no, the Nav2 folks did not release in time for https://discourse.ros.org/t/preparing-for-humble-sync-2023-12-15/35091.

hidmic avatar Feb 09 '24 21:02 hidmic

Have we observed this again, @glpuga @nahueespinosa? I think not.

hidmic avatar Dec 06 '24 13:12 hidmic

No, we haven't, but then again we don't ever run the beluga_benchmark package anymore.

More broadly, we should reconsider whether that package is needed or if it should be phased out now that we use lambkin. The one thing that we might still use this for, but which we can do without just fine, is profiling.

glpuga avatar Dec 06 '24 19:12 glpuga

beluga_benchmark is getting replaced, so no need to track this external issue anymore.

hidmic avatar Dec 06 '24 20:12 hidmic