beluga
The nav2_amcl version present in the humble container crashes replaying the perfect odometry bagfile
Bug description
Running the perfect odometry bagfile against nav2 amcl in the Humble development container causes amcl to crash early in the bagfile.
This is an issue for us because it prevents performance measurements from being captured for this node, which leaves us without any significant comparison baseline for beluga.
Building nav2_amcl from source, using the current HEAD of navigation2's repository, runs just fine, so it's probably an issue within amcl itself, but the relevant change that fixes it has not been identified.
Platform (please complete the following information):
- OS: Humble development container.
- Beluga version: c0e2de1
How to reproduce
- Go to the beluga working copy
- ./docker/run --build
- Once in the container:
colcon build && . install/setup.bash && ros2 launch beluga_example perfect_odometry.launch.xml localization_package:=nav2_amcl localization_node:=amcl
Expected behavior
AMCL should be able to produce localization data from the bagfile replay.
Actual behavior
AMCL crashes. When capturing performance data, no data is captured at all.
Additional context
This issue has been present since early June or so, but it was initially thought to be caused by #232 and other problems in the bagfiles.
The stack dump of the crash can be obtained with:
sudo apt update && sudo apt install xterm -y
colcon build
. install/setup.bash
ros2 launch beluga_example perfect_odometry.launch.xml localization_package:=nav2_amcl localization_node:=amcl localization_prefix:="xterm -e gdb --args "
Once in xterm, start the amcl process with the run command and wait until it crashes. Then use the back command.
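The manual steps above can probably be automated by letting gdb run the commands itself. The following is a minimal sketch, not something tested in this thread: the helper name gdb_batch_prefix is hypothetical, and it assumes that the localization_prefix launch argument accepts the extra gdb options the same way it accepts the prefix shown above.

```shell
#!/bin/sh
# Hypothetical helper: prints a launch prefix that runs the node under gdb
# without manual interaction.
#   xterm -hold  keep the window open after gdb exits, so the output stays readable
#   -batch       make gdb exit once all commands have been processed
#   -ex run      start the program immediately
#   -ex bt       print the backtrace once the program stops (e.g. on the crash)
#   --args       treat everything that follows as the program and its arguments
gdb_batch_prefix() {
    printf '%s' "xterm -hold -e gdb -batch -ex run -ex bt --args "
}

# Intended usage inside the development container, after building and
# sourcing the workspace (left commented out because it needs ROS 2):
# ros2 launch beluga_example perfect_odometry.launch.xml \
#     localization_package:=nav2_amcl localization_node:=amcl \
#     localization_prefix:="$(gdb_batch_prefix)"
```

The trailing space in the prefix is kept on purpose, matching the original command.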
This is somehow related to the launchfiles being broken and loading the wrong param file (see #254). When the right param file is loaded, nav2_amcl works fine.
The failure may then be related to some value in the default param yaml file.
This might be related to https://github.com/Ekumen-OS/beluga/pull/238#issuecomment-1629892401. I think we agreed not to use Humble to benchmark.
Most likely, adaptive recovery is enabled in the default param yaml file, which causes the crash. We do need to fix the launch files.
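To check this hypothesis against a given param file, something like the sketch below could help. It assumes that adaptive recovery is governed by the nav2_amcl parameters recovery_alpha_slow and recovery_alpha_fast, and that a value of 0.0 for both means recovery is disabled. The helper name amcl_recovery_enabled is hypothetical, and the check is a plain text scan, not a real YAML parser.

```shell
#!/bin/sh
# Hypothetical helper: prints "enabled" if the given params YAML sets a
# nonzero recovery_alpha_slow or recovery_alpha_fast, "disabled" otherwise.
# Assumption: each parameter sits on its own line as "name: value".
amcl_recovery_enabled() {
    params_file="$1"
    awk '
        /^[ \t]*recovery_alpha_(slow|fast)[ \t]*:/ {
            sub(/#.*/, "")              # drop any trailing comment
            split($0, kv, ":")          # kv[2] holds the value text
            if (kv[2] + 0 != 0) found = 1
        }
        END { print (found ? "enabled" : "disabled") }
    ' "$params_file"
}

# Example (the path is a placeholder, not a file named in this thread):
# amcl_recovery_enabled path/to/params.yaml
```

If the file does not mention either parameter, the helper reports "disabled", which matches the assumption that both default to 0.0.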
Thanks for the pointer. I had totally forgotten about that. Fixing the launchfile should prevent this from happening for the benchmark runs at least.
For easier future reference this is the root issue that needs fixing in Humble:
- This is the original issue: https://github.com/ros-planning/navigation2/issues/3311
- PR fixing this in main: https://github.com/ros-planning/navigation2/pull/3315
- Rejected PR targeting Foxy with the same fix: https://github.com/ros-planning/navigation2/pull/3314
Pinged the maintainer to backport the fix: https://github.com/ros-planning/navigation2/pull/3315#issuecomment-1790573075. We'll see when and if we get a response.
Backported in https://github.com/ros-planning/navigation2/pull/3938! We should be in the clear in the next package sync.
Well, no, the Nav2 folks did not release in time for https://discourse.ros.org/t/preparing-for-humble-sync-2023-12-15/35091.
Have we observed this again @glpuga @nahueespinosa ? I think not.
No, we haven't, but then again we don't ever run the beluga_benchmark package anymore.
More broadly, we should reconsider whether that package is needed or if it should be phased out now that we use lambkin. The one thing that we might still use this for, but which we can do without just fine, is profiling.
beluga_benchmark is getting replaced, so no need to track this external issue anymore.