
Migrate to AWS warehouse simulation scenario

Open lucabonamini opened this issue 3 years ago • 81 comments


Basic Info

Info | Please fill out this column
Ticket(s) this addresses |
Primary OS tested on | Ubuntu
Robotic platform tested on | Personal laptop

Description of contribution in a few bullet points

  • Changed tests simulation scenario from turtlebot world to AWS warehouse

Description of documentation updates required from your changes


Future work that may be required in bullet points

For Maintainers:

  • [ ] Check that any new parameters added are updated in navigation.ros.org
  • [ ] Check that any significant change is added to the migration guide
  • [ ] Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • [ ] Check that any new functions have Doxygen added
  • [ ] Check that any new features have test coverage
  • [ ] Check that any new plugins is added to the plugins page
  • [ ] If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

lucabonamini avatar Feb 02 '22 20:02 lucabonamini

This pull request is in conflict. Could you fix it @lucabonamini?

mergify[bot] avatar Feb 02 '22 20:02 mergify[bot]

@lucabonamini, please properly fill in PR template in the future. @stevemacenski, use this instead.

  • [ ] Check that any new parameters added are updated in navigation.ros.org
  • [ ] Check that any significant change is added to the migration guide
  • [ ] Check that any new features OR changes to existing behaviors are reflected in the tuning guide
  • [ ] Check that any new functions have Doxygen added
  • [ ] Check that any new features have test coverage
  • [ ] Check that any new plugins is added to the plugins page
  • [ ] If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

mergify[bot] avatar Feb 02 '22 20:02 mergify[bot]

@lucabonamini, your PR has failed to build. Please check CI outputs and resolve issues. You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

mergify[bot] avatar Feb 02 '22 20:02 mergify[bot]

@lucabonamini, your PR has failed to build. Please check CI outputs and resolve issues. You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

mergify[bot] avatar Feb 02 '22 20:02 mergify[bot]

@lucabonamini, your PR has failed to build. Please check CI outputs and resolve issues. You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

mergify[bot] avatar Feb 02 '22 20:02 mergify[bot]

@lucabonamini, your PR has failed to build. Please check CI outputs and resolve issues. You may need to rebase or pull in main due to API changes (or your contribution genuinely fails).

mergify[bot] avatar Feb 05 '22 17:02 mergify[bot]

@lucabonamini what's the status here?

SteveMacenski avatar Mar 24 '22 00:03 SteveMacenski

This pull request is in conflict. Could you fix it @lucabonamini?

mergify[bot] avatar Mar 24 '22 00:03 mergify[bot]

I don't suppose you have a video of a few of the jobs running / starting positions (particularly the keepout or speed zones one)? That would help to put a graphic to the changes in starting pose / test zones

SteveMacenski avatar Mar 24 '22 22:03 SteveMacenski

I don't suppose you have a video of a few of the jobs running / starting positions (particularly the keepout or speed zones one)? That would help to put a graphic to the changes in starting pose / test zones

Just tell me which photos / videos you want, I'll do them asap 😃

lucabonamini avatar Mar 24 '22 22:03 lucabonamini

the keepout zone test and the default nav2_bringup launch file position would be good enough for me!

SteveMacenski avatar Mar 25 '22 01:03 SteveMacenski

This pull request is in conflict. Could you fix it @lucabonamini?

mergify[bot] avatar Apr 04 '22 21:04 mergify[bot]

This introduces a handful of linting errors, as you can see in CI's results. Also, a bunch of tests fail; did you add the AWS warehouse package to the package.xml files so that it would be installed?

SteveMacenski avatar Apr 11 '22 17:04 SteveMacenski

https://github.com/ros-planning/navigation2/pull/2797#discussion_r834775068 is the last bit!

I see you have some tests failing though; a few I can believe aren't your fault, but it is very unusual for test_bt_navigator_2 / test_dynamic_obstacle to fail.

The others are flaky (except testIsPathValid, which was recently fixed), but we should run them a few times to make sure they eventually pass, and that it's the same flakiness persisting rather than tests that never pass with these changes. @AlexeyMerzlyakov can probably help with that, since those are mostly his tests (test_keepout_filter / test_speed_filter_local).

Exciting!

SteveMacenski avatar Apr 18 '22 17:04 SteveMacenski

#2797 (comment) is the last bit!

As we discussed in PM, I managed to launch a simulation in Gazebo with multiple robots, but I'm still having some problems related to map loading.

I see you have some tests failing though; a few I can believe aren't your fault, but it is very unusual for test_bt_navigator_2 / test_dynamic_obstacle to fail.

Both test_bt_navigator_2 and test_dynamic_obstacle probably failed because of [ERROR] [gzserver-1]: process has died [pid 2553, exit code 255, cmd 'gzserver /opt/ros/rolling/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world -s libgazebo_ros_init.so -s libgazebo_ros_factory.so -s libgazebo_ros_force_system.so ']. and [ERROR] [spawn_entity.py-3]: process has died [pid 2557, exit code 1, cmd '/opt/ros/rolling/lib/gazebo_ros/spawn_entity.py -entity turtlebot3_waffle -file /opt/overlay_ws/install/nav2_bringup/share/nav2_bringup/worlds/waffle.model -robot_namespace -x 1.80 -y 2.20 -z 0.01 -R 0.00 -P 0.00 -Y 0.00 --ros-args'].

lucabonamini avatar Apr 18 '22 20:04 lucabonamini

I cannot understand why only those tests are failing. If something related to Gazebo were set incorrectly, ALL tests should fail, shouldn't they?

Maybe launching Gazebo with the verbose option would give more information?

lucabonamini avatar Apr 18 '22 21:04 lucabonamini

It could also just be flaky from a single bad run. Does that happen reliably?

I haven't seen Gazebo crashing as the reason for failures in Nav2 CI recently; it used to happen a lot, but hasn't in recent history.

SteveMacenski avatar Apr 18 '22 21:04 SteveMacenski

Now I see the same problem in the keepout test, with the Gazebo server crashing on the CI side:

[ERROR] [gzserver-1]: process has died [pid 3854, exit code 255, cmd 'gzserver /opt/ros/rolling/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world                                                                      -s libgazebo_ros_init.so   -s libgazebo_ros_factory.so   -s libgazebo_ros_force_system.so       '].

As soon as I complete a local rebuild of @lucabonamini's workspace, I'll give more information about the status of the local keepout/speed filter tests.

AlexeyMerzlyakov avatar Apr 19 '22 12:04 AlexeyMerzlyakov

Update:

I've run test_keepout_filter locally in a workspace built with AWS, inside the same Docker container that CI uses. The test works fine, so this is the same Gazebo problem as in the other tests.

For test_speed_filter, at the end of its move the robot enters a lost state that causes back-up spins (sometimes re-entering the speed limit zone, sometimes not reaching its final goal), which makes the test fail. Once rviz2/gzclient visualization is added, I will be able to see and explain what is going on there.

AlexeyMerzlyakov avatar Apr 20 '22 11:04 AlexeyMerzlyakov

For test_speed_filter, at the end of its move the robot enters a lost state that causes back-up spins (sometimes re-entering the speed limit zone, sometimes not reaching its final goal), which makes the test fail. Once rviz2/gzclient visualization is added, I will be able to see and explain what is going on there.

I'll investigate further, thank you for your feedback.

lucabonamini avatar Apr 20 '22 12:04 lucabonamini

I've made an rviz visualization of the test. From rviz, it looks like the robot received an incorrect initial pose (position and rotation) in the test. This picture was taken after the initialpose message was received: Screenshot_2022-04-20_20-22-04_r

Please check the test and the tester_node.py->fwd_pose() routine to see whether my guess is correct.

AlexeyMerzlyakov avatar Apr 20 '22 17:04 AlexeyMerzlyakov

I tested locally test_speed_filter_global and test_speed_filter_local.

Screenshot from 2022-04-20 22-30-04

Screenshot from 2022-04-20 22-30-05

Screenshot from 2022-04-20 22-30-13

Screenshot from 2022-04-20 22-30-42

Screenshot from 2022-04-20 22-30-45

Everything works fine when I run each test separately. But when I run them together, test_speed_filter_local starts where test_speed_filter_global ended, as if the first gzserver were still alive.

In this photo you can see the second gzclient launched by the second test. Screenshot from 2022-04-20 22-31-11

Here you can see the wrong initial pose for the second test and the path that's calculated. Screenshot from 2022-04-20 22-31-15

Screenshot from 2022-04-20 22-31-20

lucabonamini avatar Apr 20 '22 20:04 lucabonamini

@lucabonamini thank you for the analysis! I can confirm this: gzserver is not killed after each test case runs, so the next test case runs against it and the robot is spawned at an incorrect position (not given an incorrect initialpose, as I initially thought).

The logs from the experiments I repeated today confirm this:

root@f793eb285ea4:/opt/overlay_ws# /usr/bin/python3.8 "-u" "/opt/ros/rolling/share/ament_cmake_test/cmake/run_test.py" "/opt/overlay_ws/build/nav2_system_tests/test_results/nav2_system_tests/test_speed_filter.xml" "--package-name" "nav2_system_tests" "--generate-result-on-success" "--env" "TEST_DIR=/opt/overlay_ws/src/navigation2/nav2_system_tests/src/costmap_filters" "TEST_MASK=/opt/overlay_ws/src/navigation2/nav2_system_tests/maps/speed_mask.yaml" "BT_NAVIGATOR_XML=navigate_to_pose_w_replanning_and_recovery.xml" "ASTAR=False" "PARAMS_FILE=/opt/overlay_ws/src/navigation2/nav2_system_tests/src/costmap_filters/speed_local_params.yaml" "--command" "/opt/overlay_ws/src/navigation2/nav2_system_tests/src/costmap_filters/test_speed_launch.py"
...
[tester_node-9] [INFO] [1650520047.560853668] [nav2_tester]: Received amcl_pose
[component_container_isolated-8] [WARN] [1650520047.715743025] [controller_server]: Control loop missed its desired rate of 20.0000Hz
[component_container_isolated-8] [WARN] [1650520047.761236446] [controller_server]: Control loop missed its desired rate of 20.0000Hz
[component_container_isolated-8] [INFO] [1650520048.400217805] [controller_server]: Passing new path to controller.
[tester_node-9] [INFO] [1650520048.645704016] [nav2_tester]: Received amcl_pose
[component_container_isolated-8] [INFO] [1650520048.774803535] [controller_server]: Reached the goal!
[component_container_isolated-8] [INFO] [1650520048.813128449] [bt_navigator]: Goal succeeded
[tester_node-9] [INFO] [1650520048.815649460] [nav2_tester]: Goal succeeded!
[tester_node-9] [INFO] [1650520048.818689353] [nav2_tester]: Setting goal pose
[component_container_isolated-8] [INFO] [1650520048.821691575] [bt_navigator]: Begin navigating from current location to (1.00, 5.00)
[tester_node-9] [INFO] [1650520048.821955403] [nav2_tester]: Waiting 60 seconds for robot to reach goal
[tester_node-9] [INFO] [1650520048.823897591] [nav2_tester]: Distance from goal is: 0.17466373924126408
[tester_node-9] [INFO] [1650520048.824910483] [nav2_tester]: *** GOAL REACHED ***
[tester_node-9] [INFO] [1650520048.825927846] [nav2_tester]: Test PASSED
[tester_node-9] [INFO] [1650520048.826896443] [nav2_tester]: Shutting down
[tester_node-9] [INFO] [1650520048.843981552] [nav2_tester]: Shutting down navigation lifecycle manager...
[component_container_isolated-8] [INFO] [1650520048.844373739] [controller_server]: Received a goal, begin computing control effort.
[component_container_isolated-8] [INFO] [1650520048.854760574] [lifecycle_manager_navigation]: Terminating bond timer...
[component_container_isolated-8] [INFO] [1650520048.854895526] [lifecycle_manager_navigation]: Shutting down managed nodes...
[component_container_isolated-8] [INFO] [1650520048.854947183] [lifecycle_manager_navigation]: Deactivate, cleanup, and shutdown nodes
[component_container_isolated-8] [INFO] [1650520048.854985330] [lifecycle_manager_navigation]: Deactivating waypoint_follower
[component_container_isolated-8] [INFO] [1650520048.855964881] [waypoint_follower]: Deactivating
[component_container_isolated-8] [INFO] [1650520048.856044833] [waypoint_follower]: Destroying bond (waypoint_follower) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520048.869964637] [controller_server]: Reached the goal!
[component_container_isolated-8] [INFO] [1650520048.902224250] [bt_navigator]: Goal succeeded
[component_container_isolated-8] [INFO] [1650520049.061393386] [lifecycle_manager_navigation]: Deactivating bt_navigator
[component_container_isolated-8] [INFO] [1650520049.062017990] [bt_navigator]: Deactivating
[component_container_isolated-8] [INFO] [1650520049.062150756] [bt_navigator]: Destroying bond (bt_navigator) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520049.261678355] [lifecycle_manager_navigation]: Deactivating behavior_server
[component_container_isolated-8] [INFO] [1650520049.264362256] [behavior_server]: Deactivating
[component_container_isolated-8] [INFO] [1650520049.264853595] [behavior_server]: Destroying bond (behavior_server) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520049.412835551] [lifecycle_manager_navigation]: Deactivating planner_server
[component_container_isolated-8] [INFO] [1650520049.413403293] [planner_server]: Deactivating
[component_container_isolated-8] [INFO] [1650520049.413541299] [global_costmap.global_costmap]: Deactivating
[component_container_isolated-8] [INFO] [1650520049.529044946] [planner_server]: Deactivating plugin GridBased of type NavfnPlanner
[component_container_isolated-8] [INFO] [1650520049.529205529] [planner_server]: Destroying bond (planner_server) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520049.636399355] [lifecycle_manager_navigation]: Deactivating smoother_server
[component_container_isolated-8] [INFO] [1650520049.636943523] [smoother_server]: Deactivating
[component_container_isolated-8] [INFO] [1650520049.637074095] [smoother_server]: Destroying bond (smoother_server) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520049.828291409] [lifecycle_manager_navigation]: Deactivating controller_server
[component_container_isolated-8] [INFO] [1650520049.830474539] [controller_server]: Deactivating
[component_container_isolated-8] [INFO] [1650520049.830609979] [local_costmap.local_costmap]: Deactivating
[component_container_isolated-8] [INFO] [1650520049.972695891] [controller_server]: Destroying bond (controller_server) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520050.083762370] [lifecycle_manager_navigation]: Cleaning up waypoint_follower
[component_container_isolated-8] [INFO] [1650520050.084396192] [waypoint_follower]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.117330799] [lifecycle_manager_navigation]: Cleaning up bt_navigator
[component_container_isolated-8] [INFO] [1650520050.118196195] [bt_navigator]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.276028751] [bt_navigator]: Completed Cleaning up
[component_container_isolated-8] [INFO] [1650520050.277630095] [lifecycle_manager_navigation]: Cleaning up behavior_server
[component_container_isolated-8] [INFO] [1650520050.278246869] [behavior_server]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.307634168] [lifecycle_manager_navigation]: Cleaning up planner_server
[component_container_isolated-8] [INFO] [1650520050.307863166] [planner_server]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.322032472] [global_costmap.global_costmap]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.327054671] [planner_server]: Cleaning up plugin GridBased of type NavfnPlanner
[component_container_isolated-8] [INFO] [1650520050.327178184] [planner_server]: Destroying plugin GridBased of type NavfnPlanner
[component_container_isolated-8] [INFO] [1650520050.327665859] [lifecycle_manager_navigation]: Cleaning up smoother_server
[component_container_isolated-8] [INFO] [1650520050.327880774] [smoother_server]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.344651406] [lifecycle_manager_navigation]: Cleaning up controller_server
[component_container_isolated-8] [INFO] [1650520050.344889334] [controller_server]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.351546806] [local_costmap.local_costmap]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.366773836] [lifecycle_manager_navigation]: Shutting down waypoint_follower
[component_container_isolated-8] [INFO] [1650520050.366986257] [waypoint_follower]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.367226751] [lifecycle_manager_navigation]: Shutting down bt_navigator
[component_container_isolated-8] [INFO] [1650520050.367377853] [bt_navigator]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.368116556] [lifecycle_manager_navigation]: Shutting down behavior_server
[component_container_isolated-8] [INFO] [1650520050.369218366] [behavior_server]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.373155936] [lifecycle_manager_navigation]: Shutting down planner_server
[component_container_isolated-8] [INFO] [1650520050.373390795] [planner_server]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.373678874] [lifecycle_manager_navigation]: Shutting down smoother_server
[component_container_isolated-8] [INFO] [1650520050.376956198] [smoother_server]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.377258262] [lifecycle_manager_navigation]: Shutting down controller_server
[component_container_isolated-8] [INFO] [1650520050.377540960] [controller_server]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.378228946] [lifecycle_manager_navigation]: Destroying lifecycle service clients
[component_container_isolated-8] [INFO] [1650520050.398798582] [lifecycle_manager_navigation]: Managed nodes have been shut down
[tester_node-9] [INFO] [1650520050.399922155] [nav2_tester]: Shutting down navigation lifecycle manager complete.
[tester_node-9] [INFO] [1650520050.401136111] [nav2_tester]: Shutting down localization lifecycle manager...
[component_container_isolated-8] [INFO] [1650520050.415876398] [lifecycle_manager_localization]: Terminating bond timer...
[component_container_isolated-8] [INFO] [1650520050.415971376] [lifecycle_manager_localization]: Shutting down managed nodes...
[component_container_isolated-8] [INFO] [1650520050.416000856] [lifecycle_manager_localization]: Deactivate, cleanup, and shutdown nodes
[component_container_isolated-8] [INFO] [1650520050.416025084] [lifecycle_manager_localization]: Deactivating amcl
[component_container_isolated-8] [INFO] [1650520050.416307709] [amcl]: Deactivating
[component_container_isolated-8] [INFO] [1650520050.416355985] [amcl]: Destroying bond (amcl) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520050.560362434] [lifecycle_manager_localization]: Deactivating map_server
[component_container_isolated-8] [INFO] [1650520050.561127638] [map_server]: Deactivating
[component_container_isolated-8] [INFO] [1650520050.561248411] [map_server]: Destroying bond (map_server) to lifecycle manager.
[component_container_isolated-8] [INFO] [1650520050.757580273] [lifecycle_manager_localization]: Cleaning up amcl
[component_container_isolated-8] [INFO] [1650520050.758211139] [amcl]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.792389263] [lifecycle_manager_localization]: Cleaning up map_server
[component_container_isolated-8] [INFO] [1650520050.795005633] [map_server]: Cleaning up
[component_container_isolated-8] [INFO] [1650520050.801468622] [lifecycle_manager_localization]: Shutting down amcl
[component_container_isolated-8] [INFO] [1650520050.803662400] [amcl]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.812645973] [lifecycle_manager_localization]: Shutting down map_server
[component_container_isolated-8] [INFO] [1650520050.813418938] [map_server]: Shutting down
[component_container_isolated-8] [INFO] [1650520050.814581914] [lifecycle_manager_localization]: Destroying lifecycle service clients
[component_container_isolated-8] [INFO] [1650520050.838271976] [lifecycle_manager_localization]: Managed nodes have been shut down
[tester_node-9] [INFO] [1650520050.841093143] [nav2_tester]: Shutting down localization lifecycle manager complete
[tester_node-9] [INFO] [1650520050.842349161] [nav2_tester]: Done Shutting Down.
[tester_node-9] [INFO] [1650520050.843537529] [nav2_tester]: Exiting passed
[tester_node-9] /opt/ros/rolling/lib/python3.8/site-packages/rclpy/qos.py:307: UserWarning: DurabilityPolicy.RMW_QOS_POLICY_DURABILITY_VOLATILE is deprecated. Use DurabilityPolicy.VOLATILE instead.
[tester_node-9]   warnings.warn(
[tester_node-9] /opt/ros/rolling/lib/python3.8/site-packages/rclpy/qos.py:307: UserWarning: HistoryPolicy.RMW_QOS_POLICY_HISTORY_KEEP_LAST is deprecated. Use HistoryPolicy.KEEP_LAST instead.
[tester_node-9]   warnings.warn(
[INFO] [tester_node-9]: process has finished cleanly [pid 499]
[INFO] [component_container_isolated-8]: sending signal 'SIGINT' to process[component_container_isolated-8]
[INFO] [costmap_filter_info_server-7]: sending signal 'SIGINT' to process[costmap_filter_info_server-7]
[INFO] [map_server-6]: sending signal 'SIGINT' to process[map_server-6]
[INFO] [lifecycle_manager-5]: sending signal 'SIGINT' to process[lifecycle_manager-5]
[INFO] [static_transform_publisher-4]: sending signal 'SIGINT' to process[static_transform_publisher-4]
[INFO] [static_transform_publisher-3]: sending signal 'SIGINT' to process[static_transform_publisher-3]
[INFO] [lifecycle_manager-5]: process has finished cleanly [pid 452]
[INFO] [gzserver-1]: sending signal 'SIGINT' to process[gzserver-1]
[component_container_isolated-8] [INFO] [1650520051.333237086] [rclcpp]: signal_handler(signum=2)
[costmap_filter_info_server-7] [INFO] [1650520051.336261311] [rclcpp]: signal_handler(signum=2)
[component_container_isolated-8] [INFO] [1650520051.337165042] [lifecycle_manager_navigation]: Destroying lifecycle_manager_navigation
[map_server-6] [INFO] [1650520051.338289466] [rclcpp]: signal_handler(signum=2)
[lifecycle_manager-5] [INFO] [1650520051.341796116] [rclcpp]: signal_handler(signum=2)
[static_transform_publisher-4] [INFO] [1650520051.344755863] [rclcpp]: signal_handler(signum=2)
[map_server-6] [INFO] [1650520051.347228689] [filter_mask_server]: Destroying
[static_transform_publisher-3] [INFO] [1650520051.347963376] [rclcpp]: signal_handler(signum=2)
[costmap_filter_info_server-7] [INFO] [1650520051.357723510] [costmap_filter_info_server]: Destroying
[component_container_isolated-8] [INFO] [1650520051.377148164] [waypoint_follower]: Destroying
[component_container_isolated-8] [INFO] [1650520051.403726773] [bt_navigator]: Destroying
[component_container_isolated-8] [INFO] [1650520051.433630026] [behavior_server]: Destroying
[component_container_isolated-8] [INFO] [1650520051.458216129] [lifecycle_manager_localization]: Destroying lifecycle_manager_localization
[component_container_isolated-8] [INFO] [1650520051.488941574] [global_costmap.global_costmap]: Destroying
[component_container_isolated-8] [INFO] [1650520051.530412195] [planner_server]: Destroying
[component_container_isolated-8] [INFO] [1650520051.557384380] [amcl]: Destroying
[component_container_isolated-8] [INFO] [1650520051.602778903] [smoother_server]: Destroying
[component_container_isolated-8] [INFO] [1650520051.649151806] [map_server]: Destroying
[component_container_isolated-8] [INFO] [1650520051.700653002] [local_costmap.local_costmap]: Destroying
[component_container_isolated-8] [INFO] [1650520051.764292914] [controller_server]: Destroying
[INFO] [static_transform_publisher-4]: process has finished cleanly [pid 450]
[INFO] [static_transform_publisher-3]: process has finished cleanly [pid 447]
[INFO] [map_server-6]: process has finished cleanly [pid 472]
[INFO] [costmap_filter_info_server-7]: process has finished cleanly [pid 474]
[INFO] [component_container_isolated-8]: process has finished cleanly [pid 497]
[ERROR] [gzserver-1]: process[gzserver-1] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
[INFO] [gzserver-1]: sending signal 'SIGTERM' to process[gzserver-1]
[ERROR] [gzserver-1]: process has died [pid 443, exit code -15, cmd 'gzserver /opt/aws_ws/install/aws_robomaker_small_warehouse_world/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world                                                                      -s libgazebo_ros_init.so   -s libgazebo_ros_factory.so   -s libgazebo_ros_force_system.so       '].
-- run_test.py: return code 0
-- run_test.py: generate result file '/opt/overlay_ws/build/nav2_system_tests/test_results/nav2_system_tests/test_speed_filter.xml' with successful test
root@f793eb285ea4:/opt/overlay_ws# ps -afx | grep gz
    671 pts/1    S+     0:00  \_ grep --color=auto gz
    449 pts/1    SLl    1:48 gzserver /opt/aws_ws/install/aws_robomaker_small_warehouse_world/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world -s libgazebo_ros_init.so -s libgazebo_ros_factory.so -s libgazebo_ros_force_system.so

After re-running the test, I got pictures of an incorrect robot position similar to those presented above.

So gzserver is still alive after SIGINT and SIGTERM, which I think is the reason for all the problems.
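
The manual `ps -afx | grep gz` check above could be automated between test runs. The following is only a minimal sketch, not part of the PR: the helper names `find_lingering` and `lingering_now` and the sample text are hypothetical, and it assumes a Linux `ps` that accepts `-eo pid,args`. It matches on the program name, so the `/bin/sh -c gzserver ...` wrapper and the `grep` process itself are not reported as a live gzserver.

```python
import os
import subprocess
from typing import List


def find_lingering(ps_output: str, executable: str) -> List[int]:
    """Return PIDs from `ps -eo pid,args` style text whose program name is `executable`."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that does not start with a PID.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # Compare only the program name, so '/usr/bin/gzserver' also matches.
        if os.path.basename(fields[1]) == executable:
            pids.append(int(fields[0]))
    return pids


def lingering_now(executable: str) -> List[int]:
    """Run ps on the current host and report surviving processes (assumes Linux ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_lingering(result.stdout, executable)


# Hypothetical sample, shaped like the process listings in this thread.
SAMPLE = """\
    PID COMMAND
   2009 /bin/sh -c gzserver world.world -s libgazebo_ros_init.so
   2015 gzserver world.world -s libgazebo_ros_init.so
    671 grep --color=auto gz
"""

print(find_lingering(SAMPLE, "gzserver"))
```

Calling `lingering_now("gzserver")` after a test case has finished would return a non-empty list in the situation described here, which makes the leftover server visible before the next test case starts.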

AlexeyMerzlyakov avatar Apr 21 '22 06:04 AlexeyMerzlyakov

The problem appears because two gzserver processes are launched:

root@f793eb285ea4:/opt/overlay_ws# ps -afx | grep gz
   2009 pts/1    S+     0:00              \_ /bin/sh -c gzserver /opt/aws_ws/install/aws_robomaker_small_warehouse_world/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world                                                                      -s libgazebo_ros_init.so   -s libgazebo_ros_factory.so   -s libgazebo_ros_force_system.so       
   2015 pts/1    SLl+   0:12              |   \_ gzserver /opt/aws_ws/install/aws_robomaker_small_warehouse_world/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world -s libgazebo_ros_init.so -s libgazebo_ros_factory.so -s libgazebo_ros_force_system.so

The first (shell) process is killed by the Python test launch scripts, while the child process stays alive after test execution.

The problem disappears if we roll back from the external gzserver.launch.py launch script to direct execution of the process, as is currently done in mainline:

             description='name of the robot'),
 
         # Launch gazebo server for simulation
-        IncludeLaunchDescription(
-            PythonLaunchDescriptionSource(
-                os.path.join(gazebo_ros, 'launch', 'gzserver.launch.py')),
-            launch_arguments={'world': world}.items()
-        ),
+        ExecuteProcess(
+            cmd=['gzserver', '-s', 'libgazebo_ros_init.so', '-s', 'libgazebo_ros_factory.so',
+                 '--minimal_comms', world],
+            cwd=[aws_dir], output='screen'),
 
         Node(
             package='gazebo_ros',

Please note that the AWS small warehouse requires adding one new library, libgazebo_ros_factory.so, without which, as I remember, Gazebo won't work correctly. I also found that without the correct cwd, Gazebo generates an empty world when the test is run manually. It is strange that these tests ran previously without it.

One more note in favor of executing the gzserver and gzclient processes directly instead of using the external gzserver.launch.py/gzclient.launch.py scripts: I ran into the following problem on my local PC when using these scripts (for both gzserver and gzclient):

[gzclient-2] gzclient: /usr/include/boost/smart_ptr/shared_ptr.hpp:734: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = gazebo::rendering::Camera; typename boost::detail::sp_member_access<T>::type = gazebo::rendering::Camera*]: Assertion `px != 0' failed.
[gzclient-2] Aborted (core dumped)
[ERROR] [gzclient-2]: process has died [pid 261449, exit code 134, cmd 'gzclient'].

This problem won't appear on Docker CI, but it may still be a small argument in favor of not relying on external launch scripts.

AlexeyMerzlyakov avatar Apr 21 '22 07:04 AlexeyMerzlyakov

The problem appears because two gzserver processes are launched:

root@f793eb285ea4:/opt/overlay_ws# ps -afx | grep gz
   2009 pts/1    S+     0:00              \_ /bin/sh -c gzserver /opt/aws_ws/install/aws_robomaker_small_warehouse_world/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world                                                                      -s libgazebo_ros_init.so   -s libgazebo_ros_factory.so   -s libgazebo_ros_force_system.so       
   2015 pts/1    SLl+   0:12              |   \_ gzserver /opt/aws_ws/install/aws_robomaker_small_warehouse_world/share/aws_robomaker_small_warehouse_world/worlds/no_roof_small_warehouse/no_roof_small_warehouse.world -s libgazebo_ros_init.so -s libgazebo_ros_factory.so -s libgazebo_ros_force_system.so

The first (shell) process is killed by the Python test launch scripts, while the child process stays alive after test execution.

The problem disappears if we roll back from the external gzserver.launch.py launch script to direct execution of the process, as is currently done in mainline:

             description='name of the robot'),
 
         # Launch gazebo server for simulation
-        IncludeLaunchDescription(
-            PythonLaunchDescriptionSource(
-                os.path.join(gazebo_ros, 'launch', 'gzserver.launch.py')),
-            launch_arguments={'world': world}.items()
-        ),
+        ExecuteProcess(
+            cmd=['gzserver', '-s', 'libgazebo_ros_init.so', '-s', 'libgazebo_ros_factory.so',
+                 '--minimal_comms', world],
+            cwd=[aws_dir], output='screen'),
 
         Node(
             package='gazebo_ros',

Please note that the AWS small warehouse requires adding one new library, libgazebo_ros_factory.so, without which, as I remember, Gazebo won't work correctly. I also found that without the correct cwd, Gazebo generates an empty world when the test is run manually. It is strange that these tests ran previously without it.

Thank you @AlexeyMerzlyakov! I switched to external launch scripts since there was a problem (PR) with the AWS small warehouse world.

One more note in favor of executing the gzserver and gzclient processes directly instead of using the external gzserver.launch.py/gzclient.launch.py scripts: I ran into the following problem on my local PC when using these scripts (for both gzserver and gzclient):

[gzclient-2] gzclient: /usr/include/boost/smart_ptr/shared_ptr.hpp:734: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = gazebo::rendering::Camera; typename boost::detail::sp_member_access<T>::type = gazebo::rendering::Camera*]: Assertion `px != 0' failed.
[gzclient-2] Aborted (core dumped)
[ERROR] [gzclient-2]: process has died [pid 261449, exit code 134, cmd 'gzclient'].

This problem won't appear on Docker CI, but it is still a small argument in favor of not relying on external launch scripts.

Yes, in order to have gzclient successfully run, you have to source /usr/share/gazebo/setup.bash (as said in the above linked PR).

I'll try executing the gzserver and gzclient processes directly. Honestly, I haven't tried again since the world binaries were released.

lucabonamini avatar Apr 21 '22 08:04 lucabonamini

And two more notes, not related to Gazebo:

  1. When running the keepout system test, the robot passes close to the keepout zone near the end of its trajectory (at the "gates" between two boxes) and may occasionally enter it, e.g. when localization accuracy degrades for some reason. This test is designed to check that keepout zones work (the robot does not pass through them), not to check localization accuracy. So I suggest moving the keepout zone a little to the right, as shown in the picture below: ![Screenshot_2022-04-21_11-11-52_r2](https://user-images.githubusercontent.com/60094858/164416357-9408cf6f-e275-46b6-933a-2aac4eb86f1e.png)

  2. In nav2_system_tests/src/planning/planner_tester.cpp file the following change is OS-dependent:

@@ -162,14 +164,13 @@ void PlannerTester::loadDefaultMap()
 
   nav2_map_server::MapMode mode = nav2_map_server::MapMode::Trinary;
 
-  std::string file_path = "";
-  char const * path = getenv("TEST_MAP");
-  if (path == NULL) {
+  std::string aws_dir = ament_index_cpp::get_package_share_directory(
+    "aws_robomaker_small_warehouse_world");
+  std::string file_path = aws_dir + "/maps/005/map_rotated.png";
+  if (file_path.empty()) {
     throw std::runtime_error(
             "Path to map image file"

Forward slashes "/" won't work on Windows hosts, so it is better to use std::experimental::filesystem::path or boost::filesystem to join directories and file names correctly.

AlexeyMerzlyakov avatar Apr 21 '22 08:04 AlexeyMerzlyakov

Thank you, I'll fix the keepout zone asap!

My fault, I didn't think about Windows users. I'll also fix this asap!

lucabonamini avatar Apr 21 '22 09:04 lucabonamini

I switched to external launch scripts since there was a problem (https://github.com/aws-robotics/aws-robomaker-small-warehouse-world/pull/20) with the AWS small warehouse world.

That is interesting information. It means we either need to call find_package(aws_robomaker_small_warehouse_world) in the nav2_system_tests CMakeLists.txt and add it to GAZEBO_MODEL_PATH there, or still deal with the two processes spawned by the gzserver.launch.py launch script.

BTW, if aws_robomaker_small_warehouse_world is found through the CMake dependencies, I guess there is no need to look it up again in nav2_system_tests/src/planning/planner_tester.cpp. That file could be rolled back to using the TEST_MAP variable generated by CMakeLists.txt, which will know about the AWS dependency.
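
For illustration only, extending GAZEBO_MODEL_PATH could look roughly like the sketch below. This is not the CMake change proposed above but a Python-side version of the same idea, e.g. for preparing the environment of a directly executed gzserver. The helper name `with_gazebo_model_path` and the models directory are assumptions; the real location of the warehouse models would have to be taken from the installed package.

```python
import os
from typing import Dict, Mapping


def with_gazebo_model_path(env: Mapping[str, str], models_dir: str) -> Dict[str, str]:
    """Return a copy of env with models_dir prepended to GAZEBO_MODEL_PATH."""
    new_env = dict(env)  # copy, so the caller's environment is left untouched
    existing = new_env.get("GAZEBO_MODEL_PATH", "")
    # Avoid a dangling separator when the variable was unset or empty.
    new_env["GAZEBO_MODEL_PATH"] = (
        models_dir + os.pathsep + existing if existing else models_dir
    )
    return new_env


if __name__ == "__main__":
    # "/path/to/warehouse/models" is a placeholder, not the package's real layout.
    env = with_gazebo_model_path(os.environ, "/path/to/warehouse/models")
    print(env["GAZEBO_MODEL_PATH"])
```

Prepending rather than appending lets the warehouse models take precedence over any same-named models already on the path; whether that ordering matters here is an open question.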

Yes, in order to have gzclient successfully run, you have to source /usr/share/gazebo/setup.bash (as said in the above linked PR).

It works like a charm for me. Thank you for noting this!

AlexeyMerzlyakov avatar Apr 21 '22 09:04 AlexeyMerzlyakov

Thank you very much @AlexeyMerzlyakov for jumping in and helping solve this, great to see so much progress while I've been gone! :1st_place_medal: It sounds like you guys have this debugging under control!

SteveMacenski avatar Apr 26 '22 23:04 SteveMacenski

@AlexeyMerzlyakov Done!

lucabonamini avatar Apr 27 '22 20:04 lucabonamini