fetch_gazebo icon indicating copy to clipboard operation
fetch_gazebo copied to clipboard

Run demo.launch on melodic

Open 708yamaguchi opened this issue 4 years ago • 7 comments

In this pull request, I made 3 changes to run demo.launch on melodic. (I followed this tutorial)

  1. Correct python runtime error.
  2. Change inflation parameters for fetch navigation.
  3. Fix fetch's amcl start pose in demo

Detailed descriptions are below. I am sorry for the long comment.

1. Correct python runtime error

Before moveit_python version 0.2.17, addSolidPrimitive function has wait key argument. However, After version 0.3.0, wait key argument is replaced by use_service key argument. This change causes runtime error on demo.py. 0.2.17 AddSolidPrimitive: https://github.com/mikeferguson/moveit_python/blob/0.2.17/src/moveit_python/planning_scene_interface.py#L207 0.3.0 AddSolidPrimitive: https://github.com/mikeferguson/moveit_python/blob/0.3.0/src/moveit_python/planning_scene_interface.py#L219

Now, 0.3.x is released on only kinetic or higher (not on indigo). I think wait and use_service is a similar argument, so I change how to pass argument (wait = False -> False) so that we can use the same argument on indigo, kinetic and melodic. If you have any other good ideas, please teach me.

2. Change inflation parameters for fetch navigation

By default inflation of move_base, fetch cannot come close to table and go through narrow door in demo.py. By reducing the value of inflation, these problems are solved. (param name: /move_base/global(local)_costmap/inflater/inflation_radius)

To compare the effect of parameter change, I captured screenshot of Rviz: Problem in coming close to table: (Bad) Fig.1 (Good) Fig.2 Problem in going through narrow door: (Bad) Fig.3 (Good) Fig.4

I think inflation 0.3 is very small because fetch's inflation is also 0.3. Do you have any ideas to solve these problem?

Fig. 1 fetch cannot come close to the table with inflation 0.7 (default). In this case, fetch fails to grasp the blue box because IK is failed. inflation_0_7_table

Fig. 2 fetch can come close to the table with inflation 0.3 inflation_0_3_table

Fig. 3 fetch cannot go through the narrow door even with inflation 0.4 (default is 0.7) inflation_0_4_door

Fig. 4 fetch can go through the narrow door with inflation 0.3 inflation_0_3_door

3. Fix fetch's amcl start pose in demo

When I force-kill playground.launch and demo.launch, rosmaster is sometimes not killed and previous fetch's amcl pose is saved in the parameter server. In such cases, when I launch playground.launch and demo.launch again, fetch's initial amcl pose is the same as the previous pose. If fetch's initial amcl pose differs greatly from actual fetch pose, navigation is very likely to fail. Therefore, I fixed fetch's amcl start pose. (param name: /amcl/initial_pose_x, /amcl/initial_pose_y and /amcl/initial_pose_a)

I think this change is not needed if you assume rosmaster is always killed normally after using playground.launch and demo.launch.

708yamaguchi avatar Apr 01 '20 14:04 708yamaguchi

Hi @708yamaguchi , Thanks so much for the PR and excellent write-up!

I agree with your analysis in fix item 1. I requested a style change, but otherwise, nice and easy fix.

For item 2, I'm pinging a few others to look. I'm not sure what's up here, and I am probably not the one to figure it out. A question for you: Have you had a chance to compare what exactly is behaving differently between Indigo and Melodic?

I like fix item 3, as well. Thank you!

erelson avatar Apr 14 '20 00:04 erelson

Hi @erelson, thank you very much for your review!

I agree with your analysis in fix item 1. I requested a style change, but otherwise, nice and easy fix.

OK. I understand your idea. I pushed an additional commit. By the way, is use_service = True right? I think use_service should be False if we want to keep function of previous wait = False.

For item 2, I'm pinging a few others to look. I'm not sure what's up here, and I am probably not the one to figure it out. A question for you: Have you had a chance to compare what exactly is behaving differently between Indigo and Melodic?

I am also not sure about the situation, but I possibly found out the key commits in fetchrobotics/fetch_ros and ros-planning/navigation to cause this situation. If this is my misunderstanding, I am sorry.

When I use 5a6ad24cf1bbc8ac08c06ef258fdf0bbdb3f9789 commit of ros-planning/navigation and dd51b5f211251156262f39e48860558df5a99290 commit of fetchrobotics/fetch_ros without this pull request, fetch in gazebo is likely not to stuck in front of the table.

On the other hand, when I proceed 1 commit in each package, fetch in gazebo is likely to stuck in front of the table.

Therefore, I think the key commits are below.

  • 5fe08786d9058ea4dfe5b3292a6622efbb48a153 commit of ros-planning/navigation https://github.com/ros-planning/navigation/commit/5fe08786d9058ea4dfe5b3292a6622efbb48a153
  • 09db2ce01300b253b7fc7fd01ce258e20f2b9b41 commit of fetchrobotics/fetch_ros https://github.com/fetchrobotics/fetch_ros/commit/09db2ce01300b253b7fc7fd01ce258e20f2b9b41

However, unfortunately, I cannot understand the diff of these commits, because they are very big commits ...

Here is sample code to reproduce my situation. Please copy unstack.rosinstall or stack.rosinstall to ~/catkin_ws/src/.rosinstall

# Setup
sudo apt update
sudo apt install ros-melodic-fetch-gazebo-demo
mkdir -p ~/catkin_ws/src
cd catkin_ws/src
### Here, copy unstack.rosinstall or stuck.rosinstall (below) to ~/catkin_ws/src/.rosinstall ###
wstool update
rosdep install --from-paths . --ignore-src -y -r
cd ..
source /opt/ros/melodic/setup.bash
catkin build

# In terminal 1
source /opt/ros/melodic/setup.bash
source ~/catkin_ws/devel/setup.bash
roslaunch fetch_gazebo playground.launch

# In terminal 2
source /opt/ros/melodic/setup.bash
source ~/catkin_ws/devel/setup.bash
roslaunch fetch_gazebo_demo demo.launch

unstuck.rosinstall

- git:
    local-name: navigation
    uri: https://github.com/ros-planning/navigation.git
    version: 5a6ad24cf1bbc8ac08c06ef258fdf0bbdb3f9789 # 1 commit before 5fe08786d9058ea4dfe5b3292a6622efbb48a153
- git:
    local-name: fetch_ros
    uri: https://github.com/fetchrobotics/fetch_ros.git
    version: dd51b5f211251156262f39e48860558df5a99290 # 1 commit before 09db2ce01300b253b7fc7fd01ce258e20f2b9b41

stuck.rosinstall

- git:
    local-name: navigation
    uri: https://github.com/ros-planning/navigation.git
    version: 5fe08786d9058ea4dfe5b3292a6622efbb48a153 # 1 commit after 5a6ad24cf1bbc8ac08c06ef258fdf0bbdb3f9789
- git:
    local-name: fetch_ros
    uri: https://github.com/fetchrobotics/fetch_ros.git
    version: 09db2ce01300b253b7fc7fd01ce258e20f2b9b41 # 1 commit after dd51b5f211251156262f39e48860558df5a99290

I like fix item 3, as well. Thank you!

Thank you so much!

708yamaguchi avatar Apr 14 '20 17:04 708yamaguchi

By the way, is use_service = True right? I think use_service should be False if we want to keep function of previous wait = False.

These aren't exactly one-to-one. The service is just that, a ROS service call - we are sure that our change took effect. In the old system, there was no service available, and so we would have to send a change, and then monitor the feedback. To speed things up, you could send several requests with "wait=False", and then send only the last one with wait=True, and our internal bookkeeping would make sure all the messages got through. I'd recommend that you do use_service=True to make sure your updates all get through (also, the service call is still far faster than the old "wait for feedback to catch up" routine).

However, unfortunately, I cannot understand the diff of these commits, because they are very big commits ...

So, have you always been building from source? If so, I'd recommend setting CMAKE_BUILD_TYPE=Release. The change to TF2, and associated use of tf2_sensor_msgs::PointCloudIterator is very sensitive to compilation being Release (it's about 300x faster in Release mode than Debug). I've found several times that issues with timing go away when switching to Release build.

mikeferguson avatar Apr 14 '20 18:04 mikeferguson

@mikeferguson

These aren't exactly one-to-one. The service is just that, a ROS service call - we are sure that our change took effect. In the old system, there was no service available, and so we would have to send a change, and then monitor the feedback. To speed things up, you could send several requests with "wait=False", and then send only the last one with wait=True, and our internal bookkeeping would make sure all the messages got through. I'd recommend that you do use_service=True to make sure your updates all get through (also, the service call is still far faster than the old "wait for feedback to catch up" routine).

Thank you very much for your information. I understand the argument meaning and your suggestion. I use use_service=True for secure and fast implementation of requests.

So, have you always been building from source? If so, I'd recommend setting CMAKE_BUILD_TYPE=Release. The change to TF2, and associated use of tf2_sensor_msgs::PointCloudIterator is very sensitive to compilation being Release (it's about 300x faster in Release mode than Debug). I've found several times that issues with timing go away when switching to Release build.

Also, thank you very much for your advice. I realized CMAKE_BUILD_TYPE=Release flag for the first time. I will use this flag from now on.

Thank you again!

708yamaguchi avatar Apr 14 '20 19:04 708yamaguchi

@erelson & @mikeferguson I took a very brief look into DEBUG v RELEASE and didn’t see the robot get through the door.

This costmap change only effects sim, so if it makes it get through the door it’s an improvement but still the problem is likely in the fetch_depth_layer tf->tf2 conversion

moriarty avatar Apr 26 '20 00:04 moriarty

This costmap change only effects sim, so if it makes it get through the door it’s an improvement but still the problem is likely in the fetch_depth_layer tf->tf2 conversion

So, as a test, if you simply remove the fetch depth layer from the navigation config, do things work fine? #36 is showing that localization in simulation is crap - in which case both the laser and depth data should be incorrect in the costmap. If only one is incorrect, then yes the depth layer needs to be investigated

mikeferguson avatar Apr 28 '20 14:04 mikeferguson

So I tried out my hypothesis with disabling the fetch_depth_layer - doesn't seem to fix things. Applying the much smaller inflation also still lets the robot get caught in the door sometimes. I see two problems:

  1. Localization isn't good. Bumping odom_alpha1 (rotation per rotation) to 0.5 helps a lot - but there is still some sort of "just as I stop turning, localization shifts" going on.
  2. Even if localization is good, and the inflation radius is shrunk as per this PR, I'm seeing the robot get right up to the door, have a clear path and then decide to rotate in place instead (which borks the localization) on a number of runs. Which leads me to believe the local planner/controller is having issues. I've not found what set of parameters are wrong/changed yet.

I did find that our path/goal bias parameters aren't being taken into account - fixed that here: https://github.com/fetchrobotics/fetch_ros/pull/140 which makes the non-door nav look a little better.

mikeferguson avatar Apr 28 '20 15:04 mikeferguson