autoware.universe
Vehicle collides when localization is bad.
Checklist
- [X] I've read the contribution guidelines.
- [X] I've searched other issues and no duplicate issues were found.
- [X] I'm convinced that this is not my fault but a bug.
Description
Hi team,
I'm currently running Autoware with CARLA, and I found a situation where the ego vehicle collides with a guardrail when localization is bad. Here's a video of RViz: [rviz] and of the front view: [frontview]
In the front-view video, the vehicle leaves its lane at the last corner (3:15~) and collides with a guardrail after exiting the corner. The RViz video shows the last few seconds of the drive. At the start of the video, the pointcloud and the lanelet map have some discrepancy due to a localization issue, and the ego vehicle keeps driving and eventually collides.
Here's a log file, but it does not contain any meaningful messages, except for some warnings about localization: [launch.log]
Expected behavior
I think the vehicle should stop before the collision; it could stop by using the pointcloud data even though localization is bad.
Actual behavior
The vehicle does not stop and collides, even though the pointcloud data says there is an obstacle.
Steps to reproduce
Here's a ros2bag file to reproduce: [ros2bag]
Versions
- Autoware: 20240115, prebuilt docker
- Carla: 0.9.13
- OS (host): Ubuntu 20.04
- OS (docker): Ubuntu 22.04
Possible causes
According to the RViz video, there is no virtual wall indicating an obstacle ahead.
So, in my opinion, the issue has to do with obstacle_stop_planner.
I also echoed the two topics relevant to the obstacle stop planner, but they do not publish any information regarding the obstacle.
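For reference, the echo commands were along these lines (standard ros2 CLI):
ros2 topic echo /planning/scenario_planning/lane_driving/motion_planning/obstacle_stop_planner/output/stop_reasons
ros2 topic echo /planning/scenario_planning/lane_driving/motion_planning/obstacle_stop_planner/virtual_wall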
# /planning/scenario_planning/lane_driving/motion_planning/obstacle_stop_planner/output/stop_reasons
level: "\0"
name: ''
message: ''
hardware_id: ''
values: []
# /planning/scenario_planning/lane_driving/motion_planning/obstacle_stop_planner/virtual_wall
markers:
- header:
    stamp:
      sec: 311
      nanosec: 899856438
    frame_id: ''
  ns: stop_virtual_wall
  id: 0
  type: 0
  action: 2
  pose:
    position:
      x: 0.0
      y: 0.0
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
  scale:
    x: 0.0
    y: 0.0
    z: 0.0
  color:
    r: 0.0
    g: 0.0
    b: 0.0
    a: 0.0
  lifetime:
    sec: 0
    nanosec: 0
  frame_locked: false
  points: []
  colors: []
  texture_resource: ''
  texture:
    header:
      stamp:
        sec: 0
        nanosec: 0
      frame_id: ''
    format: ''
    data: []
  uv_coordinates: []
  text: ''
  mesh_resource: ''
  mesh_file:
    filename: ''
    data: []
  mesh_use_embedded_materials: false
- header:
    stamp:
      sec: 311
      nanosec: 899856438
    frame_id: ''
  ns: stop_factor_text
  id: 0
  type: 0
  action: 2
  pose:
    position:
      x: 0.0
      y: 0.0
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
  scale:
    x: 0.0
    y: 0.0
    z: 0.0
  color:
    r: 0.0
    g: 0.0
    b: 0.0
    a: 0.0
  lifetime:
    sec: 0
    nanosec: 0
  frame_locked: false
  points: []
  colors: []
  texture_resource: ''
  texture:
    header:
      stamp:
        sec: 0
        nanosec: 0
      frame_id: ''
    format: ''
    data: []
  uv_coordinates: []
  text: ''
  mesh_resource: ''
  mesh_file:
    filename: ''
    data: []
  mesh_use_embedded_materials: false
- header:
    stamp:
      sec: 311
      nanosec: 899856438
    frame_id: ''
  ns: slow_down_virtual_wall
  id: 0
  type: 0
  action: 2
  pose:
    position:
      x: 0.0
      y: 0.0
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
  scale:
    x: 0.0
    y: 0.0
    z: 0.0
  color:
    r: 0.0
    g: 0.0
    b: 0.0
    a: 0.0
  lifetime:
    sec: 0
    nanosec: 0
  frame_locked: false
  points: []
  colors: []
  texture_resource: ''
  texture:
    header:
      stamp:
        sec: 0
        nanosec: 0
      frame_id: ''
    format: ''
    data: []
  uv_coordinates: []
  text: ''
  mesh_resource: ''
  mesh_file:
    filename: ''
    data: []
  mesh_use_embedded_materials: false
- header:
    stamp:
      sec: 311
      nanosec: 899856438
    frame_id: ''
  ns: slow_down_factor_text
  id: 0
  type: 0
  action: 2
  pose:
    position:
      x: 0.0
      y: 0.0
      z: 0.0
    orientation:
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0
  scale:
    x: 0.0
    y: 0.0
    z: 0.0
  color:
    r: 0.0
    g: 0.0
    b: 0.0
    a: 0.0
  lifetime:
    sec: 0
    nanosec: 0
  frame_locked: false
  points: []
  colors: []
  texture_resource: ''
  texture:
    header:
      stamp:
        sec: 0
        nanosec: 0
      frame_id: ''
    format: ''
    data: []
  uv_coordinates: []
  text: ''
  mesh_resource: ''
  mesh_file:
    filename: ''
    data: []
  mesh_use_embedded_materials: false
Additional context
No response
The default Autoware behavior does not stop for obstacle points, but the obstacle_stop_planner can be used to stop in front of obstacle points:
https://autowarefoundation.github.io/autoware.universe/main/planning/obstacle_stop_planner/
Thank you for the response @maxime-clem!
I don't think I got your point. Could you please explain it again?
Do you mean I should change some parameters to make Autoware stop in this situation? I've checked that default Autoware can stop behind an obstacle in front of it, and I think the situation in this issue is no different from stopping behind an obstacle in front.
Thanks!
Sorry for not being clear. Actually, my previous comment was wrong. The default behavior of Autoware is to stop if there are points from /perception/obstacle_segmentation/pointcloud on the path.
In your case, even if localization is wrong, the points from the guardrail should trigger a stop. Make sure the pointcloud is correctly published and is in the base_link frame.
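If it helps, a quick way to check both from the command line is something like this (assuming a reasonably recent ros2 CLI; --once and --no-arr just keep the output short):
# check that the obstacle pointcloud is actually being published
ros2 topic hz /perception/obstacle_segmentation/pointcloud
# print one message without the large data array and look at header.frame_id
ros2 topic echo /perception/obstacle_segmentation/pointcloud --no-arr --once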
Thank you for the details @maxime-clem!
I echoed /perception/obstacle_segmentation/pointcloud to check whether the pointcloud data is published properly, and I continuously get data like the output below (not human-readable, though), so it seems the pointcloud data is published fine.
pointcloud data
header:
  stamp:
    sec: 88
    nanosec: 49853103
  frame_id: base_link
height: 1
width: 5579
fields:
- name: x
  offset: 0
  datatype: 7
  count: 1
- name: y
  offset: 4
  datatype: 7
  count: 1
- name: z
  offset: 8
  datatype: 7
  count: 1
is_bigendian: false
point_step: 16
row_step: 89264
data:
- 160
- 87
- 185
- 63
- 160
- 118
- 227
- 64
- 79
- 117
- 79
- 63
- 0
- 0
- 128
- 63
- 160
- 130
- 185
- 63
- 0
- 171
- 227
- 64
- 68
- 90
- 57
- 63
- 0
- 0
- 128
- 63
- 160
- 225
- 191
- 63
- 160
- 173
- 227
- 64
- 48
- 182
- 56
- 63
- 0
- 0
- 128
- 63
- 224
- 163
- 185
- 63
- 64
- 7
- 144
- 65
- 210
- 178
- 27
- 63
- 0
- 0
- 128
- 63
- 192
- 13
- 204
- 63
- 88
- 20
- 146
- 65
- 112
- 167
- 20
- 63
- 0
- 0
- 128
- 63
- 64
- 44
- 205
- 63
- 248
- 225
- 146
- 65
- 143
- 199
- 70
- 63
- 0
- 0
- 128
- 63
- 64
- 179
- 209
- 63
- 120
- 93
- 195
- 65
- 214
- 125
- 25
- 64
- 0
- 0
- 128
- 63
- 96
- 145
- 231
- 63
- 24
- 16
- 196
- 65
- 86
- 121
- 25
- 64
- 0
- 0
- 128
- 63
- '...'
is_dense: true
And I tried to check whether the pointcloud data is in the base_link frame, but I have no idea how. Could you please tell me?
By the way, I found a somewhat weird situation where the pointcloud data is not perceived well in some areas.
Here's the video: before the collision, some points of the guardrail near the vehicle disappear and reappear after the vehicle moves away (so the guardrail looks disconnected, as in the image below).
But since the disappearing points are behind the vehicle, this could not be the root cause. Do you have any idea about this?
In the message you shared, we can see the frame_id in the header of the message.
header:
  stamp:
    sec: 88
    nanosec: 49853103
  frame_id: base_link
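If you only want that one field, something along these lines should also work (assuming your ros2 CLI supports --field):
ros2 topic echo /perception/obstacle_segmentation/pointcloud --field header.frame_id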
some points of the guardrail near the vehicle disappear and reappear after the vehicle moves away
This can be caused by a dead zone in the lidar, or by the pointcloud cropbox filter. But I do not think this is causing any issue here.
I cannot understand why, in your case, the obstacle_stop_planner does not trigger a stop. If you can share a bag of your situation, I will try to analyze the issue.
Thank you for the kind explanation and opinion @maxime-clem! Now I understand.
As for the ros2bag file, I already uploaded it for easier reproduction. You can find it here.
Thanks!
Thank you for sharing your bag.
I found the source of the issue. By default, the obstacle_stop_planner does some filtering on the obstacle pointcloud using the Z axis. However, in your case, the z value is somehow stuck at ~0.0 while the obstacle points are elevated, and they are thus ignored.
Here are two videos taken from the bag you shared showing the correct and incorrect update.
https://github.com/autowarefoundation/autoware.universe/assets/78338830/2a0b7f86-2034-4232-8211-5f8dd8c77750
https://github.com/autowarefoundation/autoware.universe/assets/78338830/1da3aa0f-d62b-4e98-862e-a5c9dfe80f76
As a workaround, you can set the parameter enable_z_axis_obstacle_filtering to False.
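For example, something along these lines should work while the stack is running (the node path here is my assumption based on the topic names above; if the parameter is not dynamically reconfigurable, setting it in the obstacle_stop_planner parameter file and relaunching is the safer option):
ros2 param set /planning/scenario_planning/lane_driving/motion_planning/obstacle_stop_planner enable_z_axis_obstacle_filtering false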
I will ask someone else to investigate this issue since it is related not to planning but to localization.
Thank you for the detailed investigation @maxime-clem!!
I think I got your point.
The current obstacle_stop_planner ignores obstacles that are irrelevant to the ego (e.g., whose z value differs a lot). In this case, the ego's z value is fixed at ~0.0 (I don't know why), so the guardrail is ignored because there is a large difference between the ego's estimated z value and that of the guardrail, and they eventually collide because the ego's estimated z value is wrong.
So I agree that this issue comes from localization.
I always appreciate your help! Thank you!
Hi! I haven't looked at the data yet, but I noticed that the launch.log provided by the questioner contains many warnings indicating that NDT's NVTL (Nearest Voxel Transformation Likelihood) score is falling below the threshold.
1708804488.0783958 [ndt_scan_matcher-33] [WARN] [1708804488.075854158] [localization.pose_estimator.ndt_scan_matcher]: Nearest Voxel Transformation Likelihood is below the threshold. Score: 1.288186, Threshold: 2.300000
When the NVTL is low, NDT doesn't output a position, so the height won't be updated because only odometry is used to estimate the position. I will check the rosbag to see why the NVTL is decreasing.
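If you want a quick count of how often this happens in a run, a simple grep over the log works:
grep -c "Nearest Voxel Transformation Likelihood is below the threshold" launch.log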
Hi @KYabuuchi!!
Regarding why the localization gets worse, I want to share my opinion (it could be wrong, but I hope it is helpful).
The map the ego vehicle is driving on is Town04, which is very large. It is something like a highway, so there are many long monotonic roads (i.e., roads with no distinctive surroundings). From my observation, when the ego vehicle drives along a long monotonic road with no distinctive surroundings, the localization gets worse. (The ego vehicle thinks it is driving slightly behind its real position.)
In this issue, the ego vehicle also drove along a long monotonic road before entering the corner, so I think that is what degraded the localization.
@Kim-mins I agree with your analysis. Autoware's localization often struggles on monotonic roads. However, performance can often be improved with adjustments to some parameters.
One thing I noticed is that your rosbag had a /clock topic at 10Hz. The EKF ideally calls its timer callback at 50Hz, so a /clock frequency of 100Hz or higher is preferable. If possible, try increasing the frequency of /clock and test again. :pray:
The log message [localization.pose_estimator.ndt_scan_matcher]: pose_buffer_.size() < 2 indicates that the EKF's timer callback is being called at a low frequency.
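You can measure the actual rates directly, for example:
ros2 topic hz /clock
# for comparison, the EKF output rate (the topic name below is my assumption for a recent autoware.universe; adjust it if yours differs)
ros2 topic hz /localization/pose_twist_fusion_filter/kinematic_state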
I tried validating your rosbag in my environment and, I don't know why, but localization was consistently successful.
I'll share the video with you.
https://github.com/autowarefoundation/autoware.universe/assets/24854875/4fed00e5-84df-43a7-bfb1-9a9732897e70
Thank you for the investigation @KYabuuchi!
For the clock frequency, I'm limiting it to 20Hz due to the computational cost, since I'm running a co-simulation now. I'll check whether I can apply a higher frequency.
I also tried my bag file, but I observed that the ego vehicle drives under the road in RViz. Could you please check whether the vehicle is under the road during the last corner?
https://github.com/autowarefoundation/autoware.universe/assets/50267797/0a694926-5d2f-4cab-ab87-c9fd10839118
Hmm, in my environment, the estimated position stays in contact with the ground.
Also, the score and iteration_number of the ndt_scan_matcher were stable before and after the curve.
https://github.com/autowarefoundation/autoware.universe/assets/24854875/dc708fc8-940b-4686-a753-cb1c40e99abb
Here is my command for rosbag playback.
ros2 bag play . --topics \
  /clock \
  /sensing/gnss/ublox/nav_sat_fix \
  /sensing/imu/tamagawa/imu_raw \
  /sensing/lidar/concatenated/pointcloud \
  /vehicle/status/gear_status \
  /vehicle/status/steering_status \
  /vehicle/status/velocity_status
When the clock frequency is low, the prediction frequency of the EKF becomes coarse, requiring the NDT scan matcher to start matching from a poor initial value. This can be challenging on monotonic roads. Additionally, as the frequency of prediction updates in the EKF decreases, the variance may become smaller than expected. In such cases, the observations by NDT may not be adequately reflected, leading to inaccurate estimation results.
Publishing the clock at a higher frequency, even if it is not reflected in the CARLA simulation, might improve the situation. (But I am not familiar with CARLA, so I do not know whether that is possible.)
Thank you for checking again @KYabuuchi!
I also tried your command for playing the ros2bag file, but the vehicle in RViz does not move at all. Could you please check the command again for me? For the video in my last comment, I used the command below:
ros2 bag play .
which just plays messages from every topic.
For the clock frequency, I have one question. I'm running a co-simulation with CARLA, and I run it synchronously; that is, to my knowledge, the messages between CARLA and Autoware are exchanged one by one. So I expected that, since the messages are exchanged in a synchronized manner, even if the frequency is low, the issues caused by a low frequency would not happen at all. Could you please share your opinion on this?
Thanks!
That is because the localization input topic was changed from /sensing/lidar/top/pointcloud to /sensing/lidar/concatenated/pointcloud about a month ago: https://github.com/autowarefoundation/autoware.universe/pull/6528
You are using the January version, so the required topic may be different.
ros2 bag play . --topics \
  /clock \
  /sensing/gnss/ublox/nav_sat_fix \
  /sensing/imu/tamagawa/imu_raw \
  /sensing/lidar/top/pointcloud \
  /vehicle/status/gear_status \
  /vehicle/status/steering_status \
  /vehicle/status/velocity_status
Thank you @KYabuuchi!
Actually, I don't know much about the topics used for localization, and I could not play the bag file with only your specified topics. So, could you please play the ros2bag file with all the topics?
When I play it again with the command ros2 bag play ., I can observe that the vehicle goes under the road: [rviz]
@Kim-mins if you replay all topics from the bag, then the pose will be the one recorded in the bag, in which we observe the pose going under the road at some points. What @KYabuuchi is doing is replaying only the sensor data (so the pose is not replayed) while running Autoware. This way, Autoware was able to produce a correct localization pose. This means the issue was probably a performance issue.
I am hoping that the new carla_autoware interface (see https://github.com/autowarefoundation/autoware.universe/pull/6859/) will provide better performance and fix this kind of issue.
Thank you for clarifying @maxime-clem! Now I get the point!
So maybe a performance issue is the cause, and I should try the new bridge to test whether the issue remains.
As for the new bridge, frankly speaking, I check #6859 every day, and I really look forward to the release since I want to be free from this kind of weird issue. Is there any specific plan for releasing the bridge?
Thank you!
@Kim-mins As Maxime explained, I suggested playing only the sensor data from the rosbag in order to test localization in isolation.
For the clock frequency, I have one question. I'm running a co-simulation with CARLA, and I run it synchronously; that is, to my knowledge, the messages between CARLA and Autoware are exchanged one by one. So I expected that, since the messages are exchanged in a synchronized manner, even if the frequency is low, the issues caused by a low frequency would not happen at all. Could you please share your opinion on this?
When you run the simulation, I think you launch either logging_simulator.launch.xml or e2e_simulator.launch.xml. When started this way, the use_sim_time parameter of all Autoware nodes becomes true, and they obtain the time from the /clock topic rather than the system clock.
Autoware contains several timer callbacks, and when use_sim_time is true, timer callbacks are driven by the subscriber callback of /clock. Roughly speaking, if /clock is at 10Hz, timer callbacks exceeding 10Hz will not function properly.
The prediction update of the EKF should ideally run at 50Hz, but since it is being called at 10Hz in this case, it is not performing as designed.
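As a sanity check, you can also confirm that the EKF node is actually running on simulated time (the node path below is only my guess; please check the real name with ros2 node list):
ros2 node list | grep ekf
ros2 param get /localization/pose_twist_fusion_filter/ekf_localizer use_sim_time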
I think we identified the issue (wrong z value because of localization failure) and a possible workaround (setting the parameter enable_z_axis_obstacle_filtering to False).
I am closing the issue, but we can reopen it if needed.