nuscenes-devkit export_kitti.py - Camera and lidar do not have the same ego pose

ISSUE

Currently in export_kitti.py the following transformation is incorrect:

            lid_to_ego = transform_matrix(
                cs_record_lid["translation"], Quaternion(cs_record_lid["rotation"]), inverse=False
            )
            ego_to_cam = transform_matrix(
                cs_record_cam["translation"], Quaternion(cs_record_cam["rotation"]), inverse=True
            )
            velo_to_cam = np.dot(ego_to_cam, lid_to_ego)

Unlike nuscenes (which I didn't check, but I believe to be correct), the camera and lidar ego poses for this dataset are not the same. The code above goes from lidar to camera through a single ego frame, so it silently assumes both sensors share one ego pose. The effect is that if you use the RGB camera images with labels projected from lidar, the boxes will be randomly off by 10-20 pixels, which is problematic for any sort of 2D learning.

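To correct this, two additional transformations are needed to convert to / from world pose for both lidar and camera: from the lidar's ego frame to world using the lidar's ego_pose record, then from world back to the camera's ego frame using the camera's ego_pose record.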
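
A minimal sketch of that chain, not the devkit's actual fix (see the related PR for that). It assumes each record is a dict with "translation" (x, y, z) and "rotation" (a unit quaternion in w, x, y, z order), as in the snippet above. The names quat_to_rot, pose_matrix and velo_to_cam_via_world are hypothetical; pose_matrix stands in for transform_matrix plus Quaternion so that the block only needs numpy.

```python
import numpy as np


def quat_to_rot(q):
    """3x3 rotation matrix from a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def pose_matrix(record, inverse=False):
    """4x4 homogeneous transform built from a calibration or ego_pose record."""
    tm = np.eye(4)
    tm[:3, :3] = quat_to_rot(record["rotation"])
    tm[:3, 3] = record["translation"]
    return np.linalg.inv(tm) if inverse else tm


def velo_to_cam_via_world(cs_record_lid, ep_record_lid, ep_record_cam, cs_record_cam):
    """Lidar -> lidar ego -> world -> camera ego -> camera.

    ep_record_lid and ep_record_cam are the ego_pose records referenced by
    the lidar and camera sample_data entries respectively (assumed inputs).
    """
    lid_to_ego = pose_matrix(cs_record_lid)
    ego_to_world = pose_matrix(ep_record_lid)
    world_to_ego = pose_matrix(ep_record_cam, inverse=True)
    ego_to_cam = pose_matrix(cs_record_cam, inverse=True)
    return ego_to_cam @ world_to_ego @ ego_to_world @ lid_to_ego
```

When the two ego poses are identical, the middle two matrices cancel and the result reduces to the original np.dot(ego_to_cam, lid_to_ego).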

Additionally, if I recall, the render function does not have the same issue as this KITTI converter.

Related PR: https://github.com/lyft/nuscenes-devkit/pull/75

Nov 13 '19 15:11 kyleyklee

Can confirm. In each sample, the lidar and camera records refer to different ego_pose records. Those poses differ in rotation by up to 1 degree and in translation by up to 0.5 m (I think those are meters), but the timestamp is the same.

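import math

from pyquaternion import Quaternion

# `level5data` is assumed to be an already-loaded LyftDataset instance.
# The lookups below were originally written against `ex`, which is assumed
# here to be the same dataset object.
my_scene = level5data.scene[0]
sample_token = my_scene["first_sample_token"]

# Walk every sample in the scene and compare the two ego poses.
while sample_token != "":
    sample = level5data.get("sample", sample_token)
    sample_token = sample["next"]

    # Ego pose referenced by the front camera's sample_data record.
    cam_sensor_token = sample["data"]["CAM_FRONT"]
    cam_sd_record = level5data.get("sample_data", cam_sensor_token)
    cam_ep_record = level5data.get("ego_pose", cam_sd_record["ego_pose_token"])

    # Ego pose referenced by the top lidar's sample_data record.
    lid_sensor_token = sample["data"]["LIDAR_TOP"]
    lid_sd_record = level5data.get("sample_data", lid_sensor_token)
    lid_ep_record = level5data.get("ego_pose", lid_sd_record["ego_pose_token"])

    # Angle of the relative rotation between the two ego poses.
    cam_qua = Quaternion(cam_ep_record["rotation"])
    lid_qua = Quaternion(lid_ep_record["rotation"])
    diff_deg = (lid_qua.inverse * cam_qua).degrees

    # Euclidean distance between the two ego translations.
    diff_m = math.sqrt(sum((a - b) * (a - b) for a, b in zip(cam_ep_record["translation"], lid_ep_record["translation"])))

    diff_ts = abs(cam_ep_record["timestamp"] - lid_ep_record["timestamp"])

    print(round(diff_deg, 3), "deg", round(diff_m, 3), "m", diff_ts)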

...
1.077 deg 0.25 m 0.0
1.095 deg 0.225 m 0.0
1.052 deg 0.242 m 0.0
0.953 deg 0.255 m 0.0
0.887 deg 0.256 m 0.0
0.869 deg 0.245 m 0.0
...

Nov 13 '19 23:11 megaserg

@megaserg In the example above, do you happen to know if the ego_pose_tokens are different? I would assume so because of the ::get("ego_pose", ...) call, but I'm just curious because I have recently looked for duplicate timestamps in the Lyft dataset and didn't find any. Perhaps either my debugging was wrong, or I have been ignoring the ego_pose timestamp in favor of the sensor timestamp.

If the tokens are distinct and yet have the same timestamp, then it sounds like the Lyft dataset itself might be "broken." (Could be patched through code though).

Nov 16 '19 07:11 pwais

@pwais yes, the ego_pose_token is different between the cameras and the lidars. The timestamps of the referred poses are the same though.

Nov 19 '19 01:11 megaserg

oof! thanks for confirming @megaserg

Nov 19 '19 06:11 pwais