StreamPETR [Question] How is gt_bboxes_3d derived from sample_annotation.json?

I am trying to understand the ground-truth boxes, so I saved the gt_bboxes_3d of one sample loaded by the data loader and drew the boxes on the corresponding camera images.

Where gt_bboxes_3d is saved: petr3d.py#L223

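How I draw them (my understanding is that gt_bboxes_3d holds LiDAR 3D boxes in the sensor coordinate system, so they have to be transformed with the calibration before being drawn in the camera view):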

import numpy as np
from pyquaternion import Quaternion
from nuscenes.utils.geometry_utils import box_in_image

# boxes_mine is assumed to hold nuScenes devkit Box objects built from gt_bboxes_3d;
# the pose, calibration and sensor records are loaded elsewhere.
box_list = []
for box in boxes_mine:
    # Move from LiDAR sensor coordinates to ego vehicle coordinates.
    box.rotate(Quaternion(lidar_rotation))
    box.translate(np.array(lidar_translation))

    # Move from ego vehicle coordinates to global (world) coordinates.
    box.rotate(Quaternion(pose_record['rotation']))
    box.translate(np.array(pose_record['translation']))

    if use_flat_vehicle_coordinates:
        # Move box to ego vehicle coord system parallel to world z plane.
        yaw = Quaternion(pose_record['rotation']).yaw_pitch_roll[0]
        box.translate(-np.array(pose_record['translation']))
        box.rotate(Quaternion(scalar=np.cos(yaw / 2), vector=[0, 0, np.sin(yaw / 2)]).inverse)
    else:
        # Move box to ego vehicle coord system.
        box.translate(-np.array(pose_record['translation']))
        box.rotate(Quaternion(pose_record['rotation']).inverse)

        # Move box to the target (camera) sensor coord system.
        box.translate(-np.array(cs_record['translation']))
        box.rotate(Quaternion(cs_record['rotation']).inverse)

    # Skip boxes that are not visible in this camera image.
    if sensor_record['modality'] == 'camera' and not \
            box_in_image(box, cam_intrinsic, imsize, vis_level=box_vis_level):
        continue

    box_list.append(box)

However, the drawn boxes appear to be misaligned: b993550e60054741983f8052ba97b0b0_v13

To work out why they are misaligned, I checked the data and ended up with two questions:

1. How do the boxes in sample_annotation.json become gt_bboxes_3d? For example, for the annotation {"translation": [1299.232, 918.868,1.568], "size": [2.908,10.909,4.454], "rotation": [0.9473110943049808,0.0,0.0,-0.3203149865471482]}, the data loader yields a gt_bboxes_3d of tensor([25.3633, 4.3085, -1.8269, 10.6079, 2.8277, 4.3311, -2.9905, 0.0000, 0.0000]). When I draw it using the calibration as I understand it, the box does not line up.
2. When training for multiple epochs, the gt_bboxes_3d values for the same annotation seem to keep changing. For the annotation above, at epoch 10 gt_bboxes_3d became tensor([[26.6186, 7.3840, 0.3637, 11.3902, 3.0363, 4.6504, -2.8882, 0.0000, 0.0000]]). What causes this?

Apr 17 '24 06:04 LiuJieShane
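
One thing worth checking before any frame conversion is the layout of each gt_bboxes_3d row. The sketch below assumes the mmdet3d 1.x LiDAR box convention: x, y, z of the bottom-face center, then length, width, height, yaw and two velocity components. The nuScenes devkit Box instead expects the geometric center and sizes ordered width, length, height. The helper name `to_devkit_fields` is hypothetical, and the layout is an assumption that should be checked against the installed mmdet3d version.

```python
def to_devkit_fields(row):
    """Split one 9-value gt_bboxes_3d row into nuScenes-devkit-style fields.

    Assumption (not confirmed in the thread):
    row = (x, y, z_bottom, length, width, height, yaw, vx, vy).
    """
    x, y, z_bottom, length, width, height, yaw, vx, vy = row
    # Bottom-face center -> geometric center: lift z by half the height.
    center = (x, y, z_bottom + height / 2.0)
    # The devkit orders sizes as width, length, height.
    wlh = (width, length, height)
    return center, wlh, yaw, (vx, vy)


# The first tensor quoted in the question.
row = [25.3633, 4.3085, -1.8269, 10.6079, 2.8277, 4.3311, -2.9905, 0.0, 0.0]
center, wlh, yaw, velocity = to_devkit_fields(row)
print(center, wlh, yaw, velocity)
```

If this assumption holds, building a devkit Box directly from the first six values would place the box half a height too low and swap its width and length, which by itself could produce a visible offset.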

The annotation values [1299.232, 918.868,1.568], "size": [2.908,10.909,4.454], "rotation": [0.9473110943049808,0.0,0.0,-0.3203149865471482] are in the nuScenes global coordinate system and need to be converted to the mmdet3d LiDAR coordinate system.
The values keep changing because of the BEV augmentation. If you remove it they should stop changing, but the mAOE result will degrade. https://github.com/exiawsh/StreamPETR/blob/2315cf9f077817ec7089c87094ba8a63f76c2acf/projects/configs/StreamPETR/stream_petr_r50_flash_704_bs2_seq_24e.py#L173

Apr 19 '24 08:04 exiawsh
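
The two tensors quoted in the question are consistent with the augmentation explanation. A small check, using only numbers from the thread and assuming the size columns of gt_bboxes_3d are ordered length, width, height: dividing each dumped size by the annotated size gives one nearly constant factor per dump, which is what a random global scale would produce.

```python
# Annotated size from sample_annotation.json, reordered from (w, l, h) to (l, w, h).
annotated_lwh = [10.909, 2.908, 4.454]

# Size columns (indices 3..5) of the two gt_bboxes_3d dumps quoted in the thread.
dumped_lwh = {
    "first dump": [10.6079, 2.8277, 4.3311],
    "epoch 10": [11.3902, 3.0363, 4.6504],
}


def scale_factors(dumped, annotated):
    """Per-axis ratio between a dumped size and the annotated size."""
    return [d / a for d, a in zip(dumped, annotated)]


ratios = {name: scale_factors(lwh, annotated_lwh) for name, lwh in dumped_lwh.items()}
for name, r in ratios.items():
    spread = max(r) - min(r)
    print(f"{name}: mean scale {sum(r) / len(r):.4f}, spread {spread:.1e}")
```

The factors come out at roughly 0.972 and 1.044, uniform across the three axes in each dump. The yaw also differs between the dumps (-2.9905 versus -2.8882), as a random global rotation would cause. One further observation, under the additional assumption that z in the first dump is the bottom-face center: its geometric-center z is about 0.339, and rescaling that by 1.044 / 0.972 gives about 0.364, close to the z of 0.3637 in the epoch-10 dump. The second tensor may therefore have been captured after a conversion to geometric centers.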

> The annotation values [1299.232, 918.868,1.568], "size": [2.908,10.909,4.454], "rotation": [0.9473110943049808,0.0,0.0,-0.3203149865471482] are in the nuScenes global coordinate system and need to be converted to the mmdet3d LiDAR coordinate system.

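@exiawsh What exactly is the conversion? I have already converted gt_bboxes_3d from the LiDAR coordinate system back to the ego coordinate system and then to the global coordinate system, but the boxes still do not line up in the BEV view (first row of the figure). For the camera images, I also converted to the global coordinate system first and then to the camera coordinate system, and the boxes are still misaligned (last three rows of the figure).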

Apr 22 '24 01:04 LiuJieShane
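
For the conversion asked about here, a minimal sketch of the usual nuScenes chain for a box center: global to ego (inverse of the ego_pose record), then ego to LiDAR (inverse of the calibrated_sensor record of the LiDAR). It is pure Python with made-up poses, the helper names are hypothetical, and it is not StreamPETR's own code. Any BEV augmentation applied by the training pipeline comes on top of this and has to be disabled or undone before comparing numbers.

```python
import math


def quat_to_matrix(q):
    """Rotation matrix for a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def to_local(point, translation, rotation):
    """Express `point` in the frame whose pose in the parent frame is
    (translation, rotation): subtract the translation, then apply R^T."""
    r = quat_to_matrix(rotation)
    d = [p - t for p, t in zip(point, translation)]
    return [sum(r[row][col] * d[row] for row in range(3)) for col in range(3)]


def global_to_lidar(center, ego_pose, lidar_calib):
    """global -> ego -> LiDAR for a box center."""
    in_ego = to_local(center, ego_pose["translation"], ego_pose["rotation"])
    return to_local(in_ego, lidar_calib["translation"], lidar_calib["rotation"])


# Made-up records: the ego vehicle is yawed 90 degrees in the global frame,
# and the LiDAR sits 1 m forward and 2 m up with no rotation.
half = math.sqrt(0.5)
ego_pose = {"translation": [100.0, 50.0, 0.0], "rotation": [half, 0.0, 0.0, half]}
lidar_calib = {"translation": [1.0, 0.0, 2.0], "rotation": [1.0, 0.0, 0.0, 0.0]}

center_lidar = global_to_lidar([100.0, 60.0, 1.0], ego_pose, lidar_calib)
print(center_lidar)
```

The box orientation follows the same chain by composing the inverse rotations with the annotation quaternion. The devkit's `NuScenes.get_sample_data` performs these two steps when it returns boxes for a sensor token, so comparing against its output for the LiDAR token is a convenient sanity check.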
