mmdetection3d
project_rect_to_image in mono3d
Hi, I have a question about project_rect_to_image in monocular 3D detection.
In OpenPCDet, it is implemented as follows (LINE:75-84):
    def rect_to_img(self, pts_rect):
        """
        :param pts_rect: (N, 3)
        :return pts_img: (N, 2)
        """
        pts_rect_hom = self.cart_to_hom(pts_rect)
        pts_2d_hom = np.dot(pts_rect_hom, self.P2.T)
        pts_img = (pts_2d_hom[:, 0:2].T / pts_rect_hom[:, 2]).T  # (N, 2)
        pts_rect_depth = pts_2d_hom[:, 2] - self.P2.T[3, 2]  # depth in rect camera coord
        return pts_img, pts_rect_depth
In mmdetection3d, however, it is implemented as follows (LINE:175-214):
    points_4 = torch.cat([points_3d, points_3d.new_ones(points_shape)], dim=-1)
    point_2d = points_4 @ proj_mat.T
    point_2d_res = point_2d[..., :2] / point_2d[..., 2:3]
    if with_depth:
        point_2d_res = torch.cat([point_2d_res, point_2d[..., 2:3]], dim=-1)
Because calib.P2[2,3] is not zero, I think point_2d_res should perhaps be computed as point_2d[..., :2] / points_3d[..., 2:3]. When with_depth is enabled, the depth would then either be corrected with point_2d[..., 2:3] -= proj_mat[2,3] or taken directly from points_3d[..., 2:3]. Is that right?
This function affects base_centers2d and depths in the mono3d models, so should it be modified?
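To make the difference concrete, here is a minimal NumPy sketch of the two normalizations side by side. The P2 values are KITTI-like numbers assumed for illustration, and the function names project_divide_by_w and project_divide_by_z are hypothetical; neither is the actual OpenPCDet or mmdetection3d code.

```python
import numpy as np

# KITTI-like 3x4 projection matrix; the values are assumed for illustration.
P2 = np.array([
    [707.0493, 0.0, 604.0814, 45.75831],
    [0.0, 707.0493, 180.5066, -0.3454157],
    [0.0, 0.0, 1.0, 0.004981016],
])


def _project_hom(pts, proj):
    """Return the homogeneous image coordinates (u*w, v*w, w) for (N, 3) points."""
    pts_hom = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return pts_hom @ proj.T


def project_divide_by_w(pts, proj):
    """Normalize by w = z + proj[2, 3] and return w as depth (mmdetection3d-style)."""
    uvw = _project_hom(pts, proj)
    return uvw[:, :2] / uvw[:, 2:3], uvw[:, 2]


def project_divide_by_z(pts, proj):
    """Normalize by the camera-frame z and return z as depth (OpenPCDet-style)."""
    uvw = _project_hom(pts, proj)
    return uvw[:, :2] / pts[:, 2:3], uvw[:, 2] - proj[2, 3]


pts = np.array([[1.84, 0.525, 8.41]])
uv_w, depth_w = project_divide_by_w(pts, P2)
uv_z, depth_z = project_divide_by_z(pts, P2)

print("pixel difference:", uv_w - uv_z)
print("depth difference:", depth_w - depth_z)  # equals P2[2, 3]
```

Under these assumptions the two depths differ by exactly P2[2,3], and the pixel coordinates differ by a sub-pixel amount that grows as the object gets closer to the camera.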
Good suggestion. We will check how this modification affects the mono3d models.
@ZCMax
I just did a simple numerical check.
A. Sample 000000.txt in the KITTI training split, with its annotation as follows:
B. The corresponding generated center2ds (3D box center projected onto the image) in kitti_infos_train_mono3d.coco.json:
C. In the reimplemented SMOKECoder._decode_location, we print location:
Note that the decoded 3D location is not aligned with the GT (1.84, 0.525, 8.41), where 0.525 comes from y3d - h3d/2. All trans_mat-related variables will be deprecated. I then tried adding depths = depths + cam2imgs[obj_id][:,2,3] before centers2d_img = centers2d_img * depths_ref.view(N, -1, 1) (LINE:147) as a numerical correction.
Printing location again now gives the correct result:
Therefore, even if the normalization issue in project_rect_to_image (points_cam2img) is not resolved, we can still apply a correction at the decoding stage to recover the GT. Regressing against this erroneous GT has little impact on SMOKE, but performance is not guaranteed for other methods that need to predict variables related to the center offset (e.g. base_center2d.round().int() and base_center2d.float() in MonoFlex).
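The decoding-stage correction can be sketched roughly as follows. This is a simplified stand-alone NumPy version, not the actual SMOKECoder code: decode_location is a hypothetical helper, the P2 values are KITTI-like numbers assumed for illustration, and the 2D center is assumed to have been generated by normalizing with w = z + cam2img[2, 3].

```python
import numpy as np

# KITTI-like 3x4 projection matrix, padded to 4x4; values assumed for illustration.
P2 = np.array([
    [707.0493, 0.0, 604.0814, 45.75831],
    [0.0, 707.0493, 180.5066, -0.3454157],
    [0.0, 0.0, 1.0, 0.004981016],
])
cam2img = np.vstack([P2, [0.0, 0.0, 0.0, 1.0]])


def decode_location(center2d, depth, cam2img):
    """Back-project a 2D center with a depth through the inverse 4x4 matrix."""
    uvd1 = np.array([center2d[0] * depth, center2d[1] * depth, depth, 1.0])
    return (np.linalg.inv(cam2img) @ uvd1)[:3]


gt = np.array([1.84, 0.525, 8.41])

# 2D center normalized by w = z + cam2img[2, 3].
uvw = cam2img @ np.append(gt, 1.0)
center2d = uvw[:2] / uvw[2]

# Decoding with the camera-frame z as depth does not reproduce the GT ...
loc_raw = decode_location(center2d, gt[2], cam2img)
# ... while adding cam2img[2, 3] to the depth first does.
loc_fixed = decode_location(center2d, gt[2] + cam2img[2, 3], cam2img)

print("without correction:", loc_raw)
print("with correction:   ", loc_fixed)
```

The correction works in this sketch because the inverse matrix expects (u*w, v*w, w, 1) as input, so the depth multiplied into the 2D center has to be w rather than z.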
The above is for reference only (^_^)
Same issue. I think it leads to shifts in visualizations of SMOKE predictions on images.
Sorry, the shift in the GT was caused by not updating the camera intrinsic parameters when rescaling the images.
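For reference, a minimal sketch of keeping the projection matrix in sync with a resized image. rescale_cam2img is a hypothetical helper, the P2 values are KITTI-like numbers assumed for illustration, and half-pixel offset conventions are ignored.

```python
import numpy as np


def rescale_cam2img(cam2img, scale_x, scale_y):
    """Scale the first two rows of a projection matrix to match a resized image."""
    scaled = np.array(cam2img, dtype=float, copy=True)
    scaled[0, :] *= scale_x  # fx, cx and the x translation term
    scaled[1, :] *= scale_y  # fy, cy and the y translation term
    return scaled


def project(pt, proj):
    """Project one 3D point with a 3x4 matrix, normalizing by the homogeneous w."""
    uvw = proj @ np.append(pt, 1.0)
    return uvw[:2] / uvw[2]


# KITTI-like 3x4 projection matrix; the values are assumed for illustration.
P2 = np.array([
    [707.0493, 0.0, 604.0814, 45.75831],
    [0.0, 707.0493, 180.5066, -0.3454157],
    [0.0, 0.0, 1.0, 0.004981016],
])
pt = np.array([1.84, 0.525, 8.41])

uv_full = project(pt, P2)
uv_half = project(pt, rescale_cam2img(P2, 0.5, 0.5))

print("full resolution:", uv_full)
print("half resolution:", uv_half)
```

If the image is resized but the original matrix is still used for projection, the projected boxes land at full-resolution pixel coordinates and appear shifted on the smaller image.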