mmdetection3d

project_rect_to_image in mono3d

Open excitohe opened this issue 3 years ago • 4 comments

Hi, a common question about project_rect_to_image in monocular 3D detection. In OpenPCDet, the implementation is (LINE:75-84):

    def rect_to_img(self, pts_rect):
        """
        :param pts_rect: (N, 3)
        :return pts_img: (N, 2)
        """
        pts_rect_hom = self.cart_to_hom(pts_rect)
        pts_2d_hom = np.dot(pts_rect_hom, self.P2.T)
        pts_img = (pts_2d_hom[:, 0:2].T / pts_rect_hom[:, 2]).T  # (N, 2)
        pts_rect_depth = pts_2d_hom[:, 2] - self.P2.T[3, 2]  # depth in rect camera coord
        return pts_img, pts_rect_depth

but in mmdetection3d, the implementation is (LINE:175-214):

    points_4 = torch.cat([points_3d, points_3d.new_ones(points_shape)], dim=-1)
    point_2d = points_4 @ proj_mat.T
    point_2d_res = point_2d[..., :2] / point_2d[..., 2:3]

    if with_depth:
        point_2d_res = torch.cat([point_2d_res, point_2d[..., 2:3]], dim=-1)

Because calib.P2[2,3] is not zero, I think point_2d_res should perhaps be computed as point_2d[..., :2] / points_3d[..., 2:3], and then, when with_depth is enabled, either apply point_2d[..., 2:3] -= proj_mat[2,3] or use points_3d[..., 2:3] directly. This function affects base_centers2d and depths in the mono3d models, so should it be modified?
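To make the difference between the two snippets concrete, here is a minimal NumPy sketch that runs both normalizations on the same point. The projection matrix is made up for illustration (it only mimics the KITTI P2 layout with a nonzero [2, 3] entry), and the function names `project_divide_by_w` and `project_divide_by_z` are hypothetical, not part of either library.

```python
import numpy as np

# Hypothetical 3x4 projection matrix in the KITTI P2 layout.
# The values are illustrative only; note the nonzero entry at [2, 3].
P2 = np.array([
    [700.0,   0.0, 600.0, 45.0],
    [  0.0, 700.0, 180.0,  0.2],
    [  0.0,   0.0,   1.0,  0.003],
])


def to_hom(pts):
    """Append a column of ones: (N, 3) -> (N, 4)."""
    return np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)


def project_divide_by_w(pts, P):
    """Normalize by the third homogeneous coordinate (as in the mmdetection3d snippet)."""
    uvw = to_hom(pts) @ P.T
    return uvw[:, :2] / uvw[:, 2:3], uvw[:, 2]


def project_divide_by_z(pts, P):
    """Normalize by the rect-camera z and subtract P[2, 3] from the depth
    (as in the OpenPCDet snippet)."""
    uvw = to_hom(pts) @ P.T
    return uvw[:, :2] / pts[:, 2:3], uvw[:, 2] - P[2, 3]


# A single 3D point in rect camera coordinates (values chosen for illustration).
pts = np.array([[1.84, 0.525, 8.41]])

uv_w, depth_w = project_divide_by_w(pts, P2)
uv_z, depth_z = project_divide_by_z(pts, P2)

print("divide by w:", uv_w, depth_w)
print("divide by z:", uv_z, depth_z)
print("pixel difference:", uv_z - uv_w)
print("depth difference:", depth_w - depth_z)
```

In this sketch the two depths differ by exactly P2[2, 3], and the pixel coordinates differ by the factor (z + P2[2, 3]) / z, so the gap grows for points close to the camera.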

excitohe • Feb 23 '22 04:02

Good suggestion. We will check how this modification affects the mono3d models.

ZCMax • Feb 23 '22 04:02

@ZCMax I just did a simple numerical check.

A. Sample: 000000.txt in the KITTI training split, with its annotation as follows: [screenshot, 2022-02-23 3:35:22 PM]

B. The corresponding generated center2ds (3D box center projected onto the image) in kitti_infos_train_mono3d.coco.json: [screenshot, 2022-02-23 3:37:28 PM]

C. In the reimplemented SMOKECoder._decode_location, we print the location: [screenshot, 2022-02-23 3:44:06 PM]

Note that the decoded location3d is not aligned with the GT (1.84, 0.525, 8.41), where 0.525 comes from y3d - h3d/2. Also, all trans_mat-related variables will be deprecated.

Then I tried adding depths = depths + cam2imgs[obj_id][:,2,3] before centers2d_img = centers2d_img * depths_ref.view(N, -1, 1) (LINE:147) as a numerical correction. Printing the location again, it is now right: [screenshot, 2022-02-23 3:42:59 PM]

Therefore, even if the normalization issue in project_rect_to_image (points_cam2img) is not resolved, we can still correct it at the decoding stage to recover the GT. Regressing with this erroneous GT has little impact in SMOKE, but performance is not guaranteed in other methods that need to predict variables related to the center offset (e.g. base_center2d.round().int() and base_center2d.float() in MonoFlex).
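The decode-stage correction can be sketched as a round trip: project a GT center with divide-by-w normalization, then lift it back to 3D with and without adding cam2img[2, 3] to the depth. This is only an illustration of the idea, not the actual SMOKECoder code; the matrix values and the helper name `backproject` are assumptions.

```python
import numpy as np

# Hypothetical 3x4 projection matrix (illustrative values, nonzero [2, 3]).
P = np.array([
    [700.0,   0.0, 600.0, 45.0],
    [  0.0, 700.0, 180.0,  0.2],
    [  0.0,   0.0,   1.0,  0.003],
])

# Extend to 4x4 so that it can be inverted.
cam2img = np.eye(4)
cam2img[:3] = P


def backproject(uv, depth, cam2img):
    """Lift pixels to 3D: scale (u, v, 1) by depth, append 1, apply the inverse 4x4."""
    n = uv.shape[0]
    uv1 = np.concatenate([uv, np.ones((n, 1))], axis=1) * depth[:, None]
    hom = np.concatenate([uv1, np.ones((n, 1))], axis=1)
    return (hom @ np.linalg.inv(cam2img).T)[:, :3]


# GT location quoted in the comment above.
gt = np.array([[1.84, 0.525, 8.41]])

# Generate the 2D center by dividing by the third homogeneous coordinate.
uvw = np.concatenate([gt, np.ones((1, 1))], axis=1) @ P.T
center2d = uvw[:, :2] / uvw[:, 2:3]

z = gt[:, 2]  # the depth target is the camera-frame z

loc_uncorrected = backproject(center2d, z, cam2img)
loc_corrected = backproject(center2d, z + cam2img[2, 3], cam2img)

print("GT:         ", gt)
print("uncorrected:", loc_uncorrected)
print("corrected:  ", loc_corrected)
```

With the offset added, the back-projection inverts the projection exactly. Without it, the recovered location is off by an amount that depends on cam2img[2, 3] and on where the center falls in the image.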

The above is for reference only (^_^)

excitohe • Feb 23 '22 08:02

Same issue. I think it leads to shifts in visualizations of SMOKE predictions on images.

zhyever • Apr 08 '22 01:04

> Same issue. I think it leads to shifts in visualizations of SMOKE predictions on images.

Sorry, the shift of the GT was caused by leaving the camera intrinsic parameters unchanged when rescaling the images.
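For reference, a minimal sketch of that fix: when an image is resized, the first two rows of the projection matrix have to be scaled by the same factors, otherwise projected boxes land at the old pixel positions. The helper name `rescale_cam2img` and the matrix values are hypothetical, not an mmdetection3d API.

```python
import numpy as np


def rescale_cam2img(P, scale_x, scale_y):
    """Return a copy of P whose pixel outputs match an image resized by
    (scale_x, scale_y). Only the u row and the v row change."""
    P_scaled = np.asarray(P, dtype=float).copy()
    P_scaled[0] *= scale_x  # u row: fx, skew, cx, tx
    P_scaled[1] *= scale_y  # v row: fy, cy, ty
    return P_scaled


def project(pts, P):
    """Project (N, 3) camera-frame points to (N, 2) pixels."""
    pts_hom = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)
    uvw = pts_hom @ P.T
    return uvw[:, :2] / uvw[:, 2:3]


# Hypothetical 3x4 projection matrix (illustrative values).
P = np.array([
    [700.0,   0.0, 600.0, 45.0],
    [  0.0, 700.0, 180.0,  0.2],
    [  0.0,   0.0,   1.0,  0.003],
])
pts = np.array([[1.84, 0.525, 8.41]])

scale_x, scale_y = 0.5, 0.25
uv_original = project(pts, P)
uv_resized = project(pts, rescale_cam2img(P, scale_x, scale_y))

print("original image pixels:", uv_original)
print("resized image pixels: ", uv_resized)
```

Projecting with the rescaled matrix gives the original pixel coordinates multiplied by (scale_x, scale_y), which is where the object appears in the resized image.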

zhyever • Aug 31 '22 13:08