Pytorch_Generalized_3D_Lane_Detection
Question about how R_g2c is computed
In tools/utils.py:
def homograpthy_g2im(cam_pitch, cam_height, K):
    # transform top-view region to original image region
    R_g2c = np.array([[1, 0, 0],
                      [0, np.cos(np.pi / 2 + cam_pitch), -np.sin(np.pi / 2 + cam_pitch)],
                      [0, np.sin(np.pi / 2 + cam_pitch), np.cos(np.pi / 2 + cam_pitch)]])
    H_g2im = np.matmul(K, np.concatenate([R_g2c[:, 0:2], [[0], [cam_height], [0]]], 1))
    return H_g2im
I can follow that R_g2c rotates the vehicle frame about the x-axis by 90° + pitch, but why does H_g2im use only its first two columns?
In GeoNet3D_ext.py, line 230:
# homograph ground to camera
# H_g2cam = np.array([[1, 0, 0],
#                     [0, np.cos(np.pi / 2 + cam_pitch), args.cam_height],
#                     [0, np.sin(np.pi / 2 + cam_pitch), 0]])
H_g2cam = np.array([[1, 0, 0],
                    [0, np.sin(-cam_pitch), args.cam_height],
                    [0, np.cos(-cam_pitch), 0]])
Here H_g2cam is really the result of np.concatenate([R_g2c[:, 0:2], [[0], [cam_height], [0]]], 1), so why is it written differently from the definition above?
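A quick numerical check may help with this question. The sketch below compares the two ways of writing H_g2cam for an arbitrary pitch and height; the helper names `h_g2cam_from_rotation` and `h_g2cam_simplified` and the sample values are assumptions made for illustration, not names from the repository. Since cos(π/2 + p) = sin(−p) and sin(π/2 + p) = cos(−p), the two forms should match entry by entry.

```python
import numpy as np

def h_g2cam_from_rotation(cam_pitch, cam_height):
    # Hypothetical helper: first two columns of R_g2c plus the translation column.
    a = np.pi / 2 + cam_pitch
    R_g2c = np.array([[1, 0, 0],
                      [0, np.cos(a), -np.sin(a)],
                      [0, np.sin(a), np.cos(a)]])
    t = np.array([[0.0], [cam_height], [0.0]])
    return np.concatenate([R_g2c[:, 0:2], t], axis=1)

def h_g2cam_simplified(cam_pitch, cam_height):
    # Hypothetical helper: the form written out in GeoNet3D_ext.py.
    return np.array([[1, 0, 0],
                     [0, np.sin(-cam_pitch), cam_height],
                     [0, np.cos(-cam_pitch), 0]])

# Sample values chosen only for the check.
pitch, height = np.deg2rad(3.0), 1.55
H_a = h_g2cam_from_rotation(pitch, height)
H_b = h_g2cam_simplified(pitch, height)
print(np.allclose(H_a, H_b))
```

If this prints True, the two definitions differ only in notation, not in value.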
self.H_ipm2g = cv2.getPerspectiveTransform(np.float32([[0, 0],
                                                       [self.ipm_w-1, 0],
                                                       [0, self.ipm_h-1],
                                                       [self.ipm_w-1, self.ipm_h-1]]),
                                           np.float32(args.top_view_region))
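Why is the homography from IPM to the top view called H_ipm2g? This H_ipm2g is then actually left-multiplied by H_g2im to get H_ipm2im, which is inverted to get H_im2ipm. Shouldn't the code above produce H_ipm2gflat instead? Shouldn't there still be a geometric relation separating the top view from the 3D ground?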
def transform_lane_g2gflat(h_cam, X_g, Y_g, Z_g):
    """
    Given X, Y, Z coordinates in real 3D ground space, compute the corresponding
    X, Y coordinates in flat ground space by projecting each point along the ray
    from the camera center (at height h_cam) onto the Z = 0 plane.
    :param h_cam: camera height above the ground plane
    :param X_g: X coordinates in real 3D ground space
    :param Y_g: Y coordinates in real 3D ground space
    :param Z_g: Z coordinates in real 3D ground space
    :return: X_gflat, Y_gflat, the X and Y coordinates in flat ground space
    """
    X_gflat = X_g * h_cam / (h_cam - Z_g)
    Y_gflat = Y_g * h_cam / (h_cam - Z_g)
    return X_gflat, Y_gflat
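To make the relation between flat ground and 3D ground concrete, here is a minimal sketch that pairs the function above with its algebraic inverse and runs a round trip. The name `gflat_to_g` and the sample points are hypothetical, introduced only for this illustration. For points with Z_g = 0 the two spaces coincide, which is one way to see why a homography to the flat-ground plane can be reused for points that lie on the ground.

```python
import numpy as np

def transform_lane_g2gflat(h_cam, X_g, Y_g, Z_g):
    # Project 3D ground points along camera rays onto the Z = 0 plane.
    scale = h_cam / (h_cam - Z_g)
    return X_g * scale, Y_g * scale

def gflat_to_g(h_cam, X_gflat, Y_gflat, Z_g):
    # Hypothetical inverse: recover 3D ground X, Y from flat-ground X, Y and known Z.
    scale = (h_cam - Z_g) / h_cam
    return X_gflat * scale, Y_gflat * scale

# Sample values chosen only for the check.
h_cam = 1.55
X_g = np.array([1.8, 1.8, -1.8])
Y_g = np.array([10.0, 40.0, 40.0])
Z_g = np.array([0.0, 0.5, -0.3])

X_gflat, Y_gflat = transform_lane_g2gflat(h_cam, X_g, Y_g, Z_g)
X_back, Y_back = gflat_to_g(h_cam, X_gflat, Y_gflat, Z_g)

print(X_gflat, Y_gflat)
print(np.allclose(X_back, X_g), np.allclose(Y_back, Y_g))
```

The first sample point has Z_g = 0, so its flat-ground coordinates equal its 3D ground coordinates; the other two are scaled by h_cam / (h_cam - Z_g).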
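On why only the first two columns of R_g2c are used: this can be read as the transform from the ground-plane coordinate system to the camera coordinate system. The ground plane is the xoy plane of the ground coordinate system, so every point on it has z = 0, and R * [x, y, 0].T + [0, h, 0].T is equivalent to [R[:, :2], [0, h, 0].T] * [x, y, 1].T.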
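The equivalence above can be verified numerically. The following is a minimal sketch with made-up pitch, height, and point values (all of them assumptions for illustration): it applies the full rigid transform to a ground-plane point with z = 0 and compares the result with the 3×3 matrix built from the first two columns of R and the translation.

```python
import numpy as np

# Sample values chosen only for the check.
cam_pitch, cam_height = np.deg2rad(3.0), 1.55
a = np.pi / 2 + cam_pitch
R = np.array([[1, 0, 0],
              [0, np.cos(a), -np.sin(a)],
              [0, np.sin(a), np.cos(a)]])
t = np.array([0.0, cam_height, 0.0])

# 3x3 matrix made of the first two columns of R and the translation.
H = np.column_stack([R[:, :2], t])

x, y = 2.0, 30.0
p_full = R @ np.array([x, y, 0.0]) + t   # rotate the z = 0 point, then translate
p_homog = H @ np.array([x, y, 1.0])      # same point in homogeneous plane coordinates

same = np.allclose(p_full, p_homog)
print(same)
```

Because z = 0 multiplies the third column of R, dropping that column loses nothing for points on the ground plane.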
R_g2c = np.array([[1, 0, 0],
                  [0, np.cos(np.pi / 2 + cam_pitch), -np.sin(np.pi / 2 + cam_pitch)],
                  [0, np.sin(np.pi / 2 + cam_pitch), np.cos(np.pi / 2 + cam_pitch)]])
Why is entry [1,2] sin rather than -sin?
I think H_g2im should instead be:
H_g2im = np.matmul(K, np.concatenate([R_g2c[:, 0:2], [[0], [cam_height * np.cos(cam_pitch)], [cam_height * np.sin(cam_pitch)]]], 1))
Is there something wrong with the current version?
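For reference, the translation proposed above is what falls out of one specific convention. The sketch below assumes (this is an assumption, not something stated in the repository) that the camera center sits at C = (0, 0, cam_height) in ground coordinates and that points transform as X_c = R_g2c (X_g − C), which gives t = −R_g2c C. The sample values are made up.

```python
import numpy as np

# Sample values chosen only for the check.
cam_pitch, cam_height = np.deg2rad(3.0), 1.55
a = np.pi / 2 + cam_pitch
R_g2c = np.array([[1, 0, 0],
                  [0, np.cos(a), -np.sin(a)],
                  [0, np.sin(a), np.cos(a)]])

# Assumed convention: camera center in ground coordinates.
C_g = np.array([0.0, 0.0, cam_height])

# Translation implied by X_c = R_g2c @ (X_g - C_g).
t_derived = -R_g2c @ C_g

# Translation column from the formula proposed in the comment above.
t_proposed = np.array([0.0,
                       cam_height * np.cos(cam_pitch),
                       cam_height * np.sin(cam_pitch)])

print(t_derived, t_proposed)
print(np.allclose(t_derived, t_proposed))
```

Under that convention the two vectors agree, whereas a translation of [0, cam_height, 0] corresponds to a ground origin placed at distance cam_height along the camera's own y-axis. For small pitch the two differ only slightly, so which one is appropriate depends on how the ground and camera frames are defined.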
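This still feels obscure to me. How the camera coordinate system and the ground coordinate system are defined should be stated clearly, and the code is also inconsistent with the diagram in the paper.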