3D-BoundingBox
Visualize the 3D coordinate
Hello, I am very new to this research area. How can we draw the 3D bounding boxes? I have read the paper and still don't understand how to map our 3D coordinates onto the 2D image. From your code we can get the dimension, center point, and angle, but how can we draw the box on the image using OpenCV?
-Thank You-
If you want to project a 3D point onto the 2D image, you will need the camera intrinsics, which are provided by KITTI. But I didn't include that information in my code. The calibration file you download from KITTI will give you K[R|T]. You can use the KITTI parser of PyDriver to help you parse the label file.
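For reference, a minimal sketch of that projection, assuming you already have a 3D point expressed in the camera (rectified) coordinate frame and the 3x4 P2 matrix from a KITTI calib file:

```python
import numpy as np

def project_to_image(X, P2):
    # X: 3D point in camera coordinates, P2: 3x4 KITTI projection matrix
    X_h = np.append(np.asarray(X, dtype=float), 1.0)  # homogeneous point
    x = P2 @ X_h                                       # (u*w, v*w, w)
    return x[:2] / x[2]                                # pixel coordinates (u, v)
```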
Hello @fuenwang, sorry, after several days I am still confused about how to get the 8 corner points. First, we have the orientation from the model; what does this variable contain? We also have many angles, like Ray, ThetaRay, and LocalAngle, so what does our model predict exactly?
Suppose we have an orientation: what is the relation between our predicted orientation and the R in K[R|T]? As far as I know, R is a geometric rotation matrix, in 2D something like [[cos(x), -sin(x)], [sin(x), cos(x)]], so what is x here? Is it the LocalAngle?
Where can I find the meaning of KITTI's label fields? They have 15 elements, where the first is the class, followed by 14 other numbers, and I can't find any reference for what these numbers are exactly.
-Thank you-
- According to the paper, there are two predictions from our model. The first is the residual angle for each bin of the circle (orient), the second is the confidence of each bin. So if the orient is 20 degrees and the corresponding bin center is 120 degrees, then the local angle (theta_l in Fig. 3) will be 120 + 20 = 140 degrees (see the small sketch after this list).
- K[R|T] is for the camera itself, not the object. K is the intrinsic matrix; R and T are the rotation (3 DoF) and translation (3 DoF) of the camera, which are provided by KITTI. To get the translation of the object, you need to solve all the combinations with the relation in Section 3.2, but I haven't finished that part.
- For the description of the label fields, you can download the development kit; its readme file has a full description.
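A minimal sketch of the MultiBin decoding described in the first point (the variable names here are illustrative, not the repo's actual ones): the network outputs a confidence per bin plus a sin/cos residual, and the local angle is the chosen bin's center plus its residual.

```python
import numpy as np

def decode_local_angle(bin_conf, bin_sin, bin_cos, bin_centers):
    # bin_conf: per-bin confidence, bin_sin/bin_cos: per-bin residual encoding,
    # bin_centers: center angle of each bin (radians)
    best = np.argmax(bin_conf)                           # most confident bin
    residual = np.arctan2(bin_sin[best], bin_cos[best])  # residual angle for that bin
    return bin_centers[best] + residual                  # theta_l (local angle)
```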
Thank you so much @fuenwang, a very concise explanation. Another question: as we know, we need the ROI of the object, the camera intrinsics, the dimension, and also the location to get the final 3D bounding box. In the model itself we just input the cropped picture and get the dimension, orientation, and confidence. For the camera intrinsics in this implementation I take them from the KITTI calib folder, but where can we get the center location (T in the equation) from the model's predicted result?
According to the KITTI documentation here, the P2 entry of the files in calib/ is the projection matrix of the left color camera, whose dimension is 3x4. This projection matrix is K[R|T], and I think it's what you are looking for?
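In case it helps, a small sketch of reading P2 from a KITTI calib file (each relevant line looks like `P2: <12 numbers>`, which reshapes to 3x4); the path below is just an example:

```python
import numpy as np

def load_P2(calib_path):
    # e.g. calib_path = 'calib/000000.txt'
    with open(calib_path) as f:
        for line in f:
            if line.startswith('P2:'):
                values = [float(v) for v in line.split()[1:]]
                return np.array(values).reshape(3, 4)
    raise ValueError('P2 not found in ' + calib_path)
```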
Hello @fuenwang, thank you for your explanation. I have read the KITTI documentation carefully, and I also looked at code from other GitHub repositories. I am still confused about many things. Let me summarize here as follows:
*OUR GOAL
Only detect the 3D bounding box; let me ignore the 2D bounding box and the class.
*WHAT WE HAVE
First we have K, the camera intrinsics, which can be found in the calib file under P2
2D bounding box, in the labels at index 4 to 7
Image, in the image folder
*WANT TO COMPUTE
8 Points of 3D bounding box
*WHAT WE NEED
Calibration -> provided
2D bounding box -> provided
Cropped image before VGG19 -> provided
Rotation (R) -> unknown
Dimension -> unknown
Location / Center / T in K[RT] -> unknown
*WHAT THE MODEL PREDICTS
Orientation -> gives us theta
Dimension -> exactly what we need
Confidence -> not sure whether this is the class of the object or the MultiBin confidence; I'm confused about what it looks like
Then where is the Location/Center/T in K[RT]?
*WHAT I SEE
Most people ignore this; they just use the label values at index 11 to 13. Another parameter is Ry, which is index 14 in the label, but we can find it from the orientation predicted by the model.
*THEREFORE
Suppose I want to build a program that regresses the 3D bounding box, and I have an object detection method, say YOLO, Faster R-CNN, etc. Then I need to crop the image in the 2D bounding box area. I also have the camera calibration, so the last thing I need is the Location/Center/T in K[RT]. Where can I find it, and how?
*ACCORDING TO KITTI
We need to find the dimension and also the location.
Sorry for the long post, and thank you so much for teaching me. Please correct me if something is wrong.
- The confidence our model predicts is meant to tell which bin theta is actually located in. So its dimension will be [batch_size x number of bins], and softmax is applied to get a probability.
- I think the location is the label values at index 11 to 13, which are already in camera coordinates. Because our model provides the rotation, we now have R. To get the translation, we can consider the equation [0 0 0]^T = R * location^T + T, so T = -R * location^T (a small sketch follows).
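A small sketch of that relation, assuming rotation_y is the global yaw about the camera Y-axis and location is the label's index 11-13 center in camera coordinates:

```python
import numpy as np

def object_R_T(rotation_y, location):
    # R: rotation about the camera Y-axis by rotation_y
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    R = np.array([[ c, 0, s],
                  [ 0, 1, 0],
                  [-s, 0, c]])
    # from [0 0 0]^T = R * location^T + T  =>  T = -R * location^T
    T = -R @ np.asarray(location, dtype=float)
    return R, T
```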
@fuenwang Hello, I recently ran your code and have a few questions. First, I see that the final output only shows two results, the "Angle error" and the "Dimension error". Where should I look to get the coordinates of the 3D bounding box points in the 2D image? In the paper, the 3D bounding box is drawn exactly on the 2D image, so how do I visualize the 3D box like that? This problem has been bothering me for a long time; what I mainly want is the image coordinates of the 3D bounding box points so I can draw the 3D box on the image. Since I am a novice, I would like to ask you for help. Thank you very much.
@herleeyandi Hello, I would like to ask you a few questions. Does the theta angle output by the code refer to the global orientation angle in the paper? And is the index-14 parameter of the label in the dataset the ground truth of the object's global orientation angle predicted by the paper?
@tingZ123 I just saw this issue now haha.
In my eval code, I only evaluate the angle error and dimension error. If you want to obtain the 3D box, you have to calculate the object translation as described in Sec. 3.2 of the paper. But I didn't implement that part in this repo.
The theta here (https://github.com/fuenwang/3D-BoundingBox/blob/master/Eval.py#L76) should be the global angle. The index-14 parameter is the ground-truth global direction.
@fuenwang I am honored to receive your reply. However, I have a few questions.
1. The angle error output by the code is the difference between theta and Ry, and Ry is the index-14 parameter value in the label, isn't it? And theta is the angle between the direction of the object and the X-axis in the camera coordinate system, right?
2. In the code:
theta = np.arctan2(sin, cos) / np.pi * 180 ;
theta = theta + centerAngle[argmax] / np.pi * 180;
theta = 360 - info['ThetaRay'] - theta ;
is the second theta the local orientation angle in the paper, and the third theta the global orientation angle in the paper?
3. Is the physical meaning of ThetaRay in the code the same as θray in the paper?
4. In your eval code, if I want to input one image and evaluate the angle error and dimension error of each object in the picture, what should I do?
For Q1,2,3, that is all correct.
For Q4: you can print the variable "batch" (https://github.com/fuenwang/3D-BoundingBox/blob/master/Eval.py#L56); its dimension will be [bs, ch, 224, 224]. Just change it to the image you want to input.
@fuenwang 1. In the paper, θray refers to the angle between the observation direction and the X-axis; in the code (https://github.com/fuenwang/3D-BoundingBox/blob/master/Library/Dataset.py#L32) it has the same physical meaning, right? 2. I actually want to input a specific picture and then output the global orientation angle (theta) and dimension (dim) of each object in the picture. How should I modify the code?
- Yes, that is the same meaning.
- Change the batch input image (https://github.com/fuenwang/3D-BoundingBox/blob/master/Eval.py#L56), and then the theta on this line is the global angle (https://github.com/fuenwang/3D-BoundingBox/blob/master/Eval.py#L76); see the sketch below.
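For anyone else trying this, a rough sketch of swapping a single cropped object in place of "batch"; the resizing and normalization here are illustrative assumptions, so match whatever the repo's Dataset actually does:

```python
import cv2
import numpy as np
import torch

def make_batch(crop_bgr):
    # crop_bgr: the cropped 2D-box region of the image (H x W x 3, BGR)
    img = cv2.resize(crop_bgr, (224, 224)).astype(np.float32) / 255.0  # assumed normalization
    img = img.transpose(2, 0, 1)               # HWC -> CHW
    return torch.from_numpy(img).unsqueeze(0)  # shape [1, 3, 224, 224]
```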
@fuenwang Thank you very much for your answer. I have a general idea of how to do it, but I also need to ask you: is the fourth parameter "alpha" in the label of the 2D detection dataset the same angle as ThetaRay in the code?
I use the following code to plot the 3D box:
import cv2
import numpy as np

def plot_3d_bbox(img, label_info):
    alpha = label_info['alpha']
    # theta_ray = label_info['theta_ray']
    box_3d = []
    center = label_info['location']   # 3D center in camera coordinates (label index 11-13)
    dims = label_info['dimension']    # (h, w, l)
    calib = label_info['calib']
    cam_to_img = calib['P2']          # 3x4 projection matrix
    # note: this treats alpha as degrees; KITTI labels store angles in radians
    rot_y = alpha / 180 * np.pi + np.arctan(center[0] / center[2])
    # build the 8 corners of the box and project them into the image
    for i in [1, -1]:
        for j in [1, -1]:
            for k in [0, 1]:
                point = np.copy(center)
                point[0] = center[0] + i * dims[1] / 2 * np.cos(-rot_y + np.pi / 2) \
                           + (j * i) * dims[2] / 2 * np.cos(-rot_y)
                point[2] = center[2] + i * dims[1] / 2 * np.sin(-rot_y + np.pi / 2) \
                           + (j * i) * dims[2] / 2 * np.sin(-rot_y)
                point[1] = center[1] - k * dims[0]
                point = np.append(point, 1)
                point = np.dot(cam_to_img, point)
                point = point[:2] / point[2]
                point = point.astype(np.int16)
                box_3d.append(point)
    # draw the vertical edges and remember the corners of the front face
    front_mark = []
    for i in range(4):
        point_1_ = box_3d[2 * i]
        point_2_ = box_3d[2 * i + 1]
        cv2.line(img, (point_1_[0], point_1_[1]), (point_2_[0], point_2_[1]), (255, 0, 0), 2)
        if i == 0 or i == 3:
            front_mark.append((point_1_[0], point_1_[1]))
            front_mark.append((point_2_[0], point_2_[1]))
    # mark the front face with a cross
    cv2.line(img, front_mark[0], front_mark[-1], (255, 0, 0), 2)
    cv2.line(img, front_mark[1], front_mark[2], (255, 0, 0), 2)
    # draw the top and bottom face edges
    for i in range(8):
        point_1_ = box_3d[i]
        point_2_ = box_3d[(i + 2) % 8]
        cv2.line(img, (point_1_[0], point_1_[1]), (point_2_[0], point_2_[1]), (255, 0, 255), 2)
    return img
Hope it helps!
@tonysy Setting aside the model prediction: if I provide the ground-truth label and calib file, can you draw out the 3D box? I don't think you can with this script.
Truck 0.00 0 -1.57 599.41 156.40 629.75 189.25 2.85 2.63 12.34 0.47 1.49 69.44 -1.56
Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57
These are ground-truth labels in KITTI. So what are alpha and theta_ray? Which one gives your centers? If you are using the centers from the ground truth, you already know where the 3D box is.
As a brief conclusion, you just cannot predict a 3D bounding box from a predicted 2D box and this network's prediction alone.
This is what I got using the ground truth; what is wrong?
Alpha is the observation angle, but we don't need to use it. For theta_ray, you can choose the center location (x, y) of the cropped 2D box and then infer theta_ray from it; this is also described in this figure.
But during my training, I directly used the ground truth to infer theta_ray.
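A minimal sketch of that suggestion, measuring the ray angle from the camera X-axis in the x-z plane; the repo works in degrees and its exact convention may differ, so treat this as an approximation:

```python
import numpy as np

def theta_ray_from_box(u_center, P2):
    # u_center: horizontal pixel coordinate of the 2D box center
    fx, cx = P2[0, 0], P2[0, 2]
    # the viewing ray through the box center is roughly ((u - cx) / fx, ., 1);
    # this returns its angle from the X-axis in the x-z plane, in radians
    return np.arctan2(fx, u_center - cx)
```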
@fuenwang Thanks so much. After editing, I directly used rotation_y from the label rather than alpha, and now I get the right result:
But would you let me ask a few questions?
- In the ground truth, we use rotation_y, the center, the dimensions, and the calib; those 4 params are enough to build a 3D box and project it onto the image. How does the model infer the rotation value, and what does rotation_y actually represent?
- The centers are used directly from the labels and are 3D coordinates in the camera frame, but the model only gives us the dimensions, not the centers. How do we get the centers?
- rotation_y is theta (the red one), which is the global angle of the object. And the theta in this line https://github.com/fuenwang/3D-BoundingBox/blob/master/Eval.py#L72 is the predicted rotation_y.
- The ray through the center of the cropped image may not go through the center of the 3D box; it is just an estimate and will have some small error. But we still need this assumption to infer the translation. After obtaining the translation, we can convert it to the object's 3D center. So for now we have the object rotation, the dimensions, and the 2D bounding box of the cropped image, and we can optimize the object location from these three pieces of information as described in Sec. 3.2 of the paper.
@fuenwang thanks.
What I mean is: the centers we use to calculate rotation_y are 3-dimensional. If you use the center of the bounding box, it is 2D. Where do we get the needed 3D centers?
Yes, we only have the 2D location of the 2D bounding box. So we have to estimate the 3D location of the object by projecting the 3D box into the image, constrained by the coordinates of the 2D box. Because we have a 3D box with fixed dimensions and its rotation, the object location can theoretically be inferred as well. The whole Sec. 3.2 tells us how to do it.
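To make Sec. 3.2 a bit more concrete, here is a rough brute-force sketch of the idea (not the paper's exact procedure, which prunes the corner-to-side configurations): each side of the 2D box must be touched by some projected 3D box corner, each such constraint is linear in the translation T, so we solve a small least-squares problem per configuration and keep the best one.

```python
import itertools
import numpy as np

def solve_translation(P2, ry, dims, box_2d):
    # P2: 3x4 KITTI projection matrix, ry: global yaw (rotation_y),
    # dims: (h, w, l) from the label, box_2d: (xmin, ymin, xmax, ymax)
    h, w, l = dims
    xmin, ymin, xmax, ymax = box_2d
    R = np.array([[ np.cos(ry), 0, np.sin(ry)],
                  [ 0,          1, 0         ],
                  [-np.sin(ry), 0, np.cos(ry)]])
    # 8 corners in the object frame (origin at the bottom center, KITTI devkit convention)
    x_c = [ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2]
    y_c = [ 0,    0,    0,    0,   -h,   -h,   -h,   -h  ]
    z_c = [ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2]
    corners = np.vstack([x_c, y_c, z_c])                   # 3 x 8

    sides = [(0, xmin), (0, xmax), (1, ymin), (1, ymax)]   # (row of P2, pixel value)
    best_T, best_err = None, np.inf
    for assign in itertools.product(range(8), repeat=4):   # which corner touches each side
        A, b = [], []
        for (row, value), idx in zip(sides, assign):
            Xo = R @ corners[:, idx]                       # rotated corner, object center at origin
            a = P2[row, :3] - value * P2[2, :3]            # coefficient of T in the constraint
            A.append(a)
            b.append(-(a @ Xo) - (P2[row, 3] - value * P2[2, 3]))
        A, b = np.array(A), np.array(b)
        T = np.linalg.lstsq(A, b, rcond=None)[0]
        err = np.linalg.norm(A @ T - b)
        if err < best_err:
            best_err, best_T = err, T
    return best_T   # estimated object center in camera coordinates
```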
@fuenwang I looked at the code. To get the rotation, you must know the 3D center in the camera coordinate system. But you only have the dimensions and a value of alpha. What confuses me is:
- what is the other value used for? I never see it used;
- the 3D centers can be inferred by mapping the 2D center to 3D via the image-to-camera calibration, so we don't need the other value at all.
@herleeyandi Hi How can you get the center point ,I only find the orient,conf and dim in the eval.py
@yuyijie1995 "The 3D centers can be inferred by mapping the 2D center to 3D via the image-to-camera calibration": since the 2D center and the camera calibration info are provided, I guess you can retrieve this info directly.
How can we get the 3D center? Sure, we can assume the center of the 2D bbox will roughly be the center of the 3D bbox, but what is the depth value with which we backproject the 2D bbox center to a 3D coordinate?
The author of this repo told us that Sec. 3.2 of the paper helps us calculate this information. As I understand it, to retrieve the 3D center coordinate we need to calculate T based on R, the 2D box coordinates, and K (the intrinsic matrix); the formula is provided in the Supplementary Material. However, I'm still not sure how to do this.
@jinfagang I am also confused about drawing the 3D bounding box with the ground truth. Could you explain your method in detail or show your code for it?
@tjufan You need to download the calibration files, the labels, and the left color image files.