Objectron
Projecting detected planes into image coordinates
Hi, thanks for the cool dataset!
I have been tinkering with objectron-geometry-tutorial.ipynb, exploring the available metadata. I haven't been able to transform the extracted planes into image space for visualization. I tried the same procedure used to project the bounding box coordinates into image pixels, but that doesn't seem to work: many of the resulting values are unreasonable, e.g. negative or much larger than the image bounds.
Here's the code that I used:
plane_points = np.array([[v.x,v.y,v.z,1] for v in plane.geometry.vertices])
plane_points_3d_world = transform @ plane_points.T
plane_points_3d_cam = frame_view_matrix @ plane_points_3d_world
plane_points_2d_proj = frame_projection_matrix @ plane_points_3d_cam
plane_points2d_ndc = plane_points_2d_proj[:-1, :] / plane_points_2d_proj[-1, :]
plane_points2d_ndc = plane_points2d_ndc.T
x = plane_points2d_ndc[:, 1]
y = plane_points2d_ndc[:, 0]
plane_points2d = np.copy(plane_points2d_ndc)
plane_points2d[:, 0] = ((1 + x) * 0.5) * width
plane_points2d[:, 1] = ((1 + y) * 0.5) * height
plane_points2d = np.round(plane_points2d).astype(np.int32)
for point_id in range(plane_points2d.shape[0]):
    cv2.circle(image, (plane_points2d[point_id, 0], plane_points2d[point_id, 1]), 25, (0, 255, 255), -1)
Also, there's a small bug in the notebook in the definition of grab_frame. The line
current_frame = np.frombuffer(
    pipe.stdout.read(frame_size), dtype='uint8').reshape(width, height, 3)
has width and height transposed.
Thanks for any help you can provide!
Your code seems correct to me. The planes are estimated by the AR tracking system in 3D, across multiple previous frames. They are therefore not limited to the current frame: they may extend beyond the image boundaries, or even lie behind the camera (hence the negative values). I would also trust the planes in later frames of the video more, since the tracking system has had more time to refine them. You can get more information about plane geometry from this reference.
It is easier to visualize it in 3D:
If you want to get plane points visible in the camera, you need to create a grid from the plane polygons and for each point on the grid, project it and check if it is visible in the image.
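The grid approach described above could look roughly like the following. This is only a sketch under several assumptions: the plane vertices are taken to lie in the local x-z plane (as in ARKit-style plane geometry), the projection matrix is assumed to be OpenGL-style so that a positive clip-space w means "in front of the camera", and the x/y swap mirrors the code earlier in this thread. The helper names `points_in_polygon` and `visible_plane_points` and the `step` parameter are hypothetical, not part of the Objectron API.

```python
import numpy as np


def points_in_polygon(pts, poly):
    """Even-odd ray-casting test: which 2D pts (M, 2) lie inside poly (N, 2)."""
    x, y = pts[:, 0], pts[:, 1]
    inside = np.zeros(len(pts), dtype=bool)
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        denom = yj - yi
        if denom != 0:  # horizontal edges never cross a horizontal ray
            crosses = (yi > y) != (yj > y)
            x_cross = (xj - xi) * (y - yi) / denom + xi
            inside ^= crosses & (x < x_cross)
        j = i
    return inside


def visible_plane_points(vertices_local, transform, view, projection,
                         width, height, step=0.05):
    """Sample a grid on the plane polygon and keep the points that land in the image.

    vertices_local: (N, 3) polygon vertices in plane-local coordinates
    (assumed planar in local x-z). Returns an (M, 2) array of pixel coordinates.
    """
    poly = vertices_local[:, [0, 2]]
    (x0, z0), (x1, z1) = poly.min(axis=0), poly.max(axis=0)
    gx, gz = np.meshgrid(np.arange(x0, x1 + step, step),
                         np.arange(z0, z1 + step, step))
    grid = np.stack([gx.ravel(), gz.ravel()], axis=1)
    grid = grid[points_in_polygon(grid, poly)]

    # Homogeneous local points, shape (4, M); y is the polygon's mean height.
    y_local = np.full(len(grid), vertices_local[:, 1].mean())
    local = np.stack([grid[:, 0], y_local, grid[:, 1], np.ones(len(grid))])

    clip = projection @ view @ transform @ local
    w = clip[3]
    in_front = w > 1e-6  # assumption: w > 0 means in front of the camera
    ndc = clip[:3, in_front] / w[in_front]

    # Same axis swap as in the snippet from the question.
    px = (1 + ndc[1]) * 0.5 * width
    py = (1 + ndc[0]) * 0.5 * height
    in_image = (px >= 0) & (px < width) & (py >= 0) & (py < height)
    return np.stack([px[in_image], py[in_image]], axis=1)
```

The returned pixel coordinates can then be rounded and drawn with cv2.circle as in the original snippet. A finer `step` gives a denser overlay at the cost of more points to project.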
Also, thanks for the bug report. Currently the bike videos have an issue where the portrait mode of the video is not properly detected by ffmpeg. I will fix it in the next update.
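For reference, NumPy image arrays are indexed as (row, column, channel), so a raw RGB buffer is normally reshaped with height first. The sketch below uses a small synthetic buffer in place of the ffmpeg pipe; it assumes the frames arrive as row-major RGB bytes and that `width` and `height` already match the decoded orientation, which, per the note above, may not hold for the bike videos.

```python
import numpy as np

width, height = 4, 6  # tiny portrait frame, for illustration only

# Stand-in for pipe.stdout.read(frame_size): row-major RGB bytes.
frame_size = width * height * 3
raw = np.arange(frame_size, dtype=np.uint8).tobytes()

# Rows (height) come first, then columns (width), then the 3 color channels.
frame = np.frombuffer(raw, dtype='uint8').reshape(height, width, 3)

# frame[row, col] is the pixel at y=row, x=col; the second row starts
# exactly width * 3 bytes into the buffer.
second_row_first_byte = int(frame[1, 0, 0])
```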