
Rendering 2D images from VoxelBlockGrid matching TSDF input given only camera extrinsics

Open dendelyne opened this issue 3 years ago • 0 comments


My Question

Dear all, I am having trouble finding an accurate way to render 2D images from a TSDF VoxelBlockGrid that look identical to the input images, given only the camera extrinsics/intrinsics. I am basically trying to integrate a series of RGB-D images (+ extrinsics) into one 3D model, just as was done in this example. Additionally, at each step before integrating the next RGB-D image, I want to render the next frame using only the camera extrinsics.

I tried different methods; the best one I've found is to use a Visualizer and capture a screen image. To get the camera into the right position, I set the camera's extrinsics and intrinsics on the view control and then render the image. Here is the code for the rendering step using the TSDF VoxelBlockGrid:

```python
import open3d as o3d

# Extract a legacy mesh from the VoxelBlockGrid for the legacy Visualizer
mesh = vbg.extract_triangle_mesh().to_legacy()

vis = o3d.visualization.Visualizer()
vis.create_window(width=640, height=480, visible=False)
vis.add_geometry(mesh)

# Build pinhole intrinsics from the 3x3 intrinsic matrices
depth_intrinsic_o3d = o3d.camera.PinholeCameraIntrinsic(
    width=width, height=height,
    fx=depth_int_mat[0, 0], fy=depth_int_mat[1, 1],
    cx=depth_int_mat[0, 2], cy=depth_int_mat[1, 2])
color_intrinsic_o3d = o3d.camera.PinholeCameraIntrinsic(
    width=width, height=height,
    fx=color_int_mat[0, 0], fy=color_int_mat[1, 1],
    cx=color_int_mat[0, 2], cy=color_int_mat[1, 2])

# Apply the camera pose; the second argument (allow_arbitrary=True)
# keeps the exact intrinsics instead of snapping to window defaults
view_ctl = vis.get_view_control()
cam = view_ctl.convert_to_pinhole_camera_parameters()
cam.intrinsic = depth_intrinsic_o3d
cam.extrinsic = extrinsic_mat
view_ctl.convert_from_pinhole_camera_parameters(cam, True)

filename = "tsdf_fusion/o3d_renderings/rendering%04d.png" % i
vis.capture_screen_image(filename, do_render=True)
# vis.run()

vis.destroy_window()
```

I am using the ScanNet dataset and tested this with both the provided depth and color intrinsics (which I also used for the integration). Neither result reproduces the original perspective exactly. Here is a comparison (screenshots attached): the original image, the rendering using the color intrinsics, and the rendering using the depth intrinsics.

It looks like the camera positioning is accurate, so it might be possible to fix the issue by simply using the correct intrinsics.

So the first important question is: Is there a way to find the intrinsics needed to get the right perspective? After all, the TSDF algorithm is based on the two intrinsics provided, so there should be a way to determine which are the corresponding intrinsics in the virtual camera.
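To make concrete what the virtual camera's intrinsics control, here is a minimal pinhole-projection sketch (the numbers are illustrative stand-ins, not the actual ScanNet calibration). It shows why even a half-pixel shift of the principal point moves every projected point, which would match a rendering that is subtly off in perspective:

```python
def project(fx, fy, cx, cy, X, Y, Z):
    """Project a 3D camera-space point (X, Y, Z) to pixel coordinates
    with the pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy."""
    return fx * X / Z + cx, fy * Y / Z + cy

# Illustrative ScanNet-like intrinsics vs. a viewer that forces the
# principal point to the exact image center (320.0, 240.0):
u1, v1 = project(577.87, 577.87, 319.5, 239.5, 0.2, 0.1, 1.0)  # ≈ (435.07, 297.29)
u2, v2 = project(577.87, 577.87, 320.0, 240.0, 0.2, 0.1, 1.0)  # shifted by (0.5, 0.5)
print(u2 - u1, v2 - v1)
```

The shift is constant per pixel here, but a wrong focal length additionally scales the whole image about the principal point, so comparing the two renderings against the original can hint at which parameter the view control is overriding.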

This is of course only one approach; I tried other methods, and the results were even worse. So far I have not looked into 3D model generation methods other than TSDF. So if somebody has better ideas: what integration/rendering approaches might be better suited for this task?

In my case, it is unfortunately not possible to use the view frustum coordinates, as calculating them would require the use of the depth image in addition to the extrinsics.

I hope my question is clear and not too broad! I would appreciate any help, as I have been stuck on this issue for a while now!

If any more information or code is needed for clarification, I would be glad to provide it!

Thanks in advance!

dendelyne · Jul 30 '22 10:07