
PyTorch3D 2.0 feature: There is no spoon (only world coordinates)

legel opened this issue 2 years ago · 6 comments

🚀 Feature

The 2.0 version of this API should allow users to get started with rendering, with only an understanding of world coordinates.

Motivation

It's difficult and time-consuming for users to learn everything needed to start rendering with the experimental API. The API requires that users read documentation on 4 coordinate systems and understand competing conventions (e.g. OpenGL vs. PyTorch3D, right-handedness vs. left-handedness). Users trying to render with their own camera extrinsics and intrinsics matrices are left to basically guess-and-check in code how to align PyTorch3D's 4 coordinate systems with their own. There are several API calls for transforming points across different coordinate systems, which all seem useful under the hood, but none of which a new user should need to understand just to get started with rendering in PyTorch3D. This seems especially true when the user has a common and easy-to-fix conflict in notation or coordinate systems, which PyTorch3D currently requires the user to resolve by educating themselves.

Pitch

The pitch is simple: let users import points in 3D world coordinates, together with the camera extrinsics + intrinsics (e.g. from photogrammetry) that are known to have photographed / projected those coordinates, and then -- here is where PyTorch3D developers can really help -- heuristically search for, identify, and calibrate the best way to join all the coordinate systems together. One software coordinate system manager to rule them all. Leave it to the advanced users to learn about rasterization; don't require it. The goal here is to get a user rendering as fast as possible. Imagine, PyTorch3D developer, verbally walking a user through the possible checks and fixes they might do to get started importing their own CV work into your library (e.g. "try converting to NDC and inverting the x-axis of the camera rotation..."): could you automate that? The success metric of this API is that it can seamlessly import camera matrices from many common packages and "just work".

Example code below with files available here: render_data.zip

import numpy as np
import torch
from pytorch3d.renderer import PerspectiveCameras

# load data (attached)
camera_extrinsics = np.load(file="camera_extrinsics.npy") # (4,4) translations, rotations
camera_intrinsics = np.load(file="camera_intrinsics.npy") # (3,3) focal lengths, principal points
rgb = torch.tensor(np.load(file="rgb.npy")) # (H,W,3) colors for every image pixel + (x,y,z) point
xyz = torch.tensor(np.load(file="xyz.npy")) # (H,W,3)  world (x,y,z) points for every image pixel

# load a camera with the camera extrinsics and intrinsics
camera = PerspectiveCameras(R=torch.as_tensor(camera_extrinsics[:3, :3], dtype=torch.float32)[None],  # (1, 3, 3) batch dim required
                            T=torch.as_tensor(camera_extrinsics[:3, 3], dtype=torch.float32)[None],   # (1, 3)
                            image_size=((rgb.shape[0], rgb.shape[1]),),
                            principal_point=((camera_intrinsics[0, 2], camera_intrinsics[1, 2]),),
                            focal_length=((camera_intrinsics[0, 0], camera_intrinsics[1, 1]),),
                            in_ndc=False  # intrinsics above are in pixel (screen) units
                            )

# proposed feature: calibration function for inference of the most likely coordinate conventions of camera T and R
camera.calibrate_based_on_overlap_with_projected_points(points=xyz, colors=rgb)

# proposed: rendering function which by default adds ambient lights, rasterizes, etc. under the hood
camera.render(points=xyz, colors=rgb) 
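
To sketch what I mean by "heuristically searching": the proposed calibrate_based_on_overlap_with_projected_points (which does not exist today) could, under the hood, enumerate the common convention differences (axis flips, transposed rotations), project the known world points with each candidate camera, and keep whichever candidate lands the projections closest to their known pixel locations. Below is a rough sketch of that idea using only the existing public API; guess_camera_convention is an illustrative name, not a real PyTorch3D function.

import itertools
import numpy as np
import torch
from pytorch3d.renderer import PerspectiveCameras

def guess_camera_convention(R, T, focal, principal, image_size, xyz):
    # R: (3, 3) and T: (3,) numpy extrinsics; xyz: (H, W, 3) world point per pixel.
    # Brute-force common axis-flip / transpose conventions and keep the candidate
    # whose projected world points land closest to their own pixel coordinates.
    H, W = image_size
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    target = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()  # expected screen (x, y)
    points = torch.as_tensor(xyz, dtype=torch.float32).reshape(1, -1, 3)

    best = None
    for sx, sy, transpose in itertools.product((1.0, -1.0), (1.0, -1.0), (False, True)):
        flip = np.diag([sx, sy, 1.0])
        R_c = (flip @ R).T if transpose else flip @ R
        T_c = flip @ T
        cam = PerspectiveCameras(
            R=torch.as_tensor(R_c, dtype=torch.float32)[None],
            T=torch.as_tensor(T_c, dtype=torch.float32)[None],
            focal_length=((focal[0], focal[1]),),
            principal_point=((principal[0], principal[1]),),
            image_size=((H, W),),
            in_ndc=False,
        )
        projected = cam.transform_points_screen(points)[0, :, :2]  # (H*W, 2) pixel coords
        err = (projected - target).norm(dim=-1).median()
        if best is None or err < best[0]:
            best = (err, R_c, T_c)
    return best  # (median reprojection error, calibrated R, calibrated T)

# illustrative usage with the arrays loaded above
err, R_best, T_best = guess_camera_convention(camera_extrinsics[:3, :3], camera_extrinsics[:3, 3],
                                              (camera_intrinsics[0, 0], camera_intrinsics[1, 1]),
                                              (camera_intrinsics[0, 2], camera_intrinsics[1, 2]),
                                              (rgb.shape[0], rgb.shape[1]), xyz)

A real implementation would also need to consider camera-to-world vs. world-to-camera extrinsics and NDC vs. screen-space intrinsics, but even this small search would remove most of the guess-and-check.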

Make rendering easy again.

legel · Jul 12 '22 01:07

To elaborate further on the desired functionality, while also potentially troubleshooting my current problem, the (x,y,z) points that I have were generated with the following code (slightly modified for clarity):


# meshgrid of coordinates for every image pixel
cam_coord_y, cam_coord_x = torch.meshgrid(torch.arange(img_height), torch.arange(img_width), indexing='ij') 

# computation of image directions per pixel in camera coordinates
cam_directions_x = (cam_coord_x - principal_point_x) / focal_length_x  
cam_directions_y = (cam_coord_y - principal_point_y) / focal_length_y  
cam_directions_z = torch.ones(img_height, img_width)
cam_directions_xyz = torch.stack([cam_directions_x, cam_directions_y, cam_directions_z], dim=-1)

# rotate the per-pixel camera-space directions into world space with the (3, 3) camera rotation matrix
pixel_world_directions = torch.matmul(camera_world_rotation, cam_directions_xyz.unsqueeze(-1)).squeeze(-1)

# compute world (x,y,z) coordinates: camera position plus world-space directions scaled by per-pixel depths
global_xyz = camera_world_position + pixel_world_directions * pixel_depths.unsqueeze(2)

The reason I am struggling is that I have a large set of camera extrinsics and intrinsics defined by other software, and only the above code (with its own conventions for camera coordinates) has successfully projected all camera pixels into the same world space.

As of right now, I'm not sure how to edit the above code for projections, or whether I need to edit my camera extrinsics (e.g. transform the rotation matrices) to be compatible with PyTorch3D conventions. I've read and reread the documentation on coordinate transformation conventions, but still don't have a clear prescription from it. I filed this as a feature request because, with a slightly better understanding of the problem, I could absolutely imagine automating a search-and-rescue solution. I imagine hundreds if not thousands of people face this exact problem: having to figure out which coordinate system conventions are in use, and then how to "fix" their own xyz world coordinates, camera projection equations, and/or extrinsics/intrinsics.
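
For illustration, here is the kind of convention fix I imagine such a tool could try automatically. It is only a hedged sketch: it assumes the 4x4 extrinsics are an OpenCV-style world-to-camera matrix and that the intrinsics are in pixel units, and I have not verified either assumption against my own data.

import numpy as np
import torch
from pytorch3d.renderer import PerspectiveCameras

camera_extrinsics = np.load("camera_extrinsics.npy")   # (4, 4), from the attached zip
camera_intrinsics = np.load("camera_intrinsics.npy")   # (3, 3)
rgb = np.load("rgb.npy")                                # (H, W, 3)

# assumption: extrinsics map world -> camera in OpenCV convention
# (+x right, +y down, +z forward). PyTorch3D cameras use +x left, +y up, +z forward
# and apply R to row vectors (x_cam = x_world @ R + T), hence the flip and transpose.
R_cv = camera_extrinsics[:3, :3]
t_cv = camera_extrinsics[:3, 3]
flip = np.diag([-1.0, -1.0, 1.0])

R_p3d = torch.as_tensor(R_cv.T @ flip, dtype=torch.float32)[None]   # (1, 3, 3)
T_p3d = torch.as_tensor(flip @ t_cv, dtype=torch.float32)[None]     # (1, 3)

camera = PerspectiveCameras(R=R_p3d, T=T_p3d,
                            focal_length=((camera_intrinsics[0, 0], camera_intrinsics[1, 1]),),
                            principal_point=((camera_intrinsics[0, 2], camera_intrinsics[1, 2]),),
                            image_size=((rgb.shape[0], rgb.shape[1]),),
                            in_ndc=False)

If the extrinsics turn out to be camera-to-world poses instead (as my projection code above suggests), they would need to be inverted first; that is exactly the kind of ambiguity an automated calibration could resolve.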

Maybe some sort of a coordinate convention debugging / visualization tool could help folks coming from different communities get started (which could eventually lead to a clear obvious automated solution)?

legel · Jul 13 '22 04:07

There is no spoon, only world coordinates, seems simple enough.

legel · Jul 13 '22 04:07

I think there might be scope for extra visualisations to help with cases like this where there is uncertainty about conversions.

Our existing plotly integration can help a bit: If you are in jupyter (or similar) and have a length-1 PyTorch3D cameras object and a length-1 Meshes object, you can call

from pytorch3d.vis.plotly_vis import plot_batch_individually
plot_batch_individually([cameras, meshes])

to see the relative positions of the two in our world space in an interactive plot, which may help show what's wrong. A good enhancement would be to add a proper visual indication of the view frustum or clipping planes to this plot. You may be able to think of other enhancements we could make.
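
For example, a minimal self-contained snippet (using an ico-sphere from pytorch3d.utils as a stand-in for your own mesh, and an arbitrary camera pose which you would replace with your own R and T) would be:

import torch
from pytorch3d.renderer import PerspectiveCameras
from pytorch3d.utils import ico_sphere
from pytorch3d.vis.plotly_vis import plot_batch_individually

# a length-1 camera placed at world (0, 0, -3) looking towards the origin,
# and a unit ico-sphere mesh at the origin
cameras = PerspectiveCameras(R=torch.eye(3)[None], T=torch.tensor([[0.0, 0.0, 3.0]]))
meshes = ico_sphere(level=2)

fig = plot_batch_individually([cameras, meshes])
fig.show()  # interactive plot of the camera and mesh in world coordinates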

bottler · Jul 13 '22 14:07

Basic CV and CG are prerequisites, but reading and using PyTorch3D and looking at its implementations is also a nice way to pick up that knowledge (coordinate systems, rasterization, camera intrinsic and extrinsic transformations, meshes and geometry, etc.) for someone learning computer vision and computer graphics, and to actually see how the pipeline works. For visualization I often use opencv-python, and it's strong enough to some extent? I am wondering whether the proposed tool would be flexible enough. :-/

yougrianes · Jul 14 '22 02:07

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] · Aug 14 '22 05:08

Thanks @bottler for your reply on this. I've tried plotting as you suggest, but so far have not been able to figure out how to align the different coordinate systems. However, I am still determined to do so and will share an update in any case.

legel · Aug 15 '22 03:08