Unexpected behaviors on "Camera from opencv projection" due to non square image size

Open ColleenKuang opened this issue 3 years ago • 0 comments

🐛 Bugs / Unexpected behaviors

Hi, I recently used a customized camera intrinsic matrix, whose focal length and principal point are in screen pixel units, to render an image of an object. The code works fine for square images, but it produces an unexpected result when I set the image size to a non-square size.

Here's my configuration of the intrinsic matrix K and the extrinsic matrix R|t; the principal point is the center of the image:

    image_size = (1080,1920)
    focal = 512
    principal_point = (540,960) # the center of image
    K:
    tensor([[[ 512,    0.,  540.],
             [   0., -512.,  960.],
             [   0.,    0.,    1.]]], device='cuda:0')
    R:
    tensor([[[1., 0., 0.],
             [0., 1., 0.],
             [0., 0., 1.]]], device='cuda:0')
    T:
    tensor([[0.00000000, 0.00000000, 1.07438493]], device='cuda:0')

I expected the object to be displayed in the middle of the image, but I got this result: (image: v_bug). The same thing happened when I set the image size to (1920,1080): (image: h_bug).

Code

Then I checked the function _cameras_from_opencv_projection in pytorch3d/renderer/camera_conversions.py:

    focal_length = torch.stack([camera_matrix[:, 0, 0], camera_matrix[:, 1, 1]], dim=-1)
    principal_point = camera_matrix[:, :2, 2]

    # Retype the image_size correctly and flip to width, height.
    image_size_wh = image_size.to(R).flip(dims=(1,))

    # Screen to NDC conversion:
    # For non square images, we scale the points such that smallest side
    # has range [-1, 1] and the largest side has range [-u, u], with u > 1.
    # This convention is consistent with the PyTorch3D renderer, as well as
    # the transformation function `get_ndc_to_screen_transform`.
    scale = image_size_wh.to(R).min(dim=1, keepdim=True)[0] / 2.0
    scale = scale.expand(-1, 2)
    c0 = image_size_wh / 2.0

    # Get the PyTorch3D focal length and principal point.
    focal_pytorch3d = focal_length / scale
    p0_pytorch3d = -(principal_point - c0) / scale
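To see what this conversion does with the values from the report, here is a minimal pure-Python sketch that mirrors the arithmetic of the excerpt. The helper name `opencv_to_ndc` is hypothetical and not part of PyTorch3D, and the sketch assumes that `image_size` is given as (height, width), as the "flip to width, height" comment suggests.

```python
def opencv_to_ndc(focal_length, principal_point, image_size_hw):
    """Pure-Python mirror of the screen-to-NDC arithmetic in the excerpt."""
    height, width = image_size_hw               # assumed (height, width) order
    image_size_wh = (width, height)             # the "flip to width, height" step
    scale = min(image_size_wh) / 2.0            # half of the smallest side
    c0 = tuple(s / 2.0 for s in image_size_wh)  # image center as (x, y)
    focal_ndc = tuple(f / scale for f in focal_length)
    # Same value as -(principal_point - c0) / scale in the excerpt.
    p0_ndc = tuple((c - p) / scale for p, c in zip(principal_point, c0))
    return focal_ndc, p0_ndc


focal_ndc, p0_ndc = opencv_to_ndc(
    focal_length=(512.0, -512.0),     # K[0, 0], K[1, 1] from the report
    principal_point=(540.0, 960.0),   # K[0, 2], K[1, 2] from the report
    image_size_hw=(1080, 1920),
)
print(focal_ndc)
print(p0_ndc)
```

Under these assumptions the image center is c0 = (960, 540), so the reported principal point (540, 960) maps to roughly (0.78, -0.78) in NDC instead of (0, 0), which would be consistent with the object appearing off-center.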

I wonder why only the image size is flipped (line 5 of the excerpt above) while the principal point coordinates are not flipped at the same time. In my opinion, a change of the screen coordinate order should also change the coordinate representation of the principal point, so I added one line below line 5:

    # Retype the image_size correctly and flip to width, height.
    image_size_wh = image_size.to(R).flip(dims=(1,))
    principal_point = principal_point.flip(dims=(1,))

Finally, I got what I expected: (image: horizontal_1080p).
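The effect of the added line can be checked numerically with a small pure-Python sketch. The helper `ndc_principal_point` is hypothetical and only reproduces the principal point arithmetic of the quoted function, again assuming that `image_size` is (height, width). It also runs one hypothetical extra case: a K whose principal point is stored as (x, y) = (960, 540).

```python
def ndc_principal_point(principal_point, image_size_hw):
    """NDC principal point, following the arithmetic of the quoted excerpt."""
    height, width = image_size_hw          # assumed (height, width) order
    c0 = (width / 2.0, height / 2.0)       # image center as (x, y)
    scale = min(width, height) / 2.0       # half of the smallest side
    # Same value as -(principal_point - c0) / scale in the excerpt.
    return tuple((c - p) / scale for p, c in zip(principal_point, c0))


image_size = (1080, 1920)
pp_reported = (540.0, 960.0)               # K[0, 2], K[1, 2] from the report

# Effect of the added line: flip the principal point before converting.
p0_patched = ndc_principal_point(pp_reported[::-1], image_size)

# Hypothetical alternative: K stores the center as (x, y) = (960, 540)
# and is converted by the unpatched code.
p0_xy_unpatched = ndc_principal_point((960.0, 540.0), image_size)

# The same (x, y) input run through the patched code (flipped first).
p0_xy_patched = ndc_principal_point((960.0, 540.0)[::-1], image_size)

print(p0_patched, p0_xy_unpatched, p0_xy_patched)
```

With the flip, the reported principal point maps to (0, 0) in NDC, matching the centered render. The unpatched code gives the same (0, 0) when the center is stored as (960, 540), while the patched code would move that input to roughly (0.78, -0.78). Whether the flip belongs in the library therefore seems to depend on the order in which K is expected to store the principal point.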

ColleenKuang • Aug 18 '22 07:08