
Negative values in the intrinsics matrix generated in the MOVI datasets

Open andrewsonga opened this issue 1 year ago • 1 comment

Hello,

I've been playing around with the Movi dataset, and I found something odd about the intrinsics matrix:

K = [[1.09375, 0.0, -0.5],
     [0.0, -1.09375, -0.5],
     [0.0, 0.0, -1.0]]

AFAIK the intrinsics matrix should have the form

K = [[fx, 0.0, cx],
     [0.0, fy, cy],
     [0.0, 0.0, 1.0]]

where fx, fy, cx, cy > 0. What does it mean when the intrinsics matrix has negative values, i.e. the entries K[1][1] = -1.09375 and K[2][2] = -1.0 (0-indexed), and the negative principal-point entries -0.5?

Thank you in advance!

andrewsonga avatar Aug 07 '24 00:08 andrewsonga

@andrewsonga Hi, I encountered the same issue when projecting the world coordinate system into the camera coordinate system. Have you found a solution yet?

zhangzjjjjjj avatar Sep 21 '24 09:09 zhangzjjjjjj

Hi, @andrewsonga @zhangzjjjjjj

While I'm not one of the official authors, I've looked into this and can explain it.

The short answer is that the negative values in the intrinsics matrix are a result of Kubric using the same camera coordinate convention as OpenGL, rather than the one typically used in OpenCV.

Here is a more detailed breakdown:

1. The OpenCV Camera Convention (The Baseline)

In the standard OpenCV convention, the camera coordinate system is defined as:

  • +X axis points to the right.
  • +Y axis points down.
  • +Z axis points forward (into the scene).

The projection equation from camera coordinates $(X, Y, Z)$ to pixel coordinates $(u, v)$ is:

$$ Z\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \mathbf{K} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} $$

Here, all parameters ($f_x, f_y, c_x, c_y$) are positive.
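To make the convention concrete, here is a minimal sketch of this projection in NumPy, using the fx = fy = 1.09375 and cx = cy = 0.5 values from the matrix in the question (with the signs corrected to the OpenCV convention); the 3D point is made up for illustration:

```python
import numpy as np

# Standard OpenCV intrinsics with the focal/principal-point values from
# the MOVI matrix above (these are normalized image coordinates, hence
# cx = cy = 0.5 rather than pixel centers).
fx = fy = 1.09375
cx = cy = 0.5
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# A point 2 m in front of the camera (OpenCV convention: +Z forward).
X = np.array([0.4, -0.2, 2.0])

uvw = K @ X                 # homogeneous pixel coordinates, scaled by Z
u, v = uvw[:2] / uvw[2]     # divide by depth to get (u, v)
print(u, v)                 # 0.71875 0.390625
```

Because depth Z is positive for visible points in this convention, the division by `uvw[2]` is well-defined and all entries of K stay positive.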

2. The Kubric/OpenGL Camera Convention

In Kubric (and other graphics applications like Blender/OpenGL), the camera coordinate system is different:

  • +X axis points to the right.
  • +Y axis points up.
  • +Z axis points backward (out of the screen, toward the camera).

This means the Y and Z axes are inverted compared to the OpenCV convention.
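Concretely, a point expressed in the GL-style camera frame maps to the OpenCV-style frame by negating its Y and Z components. A small sketch (the point coordinates are made up for illustration):

```python
import numpy as np

# A point 2 m in front of a GL-style camera sits at negative Z,
# with +Y pointing up.
p_gl = np.array([0.4, 0.2, -2.0])

# Negating Y and Z re-expresses it in the OpenCV camera frame
# (+Y down, +Z forward).
gl_to_cv = np.diag([1.0, -1.0, -1.0])
p_cv = gl_to_cv @ p_gl
print(p_cv)  # [ 0.4 -0.2  2. ]
```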

Let's look at the code:

https://github.com/google-research/kubric/blob/4d5a0d4ee80cac1c318f58bed83db284f6c70036/challenges/point_tracking/dataset.py#L248-L253

This corresponds to the matrix:

$$ \mathbf{K}_{kubric} = \begin{pmatrix} f_x & 0 & -c_x \\ 0 & -f_y & -c_y \\ 0 & 0 & -1 \end{pmatrix} $$

This directly explains what you observed:

  • The second column, K[:, 1], contains -f_y: this accounts for the flipped Y-axis (up vs. down).
  • The third column, K[:, 2], is fully negated (-c_x, -c_y, -1): this accounts for the flipped Z-axis (depth is -Z in the OpenGL convention).
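Equivalently, the Kubric matrix is exactly the standard OpenCV matrix right-multiplied by the Y/Z axis flip diag(1, -1, -1), which we can verify numerically with the values from the question:

```python
import numpy as np

# The matrix reported in the issue.
K_kubric = np.array([[1.09375, 0.0, -0.5],
                     [0.0, -1.09375, -0.5],
                     [0.0, 0.0, -1.0]])

# The same intrinsics written in the standard OpenCV convention.
K_cv = np.array([[1.09375, 0.0, 0.5],
                 [0.0, 1.09375, 0.5],
                 [0.0, 0.0, 1.0]])

# Flipping the Y and Z axes of the camera frame corresponds to
# right-multiplying K by diag(1, -1, -1).
flip_yz = np.diag([1.0, -1.0, -1.0])
print(np.allclose(K_cv @ flip_yz, K_kubric))  # True
```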

3. How to Convert to the OpenCV Standard in Your Code

If you want to use a standard OpenCV camera model in your own pipeline, you must convert both the intrinsic and extrinsic (pose) matrices.

Modify the Intrinsics Matrix to the OpenCV Standard: As you suggested, change the intrinsics definition to:

intrinsics.append(
    tf.stack([
        tf.stack([f_x, 0., p_x]),
        tf.stack([0., f_y, p_y]),
        tf.stack([0., 0., 1.]),
    ])
)

Convert the Camera Pose (Extrinsics) from OpenGL to OpenCV: After creating the standard intrinsics, you must also convert the camera pose (matrix_world). You do this by applying a transformation matrix that flips the Y and Z axes of the pose.

Original version:

https://github.com/google-research/kubric/blob/4d5a0d4ee80cac1c318f58bed83db284f6c70036/challenges/point_tracking/dataset.py#L255-L268

Modification:

        position = cam_positions[frame_idx]
        quat = cam_quaternions[frame_idx]
        rotation_matrix = rotation_matrix_3d.from_quaternion(
            tf.concat([quat[1:], quat[0:1]], axis=0)
        )
        transformation = tf.concat(
            [rotation_matrix, position[:, tf.newaxis]],
            axis=1,
        )
        transformation = tf.concat(
            [transformation,
             tf.constant([0.0, 0.0, 0.0, 1.0])[tf.newaxis, :]],
            axis=0,
        )
        
        # ADD THIS: Convert the camera pose from OpenGL-style to OpenCV-style
        cv_from_gl_transform = tf.constant([
            [1,  0,  0, 0],
            [0, -1,  0, 0],
            [0,  0, -1, 0],
            [0,  0,  0, 1]
        ], dtype=tf.float32)
        
        transformation = tf.matmul(transformation, cv_from_gl_transform)
        
        matrix_world.append(transformation)

By applying both of these changes, your entire pipeline will correctly operate under the standard OpenCV camera model.
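As a sanity check, the two conventions should agree end to end: projecting a world point with (K_kubric, GL-style pose) must give the same pixel as (K_cv, converted pose). A NumPy sketch with a made-up camera pose and world point:

```python
import numpy as np

def rot_x(a):
    """Rotation about the X axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

# Hypothetical GL-style camera-to-world pose (matrix_world).
T_gl = np.eye(4)
T_gl[:3, :3] = rot_x(0.3)
T_gl[:3, 3] = [0.5, -1.0, 2.0]

K_kubric = np.array([[1.09375, 0, -0.5], [0, -1.09375, -0.5], [0, 0, -1.0]])
K_cv = np.array([[1.09375, 0, 0.5], [0, 1.09375, 0.5], [0, 0, 1.0]])

# The cv_from_gl_transform from the modification above.
F = np.diag([1.0, -1.0, -1.0, 1.0])
T_cv = T_gl @ F

X_world = np.array([0.2, 0.3, -1.0, 1.0])  # arbitrary world point

# Project with each convention: world -> camera -> homogeneous pixels.
p_gl = K_kubric @ (np.linalg.inv(T_gl) @ X_world)[:3]
p_cv = K_cv @ (np.linalg.inv(T_cv) @ X_world)[:3]

print(np.allclose(p_gl / p_gl[2], p_cv / p_cv[2]))  # True
```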

Hope this clears things up for you!

Best,

YuxueYang1204 avatar Jul 27 '25 09:07 YuxueYang1204