
Negative values in the intrinsics matrix generated in the MOVI datasets

Open andrewsonga opened this issue 1 year ago • 1 comment

Hello,

I've been playing around with the Movi dataset, and I found something odd about the intrinsics matrix:

K = [[1.09375, 0.0, -0.5],
     [0.0, -1.09375, -0.5],
     [0.0, 0.0, -1.0]]

AFAIK the intrinsics matrix should have the form

K = [[fx, 0.0, cx],
     [0.0, fy, cy],
     [0.0, 0.0, 1.0]]

where fx, fy, cx, cy > 0. What does it mean when the intrinsics matrix has negative values, i.e. the entries K[1][1] = -1.09375 and K[2][2] = -1.0 (0-indexed), and the negative principal-point entries -0.5?

Thank you in advance!

andrewsonga avatar Aug 07 '24 00:08 andrewsonga

@andrewsonga Hi, I encountered the same issue when projecting the world coordinate system into the camera coordinate system. Have you found a solution yet?

zhangzjjjjjj avatar Sep 21 '24 09:09 zhangzjjjjjj

Hi, @andrewsonga @zhangzjjjjjj

While I'm not one of the official authors, I've looked into this and can explain it.

The short answer is that the negative values in the intrinsics matrix are a result of Kubric using the same camera coordinate convention as OpenGL, rather than the one typically used in OpenCV.

Here is a more detailed breakdown:

1. The OpenCV Camera Convention (The Baseline)

In the standard OpenCV convention, the camera coordinate system is defined as:

  • +X axis points to the right.
  • +Y axis points down.
  • +Z axis points forward (into the scene).

The projection equation from camera coordinates $(X, Y, Z)$ to pixel coordinates $(u, v)$ is:

$$ Z\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \mathbf{K} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} $$

Here, all parameters ($f_x, f_y, c_x, c_y$) are positive.
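To make the convention concrete, here is a minimal sketch of this projection in NumPy, using the fx = fy = 1.09375 and cx = cy = 0.5 values from the matrix in the question (with the signs corrected to the OpenCV convention); the 3D point is made up for illustration:

```python
import numpy as np

# Standard OpenCV intrinsics with the focal/principal-point values from
# the MOVI matrix above (these are normalized image coordinates, hence
# cx = cy = 0.5 rather than pixel centers).
fx = fy = 1.09375
cx = cy = 0.5
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# A point 2 m in front of the camera (OpenCV convention: +Z forward).
X = np.array([0.4, -0.2, 2.0])

uvw = K @ X                 # homogeneous pixel coordinates, scaled by Z
u, v = uvw[:2] / uvw[2]     # divide by depth to get (u, v)
print(u, v)                 # 0.71875 0.390625
```

Because depth Z is positive for visible points in this convention, the division by `uvw[2]` is well-defined and all entries of K stay positive.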

2. The Kubric/OpenGL Camera Convention

In Kubric (and other graphics applications like Blender/OpenGL), the camera coordinate system is different:

  • +X axis points to the right.
  • +Y axis points up.
  • +Z axis points backward (out of the screen, toward the camera).

This means the Y and Z axes are inverted compared to the OpenCV convention.
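Concretely, a point expressed in the GL-style camera frame maps to the OpenCV-style frame by negating its Y and Z components. A small sketch (the point coordinates are made up for illustration):

```python
import numpy as np

# A point 2 m in front of a GL-style camera sits at negative Z,
# with +Y pointing up.
p_gl = np.array([0.4, 0.2, -2.0])

# Negating Y and Z re-expresses it in the OpenCV camera frame
# (+Y down, +Z forward).
gl_to_cv = np.diag([1.0, -1.0, -1.0])
p_cv = gl_to_cv @ p_gl
print(p_cv)  # [ 0.4 -0.2  2. ]
```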

Let's look at the code:

https://github.com/google-research/kubric/blob/4d5a0d4ee80cac1c318f58bed83db284f6c70036/challenges/point_tracking/dataset.py#L248-L253

This corresponds to the matrix:

$$ \mathbf{K}_{kubric} = \begin{pmatrix} f_x & 0 & -c_x \\ 0 & -f_y & -c_y \\ 0 & 0 & -1 \end{pmatrix} $$

This directly explains what you observed:

  • The second column, K[:, 1], contains -f_y: this accounts for the flipped Y-axis (up vs. down).
  • The third column, K[:, 2], is fully negated (-c_x, -c_y, -1): this accounts for the flipped Z-axis (depth is -Z in the OpenGL convention).
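Equivalently, the Kubric matrix is exactly the standard OpenCV matrix right-multiplied by the Y/Z axis flip diag(1, -1, -1), which we can verify numerically with the values from the question:

```python
import numpy as np

# The matrix reported in the issue.
K_kubric = np.array([[1.09375, 0.0, -0.5],
                     [0.0, -1.09375, -0.5],
                     [0.0, 0.0, -1.0]])

# The same intrinsics written in the standard OpenCV convention.
K_cv = np.array([[1.09375, 0.0, 0.5],
                 [0.0, 1.09375, 0.5],
                 [0.0, 0.0, 1.0]])

# Flipping the Y and Z axes of the camera frame corresponds to
# right-multiplying K by diag(1, -1, -1).
flip_yz = np.diag([1.0, -1.0, -1.0])
print(np.allclose(K_cv @ flip_yz, K_kubric))  # True
```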

3. How to Convert to the OpenCV Standard in Your Code

If you want to use a standard OpenCV camera model in your own pipeline, you must convert both the intrinsic and extrinsic (pose) matrices.

Modify the Intrinsics Matrix to the OpenCV Standard: As you suggested, change the intrinsics definition to:

intrinsics.append(
    tf.stack([
        tf.stack([f_x, 0., p_x]),
        tf.stack([0., f_y, p_y]),
        tf.stack([0., 0., 1.]),
    ])
)

Convert the Camera Pose (Extrinsics) from OpenGL to OpenCV: After creating the standard intrinsics, you must also convert the camera pose (matrix_world). You do this by applying a transformation matrix that flips the Y and Z axes of the pose.

Original version:

https://github.com/google-research/kubric/blob/4d5a0d4ee80cac1c318f58bed83db284f6c70036/challenges/point_tracking/dataset.py#L255-L268

Modification:

        position = cam_positions[frame_idx]
        quat = cam_quaternions[frame_idx]
        rotation_matrix = rotation_matrix_3d.from_quaternion(
            tf.concat([quat[1:], quat[0:1]], axis=0)
        )
        transformation = tf.concat(
            [rotation_matrix, position[:, tf.newaxis]],
            axis=1,
        )
        transformation = tf.concat(
            [transformation,
             tf.constant([0.0, 0.0, 0.0, 1.0])[tf.newaxis, :]],
            axis=0,
        )
        
        # ADD THIS: Convert the camera pose from OpenGL-style to OpenCV-style
        cv_from_gl_transform = tf.constant([
            [1,  0,  0, 0],
            [0, -1,  0, 0],
            [0,  0, -1, 0],
            [0,  0,  0, 1]
        ], dtype=tf.float32)
        
        transformation = tf.matmul(transformation, cv_from_gl_transform)
        
        matrix_world.append(transformation)

By applying both of these changes, your entire pipeline will correctly operate under the standard OpenCV camera model.
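As a sanity check, the two conventions should agree end to end: projecting a world point with (K_kubric, GL-style pose) must give the same pixel as (K_cv, converted pose). A NumPy sketch with a made-up camera pose and world point:

```python
import numpy as np

def rot_x(a):
    """Rotation about the X axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

# Hypothetical GL-style camera-to-world pose (matrix_world).
T_gl = np.eye(4)
T_gl[:3, :3] = rot_x(0.3)
T_gl[:3, 3] = [0.5, -1.0, 2.0]

K_kubric = np.array([[1.09375, 0, -0.5], [0, -1.09375, -0.5], [0, 0, -1.0]])
K_cv = np.array([[1.09375, 0, 0.5], [0, 1.09375, 0.5], [0, 0, 1.0]])

# The cv_from_gl_transform from the modification above.
F = np.diag([1.0, -1.0, -1.0, 1.0])
T_cv = T_gl @ F

X_world = np.array([0.2, 0.3, -1.0, 1.0])  # arbitrary world point

# Project with each convention: world -> camera -> homogeneous pixels.
p_gl = K_kubric @ (np.linalg.inv(T_gl) @ X_world)[:3]
p_cv = K_cv @ (np.linalg.inv(T_cv) @ X_world)[:3]

print(np.allclose(p_gl / p_gl[2], p_cv / p_cv[2]))  # True
```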

Hope this clears things up for you!

Best,

YuxueYang1204 avatar Jul 27 '25 09:07 YuxueYang1204