com.unity.perception
Wrong camera intrinsic parameters in captures.json
// Sample from PerceptionCamera.cs (Unity Perception package)
void CaptureRgbData(Camera cam)
{
    if (!captureRgbImages)
        return;

    Profiler.BeginSample("CaptureDataFromLastFrame");

    // Record the camera's projection matrix
    SetPersistentSensorData("camera_intrinsic", ToProjectionMatrix3x3(cam.projectionMatrix));
    ...
As you can see from the sample above, it takes the Unity camera projection matrix and records it as the camera's intrinsic parameters. However, intrinsic camera parameters and the camera projection matrix are calculated differently and serve different purposes.
Here is a brief explanation of camera parameters (intrinsic and extrinsic) by mathworks.com.
Is this a proper way to describe the intrinsic parameters of a sensor (camera), or is it a misuse of the term in the following line?
SetPersistentSensorData("camera_intrinsic", ToProjectionMatrix3x3(cam.projectionMatrix));
P.S.: I noticed this issue while trying to get the intrinsic parameters of the camera for my own project.
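For readers landing on this thread: a minimal sketch (assuming a simple pinhole model; the focal length and sensor dimensions below are illustrative, not the package defaults) of how a conventional 3x3 intrinsic matrix differs from the 3x3 block of a GL-style projection matrix:

```python
import numpy as np

def intrinsic_matrix(focal_mm, sensor_w_mm, sensor_h_mm, width_px, height_px):
    """Conventional pinhole intrinsic matrix K: camera space -> pixel coordinates."""
    fx = focal_mm * width_px / sensor_w_mm    # focal length expressed in pixels (x)
    fy = focal_mm * height_px / sensor_h_mm   # focal length expressed in pixels (y)
    cx, cy = width_px / 2.0, height_px / 2.0  # principal point, assumed at image centre
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def projection_matrix_3x3(vfov_deg, aspect):
    """Upper-left 3x3 of a GL-style projection matrix: camera space -> NDC in [-1, 1].

    The -1 in the last row divides by -z (the camera looks down -z in GL
    conventions), which is why it never appears in a pinhole K.
    """
    t = 1.0 / np.tan(np.radians(vfov_deg) / 2.0)
    return np.array([[t / aspect, 0.0, 0.0],
                     [0.0, t, 0.0],
                     [0.0, 0.0, -1.0]])

# Illustrative values: 50 mm lens on a 36x24 mm sensor, 650x400 image, ~27 degree vertical FOV
K = intrinsic_matrix(50.0, 36.0, 24.0, 650, 400)
P = projection_matrix_3x3(27.0, 650 / 400)
```

The key difference: K maps camera-space points to pixel coordinates and carries the principal point, while the projection matrix maps to normalized device coordinates in [-1, 1] and carries clip-plane information instead.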
This is a good call-out, @Rawdreen. I'll poke some of my colleagues and we'll see what sort of revisions might be needed here. Thanks for the link too!
Hi @Rawdreen! If you have a moment, could you verify the following list of parameters for me, to ensure that we're on the same page regarding which intrinsic metrics you would prefer to be available in the dataset?
Here's my current running list:
- focal length
- sensor size
- lens shift
- field of view
- near and far plane
In addition, renaming the current "camera_intrinsics" sensor field to "camera_projection_matrix" seems appropriate here.
Hi @sleal-unity! When it comes to intrinsic metrics, you are correct. And I guess it would be great to output not just the intrinsic metrics but also the intrinsic matrix of the camera.
@Rawdreen thanks for the input! I'll add this change to our backlog and see if we can tack it onto our next major release.
In the meantime though, the following labeler will output the camera properties I listed above as an annotation for each captured frame. Just create a new C# script in your project, copy-paste this labeler code into it, and add the new "CameraIntrinsicsLabeler" to your PerceptionCamera to get these properties to show up in your dataset.
using System;
using UnityEngine;
using UnityEngine.Perception.GroundTruth;
using UnityEngine.Rendering;

[Serializable]
public class CameraIntrinsicsLabeler : CameraLabeler
{
    public struct CameraIntrinsicsSpec
    {
        public float focalLength;
        public float fieldOfView;
        public float nearClipPlane;
        public float farClipPlane;
        public Vector2 sensorSize;
        public Vector2 lensShift;
    }

    Camera m_Camera;
    AnnotationDefinition m_AnnotationDefinition;
    public string annotationId = "94179c03-6258-4cfe-8449-f337fcd24301";

    public override string description
    {
        get => "Outputs the camera sensor's intrinsic properties for each captured frame.";
        protected set { }
    }

    protected override bool supportsVisualization => false;

    protected override void Setup()
    {
        m_Camera = perceptionCamera.GetComponent<Camera>();
        m_AnnotationDefinition = DatasetCapture.RegisterAnnotationDefinition(
            "Camera intrinsics",
            "The camera sensor's intrinsic properties for each captured frame",
            id: new Guid(annotationId));
    }

    protected override void OnBeginRendering(ScriptableRenderContext scriptableRenderContext)
    {
        sensorHandle.ReportAnnotationValues(m_AnnotationDefinition, new [] { new CameraIntrinsicsSpec
        {
            focalLength = m_Camera.focalLength,
            fieldOfView = m_Camera.fieldOfView,
            nearClipPlane = m_Camera.nearClipPlane,
            farClipPlane = m_Camera.farClipPlane,
            sensorSize = m_Camera.sensorSize,
            lensShift = m_Camera.lensShift
        }});
    }
}
@sleal-unity, for now, I overwrote the ToProjectionMatrix3x3() function with my custom CameraIntrinsicMatrix script.
Thanks for the help, going to test your solution out! 👍
This is a great feature request! I'm going to reopen it so that we can come back to this thread once it is implemented.
@JonathanHUnity great!
I implemented the camera intrinsic matrix in this function (maybe it'll help somebody):
float3x3 GetIntrinsic(Camera cam)
{
    float pixel_aspect_ratio = (float)cam.pixelWidth / (float)cam.pixelHeight;
    float alpha_u = cam.focalLength * ((float)cam.pixelWidth / cam.sensorSize.x);
    float alpha_v = cam.focalLength * pixel_aspect_ratio * ((float)cam.pixelHeight / cam.sensorSize.y);
    float u_0 = (float)cam.pixelWidth / 2;
    float v_0 = (float)cam.pixelHeight / 2;

    // Intrinsic matrix in row major
    float3x3 camIntriMatrix = new float3x3(new float3(alpha_u, 0f, u_0),
                                           new float3(0f, alpha_v, v_0),
                                           new float3(0f, 0f, 1f));
    return camIntriMatrix;
}
Hi @sleal-unity, I tried adding your code snippet to the project but I get the following.
Where should I place the .cs file?
Was this fixed in version 0.10.0? I can't find the CaptureRgbData function anymore.
@sleal-unity @Rawdreen @JonathanHUnity
Can any of you please confirm the following?
- Rotation and Translation are the [R | t] for the camera
- "camera_intrinsic" is the 3x3 Projection Matrix to convert world coordinates to image coordinates?
- The camera intrinsics K can be obtained using @sleal-unity's script above
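For anyone cross-checking the items above: under the usual pinhole convention, a world point X maps to pixels as p ~ K [R|t] X. A minimal numpy sketch with made-up values (illustrative only; Unity's y-up, left-handed conventions may introduce sign flips relative to this):

```python
import numpy as np

# Hypothetical intrinsic matrix K (focal lengths in pixels, principal point at centre)
K = np.array([[900.0, 0.0, 325.0],
              [0.0, 900.0, 200.0],
              [0.0, 0.0, 1.0]])

# Extrinsics [R | t]: world -> camera. Identity rotation, camera 2 m from the origin.
R = np.eye(3)
t = np.array([[0.0], [0.0], [2.0]])
Rt = np.hstack([R, t])                    # 3x4

X = np.array([0.5, 0.25, 0.0, 1.0])      # homogeneous world point
p = K @ Rt @ X                            # homogeneous pixel coordinates
u, v = p[0] / p[2], p[1] / p[2]
print(u, v)                               # 550.0 312.5
```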
Any update on the fix for this? I think it would be better if the intrinsic matrix were provided, as @Rawdreen suggested.
Hi, I am also having some issues with the camera matrix.
I am looking at Unity Perception to generate data for ML models and then evaluate them on real data, specifically 3D bounding box estimators.
I have seen that in the datasetinsights package there is code that will project a 3D point into camera space using the 3x3 camera_intrinsic matrix. This works well when using the 3x3 matrix provided by default; however, the output changes somewhat when I attempt to use the different 3x3 matrix provided by @Rawdreen.
I was wondering if there is a direct correspondence between the values of the two matrices. The documentation does not say what each value in the camera_intrinsic matrix represents, nor does it explain how these values are obtained; it is very different from a typical camera intrinsic matrix.
Let me show you the issue I am facing.
I am using Unity to generate data for pose estimation. I am not using the full datasetinsights code to draw the 3D bounding box, as I am using the correspondence with the 2D bounding box to estimate the position of the object in 3D space. This is based on code found in this repository: https://github.com/skhadem/3D-BoundingBox
When I use the point projection code using the 'usual' camera matrix I get the following result:
This is projecting the points as follows:
import numpy as np

def project_3d_pt(pt, cam_to_img):
    point = np.array(pt)
    point = cam_to_img.dot(point)
    if point[2] != 0:
        point /= point[2]
    point = point.astype(np.int16)
    return point
Where cam_to_img is the 3x3 matrix obtained through @Rawdreen's code provided above:
cam_to_img = np.array([[902.77777099609375, 0, 325],
                       [0, 1354.1666259765625, 200],
                       [0, 0, 1]])
However, when I use the projection code found in datasetinsights package with the 'camera_intrinsic' matrix, I get the following image.
Note the bounding box is much more accurate. The 3D points have been estimated in the same way and passed into the changed project_3d_pt as follows:
def project_3d_pt(pt, cam_to_img):
    point = np.array(pt)
    point = cam_to_img.dot(point)
    if point[2] != 0:
        point /= point[2]
    point = np.array(
        [
            int(-(point[0] * 650) / 2.0 + (650 * 0.5)),
            int((point[1] * 400) / 2.0 + (400 * 0.5)),
        ]
    )
    return point
Where 650, 400 are the width and height of the image.
With cam_to_img being the camera_intrinsic obtained from the default output of the Perception package:
cam_to_img = np.array([[2.77777767, 0, 0],
                       [0, 4.16666651, 0],
                       [0, 0, -1.00002]])
I was wondering why, in the datasetinsights code, the output is scaled according to the image size. It has something to do with the camera intrinsic matrix, but I am confused as to what the values in this matrix represent, especially as I have not seen a -1 in the last row of an intrinsic matrix before. Am I missing something?
Could someone explain the difference between the two intrinsic values and why they produce different outputs when the 3D point estimation is the same?
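An observation that may help reconcile the two matrices (my reading of the numbers, not confirmed by the package authors): the default camera_intrinsic looks like the 3x3 block of a GL-style projection matrix, which maps into NDC spanning [-1, 1], so multiplying by half the image size, as datasetinsights does, converts it to pixel units. Doing that conversion on the values quoted above:

```python
import numpy as np

W, H = 650, 400  # image width and height from the post above

# Default camera_intrinsic output (projection-matrix style, quoted above)
proj = np.array([[2.77777767, 0.0, 0.0],
                 [0.0, 4.16666651, 0.0],
                 [0.0, 0.0, -1.00002]])

# NDC spans [-1, 1] over W (resp. H) pixels, so scale by half the image size
fx = proj[0, 0] * W / 2.0
fy = proj[1, 1] * H / 2.0
print(fx, fy)  # ~902.78, ~833.33
```

Note that fx matches the 902.78 in the hand-built matrix above, but fy comes out as ~833.33 rather than 1354.17; the two differ by exactly the pixel aspect ratio 650/400 = 1.625, which suggests the extra pixel_aspect_ratio factor in alpha_v of the GetIntrinsic snippet may explain the mismatch. The -1.00002 entry is a clip-plane term of the projection matrix (roughly -(far+near)/(far-near)), not part of a pinhole K.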
Thanks for this! Minor correction, as the matrix is the wrong way around (the float3 overload of the float3x3 constructor takes columns, so the original ends up transposed):
float3x3 GetIntrinsic(Camera cam)
{
float pixel_aspect_ratio = (float)cam.pixelWidth / (float)cam.pixelHeight;
float alpha_u = cam.focalLength * ((float)cam.pixelWidth / cam.sensorSize.x);
float alpha_v = cam.focalLength * pixel_aspect_ratio * ((float)cam.pixelHeight / cam.sensorSize.y);
float u_0 = (float)cam.pixelWidth / 2;
float v_0 = (float)cam.pixelHeight / 2;
//IntrinsicMatrix in row major
float3x3 camIntriMatrix = new float3x3(alpha_u, 0f, u_0,
0f, alpha_v, v_0,
0f, 0f, 1f);
return camIntriMatrix;
}
Hi @eugeneteoh,
Have you successfully verified the intrinsic matrix by transforming 3D world coordinates to 2D image coordinates?