
Camera Intrinsics & Plotting

Open jamesheatonrdm opened this issue 2 years ago • 5 comments

Hi,

I have seen this issue come up in a couple of places, but no one seems to have been able to answer it so I am hoping that laying out the issue here will get a response.

I am using Unity Perception to generate data for 3D bounding boxes.

In the JSON file output, the camera intrinsic matrix is as follows:

"camera_intrinsic": [
          [
            2.77777767,
            0.0,
            0.0
          ],
          [
            0.0,
            4.16666651,
            0.0
          ],
          [
            0.0,
            0.0,
            -1.00002
          ]

This is a 3x3 matrix, but it is unlike any intrinsic matrix I have seen before: it does not seem to include the focal length or optical center, and the last value is negative.
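
To illustrate, treating this as a pinhole K and projecting a hypothetical camera-space point gives values nowhere near pixel coordinates (my own check, in numpy):

import numpy as np

# The reported matrix, used as if it were a pinhole intrinsic matrix.
K = np.array([[2.77777767, 0, 0],
              [0, 4.16666651, 0],
              [0, 0, -1.00002]])
p_cam = np.array([0.5, 0.5, 5.0])  # hypothetical camera-space point
uv = K @ p_cam
print(uv[:2] / uv[2])              # ~[-0.28, -0.42]: NDC-like values, not pixels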

I have looked into datasetinsights and understand that this matrix is used to project the box corners and draw the 3D bounding box on the image. Using this intrinsic matrix together with that plotting code, I do not have any issues getting the correct projection.

The issue arises when I want to use a 'regular' intrinsic matrix of the form:

[fx, 0, cx]
[0, fy, cy]
[0, 0, 1]

The projected boxes appear different (much larger, well outside the 2D box bounds) when I use this matrix and project the 3D points accordingly.

Below is the code for the two projections. I have added a fourth column to the 'regular' intrinsic matrix to create the 3x4 matrix required for the projection.

First, using the 3x4 camera matrix containing the focal length and optical center of the Unity camera; the 3x3 part is calculated as follows:

float3x3 GetIntrinsic(Camera cam)
{
    // Requires Unity.Mathematics for the float3x3/float3 types.
    float pixel_aspect_ratio = (float)cam.pixelWidth / (float)cam.pixelHeight;

    float alpha_u = cam.focalLength * ((float)cam.pixelWidth / cam.sensorSize.x);
    float alpha_v = cam.focalLength * pixel_aspect_ratio * ((float)cam.pixelHeight / cam.sensorSize.y);

    float u_0 = (float)cam.pixelWidth / 2;
    float v_0 = (float)cam.pixelHeight / 2;

    // Intrinsic matrix in row-major order.
    float3x3 camIntriMatrix = new float3x3(new float3(alpha_u, 0f, u_0),
                                           new float3(0f, alpha_v, v_0),
                                           new float3(0f, 0f, 1f));
    return camIntriMatrix;
}

The resulting matrix, padded to 3x4, for a 650x400 image is:

import numpy as np

cam_to_img = np.array([[902.77777099609375, 0, 325, 0],
                       [0, 1354.1666259765625, 200, 0],
                       [0, 0, 1, 0]])

def project_3d_point(pt, cam_to_img):
    point = np.array(pt)
    point = np.append(point, 1)    # homogeneous coordinates

    point = cam_to_img.dot(point)  # project into the image plane
    point = point[:2] / point[2]   # perspective divide
    point = point.astype(np.int64)

    return point
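
As a quick sanity check (my own example, not from the dataset): a point one metre straight ahead in camera space should land exactly on the optical center:

pt = [0.0, 0.0, 1.0]                     # hypothetical camera-space point, 1 m ahead
print(project_3d_point(pt, cam_to_img))  # -> [325 200], the image center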

Second, using the 3x3 matrix from the default Perception camera output and following the same code as datasetinsights, again for a 650x400 image size:

cam_to_img = np.array([[2.77777767, 0, 0],
                       [0, 4.16666651, 0],
                       [0, 0, -1.00002]])

def project_3d_point(pt, cam_to_img):
    point = np.array(pt)

    point = cam_to_img.dot(point)  # camera space -> clip space
    point = point[:2] / point[2]   # perspective divide -> NDC in [-1, 1]

    # Viewport transform: map NDC to pixel coordinates for a 650x400 image.
    point = np.array(
        [
            int(-(point[0] * 650) / 2.0 + (650 * 0.5)),
            int((point[1] * 400) / 2.0 + (400 * 0.5)),
        ]
    )

    return point
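
Here the dot product takes the camera-space point to clip space, the divide gives normalized device coordinates in [-1, 1], and the final step is the viewport transform; the negated x and the negative -1.00002 entry cancel, so orientation is preserved. Repeating the sanity check with a hypothetical off-center point:

pt = [0.1, 0.0, 1.0]                     # hypothetical point: 0.1 m right, 1 m ahead
print(project_3d_point(pt, cam_to_img))  # -> [415 200], right of the image center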

Whilst the output with the Unity intrinsics is correct, I cannot test this on a real camera because I have no idea how this matrix is calculated; on a real camera I have access to the focal length, optical center, and distortion parameters. Would someone be able to explain how this Unity camera_intrinsic is calculated, and why the outputs differ when using the 'regular' camera matrix?

Thanks in advance

jamesheatonrdm avatar Dec 06 '22 15:12 jamesheatonrdm

Hi @jamesheatonrdm

I am hitting the same problem. Have you solved it?

ChongjianGE avatar Jun 16 '23 06:06 ChongjianGE

Hi @ChongjianGE, sort of. I did get the correct K matrix using the solution found here: https://stackoverflow.com/questions/39992968/how-to-calculate-field-of-view-of-the-camera-from-camera-intrinsic-matrix

and rearranging the equation to get fx and fy.

I checked against the K matrix found here: https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-1/ (scroll down about halfway) and I believe they are doing the same thing.
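
In case it helps anyone later, a sketch of that rearrangement (my reading, not verified against the Perception source): the reported 3x3 appears to be the diagonal of the camera's GL-style projection matrix, which relates to a pixel-space K as follows:

import numpy as np

def perception_to_K(p00, p11, width, height):
    # Assumed relationship: p00 = 2*fx/width and p11 = 2*fy/height, with the
    # principal point at the image center. The -1.00002 entry is consistent
    # with -(far + near) / (far - near) for near = 0.1 and far = 10000.
    fx = p00 * width / 2.0
    fy = p11 * height / 2.0
    return np.array([[fx, 0.0, width / 2.0],
                     [0.0, fy, height / 2.0],
                     [0.0, 0.0, 1.0]])

print(perception_to_K(2.77777767, 4.16666651, 650, 400))
# fx ~= 902.78 matches the value computed in the original post, but
# fy ~= 833.33 rather than 1354.17, which may be why the projected boxes
# came out oversized with the 'regular' matrix.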

jamesheatonrdm avatar Jun 16 '23 09:06 jamesheatonrdm

Hi @jamesheatonrdm, thanks for the prompt reply. I wonder whether you have ever tried transforming 3D world coordinates into 2D pixel coordinates using the calculated intrinsic matrix (e.g., [[902.77777099609375, 0, 325, 0], [0, 1354.1666259765625, 200, 0], [0, 0, 1, 0]]) and the world2camera matrix? On my side, it seems that PI != Intrinsic * World2Camera * PW (where PI is the pixel coordinate and PW is the world coordinate).
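
For concreteness, this is the chain I am attempting (a sketch with placeholder values, not my real data):

import numpy as np

K = np.array([[902.77777099609375, 0, 325, 0],
              [0, 1354.1666259765625, 200, 0],
              [0, 0, 1, 0]])
world_to_camera = np.eye(4)               # placeholder 4x4 extrinsic (assumption)
p_world = np.array([0.0, 0.0, 5.0, 1.0])  # hypothetical homogeneous world point

p = K @ world_to_camera @ p_world
print(p[:2] / p[2])  # should give the pixel coordinates PI; here [325. 200.]
# Note: Unity's left-handed coordinates may require an axis flip not shown here.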

ChongjianGE avatar Jun 16 '23 09:06 ChongjianGE

No, I don't think I could get it to work with the matrix you described, but calculating it the way shown in my original response to you seemed to do the trick for me.

jamesheatonrdm avatar Jun 16 '23 09:06 jamesheatonrdm

Yes, totally understand. Thanks for the reply.

ChongjianGE avatar Jun 16 '23 09:06 ChongjianGE