com.unity.perception
Wrong camera intrinsic parameters in captures.json
// Sample from PerceptionCamera.cs (Unity Perception package)
void CaptureRgbData(Camera cam)
{
    if (!captureRgbImages)
        return;

    Profiler.BeginSample("CaptureDataFromLastFrame");

    // Record the camera's projection matrix
    SetPersistentSensorData("camera_intrinsic", ToProjectionMatrix3x3(cam.projectionMatrix));
    ...
As you can see from the sample above, it takes the Unity camera projection matrix and records it as the camera's intrinsic parameters. However, intrinsic camera parameters and the camera projection matrix are calculated differently and serve different purposes.
Here is a brief explanation of camera parameters (intrinsic and extrinsic) by mathworks.com.
Is this a proper way to describe the intrinsic parameters of a sensor (camera), or is it a misuse of the term in the following line?
SetPersistentSensorData("camera_intrinsic", ToProjectionMatrix3x3(cam.projectionMatrix));
P.S.: I noticed this issue while trying to get the intrinsic parameters of the camera for my own project.
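For readers landing on this thread: a minimal sketch (assuming a simple pinhole model; the focal length and sensor dimensions below are illustrative, not the package defaults) of how a conventional 3x3 intrinsic matrix differs from the 3x3 block of a GL-style projection matrix:

```python
import numpy as np

def intrinsic_matrix(focal_mm, sensor_w_mm, sensor_h_mm, width_px, height_px):
    """Conventional pinhole intrinsic matrix K: camera space -> pixel coordinates."""
    fx = focal_mm * width_px / sensor_w_mm    # focal length expressed in pixels (x)
    fy = focal_mm * height_px / sensor_h_mm   # focal length expressed in pixels (y)
    cx, cy = width_px / 2.0, height_px / 2.0  # principal point, assumed at image centre
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def projection_matrix_3x3(vfov_deg, aspect):
    """Upper-left 3x3 of a GL-style projection matrix: camera space -> NDC in [-1, 1].

    The -1 in the last row divides by -z (the camera looks down -z in GL
    conventions), which is why it never appears in a pinhole K.
    """
    t = 1.0 / np.tan(np.radians(vfov_deg) / 2.0)
    return np.array([[t / aspect, 0.0, 0.0],
                     [0.0, t, 0.0],
                     [0.0, 0.0, -1.0]])

# Illustrative values: 50 mm lens on a 36x24 mm sensor, 650x400 image, ~27 degree vertical FOV
K = intrinsic_matrix(50.0, 36.0, 24.0, 650, 400)
P = projection_matrix_3x3(27.0, 650 / 400)
```

The key difference: K maps camera-space points to pixel coordinates and carries the principal point, while the projection matrix maps to normalized device coordinates in [-1, 1] and carries clip-plane information instead.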
This is a good call-out, @Rawdreen. I'll poke some of my colleagues and we'll see what sort of revisions might be needed here. Thanks for the link too!
Hi @Rawdreen! If you have a moment, could you verify the following list of parameters for me, to ensure that we're on the same page regarding which intrinsic metrics you would prefer to be available in the dataset?
Here's my current running list:
- focal length
- sensor size
- lens shift
- field of view
- near and far plane
In addition, renaming the current "camera_intrinsics" sensor field to "camera_projection_matrix" seems appropriate here.
Hi @sleal-unity! When it comes to intrinsic metrics, you are correct. And I guess it would be great to output not just the intrinsic metrics but also the intrinsic matrix of the camera.
@Rawdreen thanks for the input! I'll add this change to our backlog and see if we can tack it onto our next major release.
In the meantime though, the following labeler will output the camera properties I listed above as an annotation for each captured frame. Just create a new C# script in your project, copy-paste this labeler code into it, and add the new "CameraIntrinsicsLabeler" to your PerceptionCamera to get these properties to show up in your dataset.
using System;
using UnityEngine;
using UnityEngine.Perception.GroundTruth;
using UnityEngine.Rendering;

[Serializable]
public class CameraIntrinsicsLabeler : CameraLabeler
{
    public struct CameraIntrinsicsSpec
    {
        public float focalLength;
        public float fieldOfView;
        public float nearClipPlane;
        public float farClipPlane;
        public Vector2 sensorSize;
        public Vector2 lensShift;
    }

    Camera m_Camera;
    AnnotationDefinition m_AnnotationDefinition;
    public string annotationId = "94179c03-6258-4cfe-8449-f337fcd24301";

    public override string description
    {
        get => "Outputs the camera sensor's intrinsic properties for each captured frame.";
        protected set { }
    }

    protected override bool supportsVisualization => false;

    protected override void Setup()
    {
        m_Camera = perceptionCamera.GetComponent<Camera>();
        m_AnnotationDefinition = DatasetCapture.RegisterAnnotationDefinition(
            "Camera intrinsics",
            "The camera sensor's intrinsic properties for each captured frame",
            id: new Guid(annotationId));
    }

    protected override void OnBeginRendering(ScriptableRenderContext scriptableRenderContext)
    {
        sensorHandle.ReportAnnotationValues(m_AnnotationDefinition, new [] { new CameraIntrinsicsSpec
        {
            focalLength = m_Camera.focalLength,
            fieldOfView = m_Camera.fieldOfView,
            nearClipPlane = m_Camera.nearClipPlane,
            farClipPlane = m_Camera.farClipPlane,
            sensorSize = m_Camera.sensorSize,
            lensShift = m_Camera.lensShift
        }});
    }
}
@sleal-unity, for now, I overwrote the ToProjectionMatrix3x3() function with my custom CameraIntrinsicMatrix script.
Thanks for the help, going to test your solution out! 👍
This is a great feature request! I'm going to reopen it so that we can come back to this thread once it is implemented.
@JonathanHUnity great!
I implemented the camera intrinsic matrix in this function (maybe it'll help somebody):
float3x3 GetIntrinsic(Camera cam)
{
    float pixel_aspect_ratio = (float)cam.pixelWidth / (float)cam.pixelHeight;
    float alpha_u = cam.focalLength * ((float)cam.pixelWidth / cam.sensorSize.x);
    float alpha_v = cam.focalLength * pixel_aspect_ratio * ((float)cam.pixelHeight / cam.sensorSize.y);
    float u_0 = (float)cam.pixelWidth / 2;
    float v_0 = (float)cam.pixelHeight / 2;

    // Intrinsic matrix in row major
    float3x3 camIntriMatrix = new float3x3(new float3(alpha_u, 0f, u_0),
                                           new float3(0f, alpha_v, v_0),
                                           new float3(0f, 0f, 1f));
    return camIntriMatrix;
}
Hi @sleal-unity, I tried adding your code snippet to the project but I get the following.
Where should I place the .cs file?
Was this fixed in version 0.10.0? I can't find the CaptureRgbData function anymore.
@sleal-unity @Rawdreen @JonathanHUnity
Can any of you please confirm the following?
- Rotation and Translation are the [R | t] for the camera
- "camera_intrinsic" is the 3x3 Projection Matrix to convert world coordinates to image coordinates?
- The camera intrinsics K can be obtained using @sleal-unity's script above
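For anyone cross-checking the items above: under the usual pinhole convention, a world point X maps to pixels as p ~ K [R|t] X. A minimal numpy sketch with made-up values (illustrative only; Unity's y-up, left-handed conventions may introduce sign flips relative to this):

```python
import numpy as np

# Hypothetical intrinsic matrix K (focal lengths in pixels, principal point at centre)
K = np.array([[900.0, 0.0, 325.0],
              [0.0, 900.0, 200.0],
              [0.0, 0.0, 1.0]])

# Extrinsics [R | t]: world -> camera. Identity rotation, camera 2 m from the origin.
R = np.eye(3)
t = np.array([[0.0], [0.0], [2.0]])
Rt = np.hstack([R, t])                    # 3x4

X = np.array([0.5, 0.25, 0.0, 1.0])      # homogeneous world point
p = K @ Rt @ X                            # homogeneous pixel coordinates
u, v = p[0] / p[2], p[1] / p[2]
print(u, v)                               # 550.0 312.5
```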
Any update on the fix for this? I think it would be better if the intrinsic matrix were provided, as @Rawdreen suggested.
Hi, I am also having some issues with the camera matrix.
I am looking at Unity Perception to generate data for ML models and then evaluate them on real data, specifically 3D bounding box estimators.
I have seen that in the datasetinsights package there is code that will project a 3D point into camera space using the 3x3 camera_intrinsic matrix. This works well when using the 3x3 matrix provided by default; however, the output changes somewhat when I attempt to use the different 3x3 matrix provided by @Rawdreen.
I was wondering if there is a direct correspondence between the values of the two matrices. The documentation does not say what each value in the camera_intrinsic matrix represents, nor does it explain how these values are obtained; it is very different from a typical camera intrinsic matrix.
Let me show you the issue I am facing.
I am using Unity to generate data for pose estimation. I am not using the full datasetinsights code to draw the 3D bounding box, as I am using the correspondence with the 2D bounding box to estimate the position of the object in 3D space. This is based on code found in this repository: https://github.com/skhadem/3D-BoundingBox
When I use the point projection code using the 'usual' camera matrix I get the following result:
This is projecting the points as follows:
import numpy as np

def project_3d_pt(pt, cam_to_img):
    point = np.array(pt)
    point = cam_to_img.dot(point)
    if point[2] != 0:
        point /= point[2]
    point = point.astype(np.int16)
    return point
Where cam_to_img is the 3x3 matrix obtained through @Rawdreen's code provided above:
cam_to_img = np.array([[902.77777099609375, 0, 325],
                       [0, 1354.1666259765625, 200],
                       [0, 0, 1]])
However, when I use the projection code found in datasetinsights package with the 'camera_intrinsic' matrix, I get the following image.
Note the bounding box is much more accurate. The 3D points have been estimated in the same way and passed into the changed project_3d_pt as follows:
def project_3d_pt(pt, cam_to_img):
    point = np.array(pt)
    point = cam_to_img.dot(point)
    if point[2] != 0:
        point /= point[2]
    point = np.array(
        [
            int(-(point[0] * 650) / 2.0 + (650 * 0.5)),
            int((point[1] * 400) / 2.0 + (400 * 0.5)),
        ]
    )
    return point
Where 650, 400 are the width and height of the image.
With cam_to_img being the camera_intrinsic obtained from the default output of the Perception package:
cam_to_img = np.array([[2.77777767, 0, 0],
                       [0, 4.16666651, 0],
                       [0, 0, -1.00002]])
I was wondering why, in the datasetinsights code, the output is scaled according to the image size. It has something to do with the camera intrinsic matrix, but I am confused as to what the values in this matrix represent, especially as I have not seen a -1 in the last row of an intrinsic matrix before. Am I missing something?
Could someone explain the difference between the two intrinsic values and why they produce different outputs when the 3D point estimation is the same?
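An observation that may help reconcile the two matrices (my reading of the numbers, not confirmed by the package authors): the default camera_intrinsic looks like the 3x3 block of a GL-style projection matrix, which maps into NDC spanning [-1, 1], so multiplying by half the image size, as datasetinsights does, converts it to pixel units. Doing that conversion on the values quoted above:

```python
import numpy as np

W, H = 650, 400  # image width and height from the post above

# Default camera_intrinsic output (projection-matrix style, quoted above)
proj = np.array([[2.77777767, 0.0, 0.0],
                 [0.0, 4.16666651, 0.0],
                 [0.0, 0.0, -1.00002]])

# NDC spans [-1, 1] over W (resp. H) pixels, so scale by half the image size
fx = proj[0, 0] * W / 2.0
fy = proj[1, 1] * H / 2.0
print(fx, fy)  # ~902.78, ~833.33
```

Note that fx matches the 902.78 in the hand-built matrix above, but fy comes out as ~833.33 rather than 1354.17; the two differ by exactly the pixel aspect ratio 650/400 = 1.625, which suggests the extra pixel_aspect_ratio factor in alpha_v of the GetIntrinsic snippet may explain the mismatch. The -1.00002 entry is a clip-plane term of the projection matrix (roughly -(far+near)/(far-near)), not part of a pinhole K.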
Thanks for this! Minor correction, as the matrix is the wrong way around (the float3 overload of the float3x3 constructor takes columns, so the original ends up transposed):
float3x3 GetIntrinsic(Camera cam)
{
float pixel_aspect_ratio = (float)cam.pixelWidth / (float)cam.pixelHeight;
float alpha_u = cam.focalLength * ((float)cam.pixelWidth / cam.sensorSize.x);
float alpha_v = cam.focalLength * pixel_aspect_ratio * ((float)cam.pixelHeight / cam.sensorSize.y);
float u_0 = (float)cam.pixelWidth / 2;
float v_0 = (float)cam.pixelHeight / 2;
//IntrinsicMatrix in row major
float3x3 camIntriMatrix = new float3x3(alpha_u, 0f, u_0,
0f, alpha_v, v_0,
0f, 0f, 1f);
return camIntriMatrix;
}
Hi @eugeneteoh,
Have you successfully verified the intrinsic matrix by transforming 3D world coordinates to 2D image coordinates?