ai2thor
How can I get the intrinsic matrix of the agent camera
Hi,
This is probably a very amateur question, but is it possible to get the intrinsic matrix of the agent camera, OR can I obtain a point cloud from an RGB-D image captured from ai2thor? I tried https://github.com/allenai/ai2thor/issues/110#issue-412514905, but it seems to have some compiler errors.
Thanks
@xiaobaishu0097 Hi,
I am running into the same problem as you. I wonder if you have found a way to address it. I'd really appreciate your help.
Thanks
Just in case anyone runs into this issue, here are my two cents. I found that it is sufficient to compute the camera intrinsics using just the field of view, which is exposed by the ai2thor API and configurable.
I was originally following #275 and #110, and tried to print out the projection matrix in Unity, but it led nowhere. If you have a method to get the camera intrinsic matrix directly from Unity, please share.
Below is some code for plotting a point cloud given a color and a depth image, using Open3D. The ai2thor version is 3.3.4.
import math

import numpy as np
import open3d as o3d
from ai2thor.controller import Controller


def to_rad(th):
    return th * math.pi / 180


def example():
    width, height = 600, 600
    fov = 90
    controller = Controller(scene="FloorPlan1",
                            width=width,
                            height=height,
                            fieldOfView=fov,
                            renderDepthImage=True)
    controller.step(action="RotateLeft", degrees=45)
    event = controller.step(action="Pass")

    # Convert fov to focal length
    focal_length = 0.5 * width * math.tan(to_rad(fov/2))

    # camera intrinsics
    fx, fy, cx, cy = (focal_length, focal_length, width/2, height/2)

    # Obtain point cloud
    color = o3d.geometry.Image(event.frame.astype(np.uint8))
    d = event.depth_frame
    d /= np.max(d)
    depth = o3d.geometry.Image(d)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth,
        depth_scale=1.0,
        depth_trunc=0.7,
        convert_rgb_to_intensity=False)
    intrinsic = o3d.camera.PinholeCameraIntrinsic(width, height, fx, fy, cx, cy)
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
    # Flip so the cloud matches Open3D's viewing convention
    pcd.transform([[1, 0, 0, 0],
                   [0, -1, 0, 0],
                   [0, 0, -1, 0],
                   [0, 0, 0, 1]])
    o3d.visualization.draw_geometries([pcd])


if __name__ == "__main__":
    example()
Output looks like: [screenshot of the resulting point cloud]
I haven't tested this, but based on the quality of this point cloud, it looks promising that you can obtain 3D coordinates in the world frame from 2D pixels.
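To make that last point concrete, here is a minimal, untested sketch of back-projecting a single pixel with a metric depth value into the camera frame, using the same intrinsics as above (the helper name pixel_to_camera is hypothetical). Getting to the world frame additionally requires the camera pose (extrinsics):

```python
import numpy as np

# Hypothetical back-projection sketch: pixel (u, v) with metric depth d
# to camera-frame coordinates under the pinhole model.
def pixel_to_camera(u, v, d, fx, fy, cx, cy):
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.array([x, y, d])

fx = fy = 300.0
cx = cy = 300.0
# The principal point back-projects onto the optical axis.
p = pixel_to_camera(300, 300, 2.0, fx, fy, cx, cy)
assert np.allclose(p, [0.0, 0.0, 2.0])
# To reach the world frame you still need the camera-to-world
# transform, e.g. built from event.metadata.
```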
I successfully modified the source code and had the agent print the projection matrix of the camera. I've documented everything in my forked version. Under the test folder, there is a readme file that documents everything from installing Unity on Ubuntu to compiling and building the source code. Hope it can help someone who needs to get the projection matrix of the camera.
focal_length = 0.5 * width * math.tan(to_rad(fov/2))
Should it be
focal_length = 0.5 * width / math.tan(to_rad(fov/2))
I agree with @GuardianWang. In this case it happens to work because math.tan(to_rad(fov/2)) equals 1 when fov is 90, so multiplying or dividing gives the same result, but the correct formula divides by the tangent.
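For anyone copying the snippet, a small self-check of the corrected formula (the intrinsics helper here is just an illustrative sketch, not part of the ai2thor API): with the division form, a point on the edge of a 90-degree frustum projects exactly to the image border.

```python
import math

def intrinsics(width, height, fov_deg):
    # Pinhole model: pixel focal length from the field of view.
    f = 0.5 * width / math.tan(math.radians(fov_deg / 2))
    return f, f, width / 2, height / 2

fx, fy, cx, cy = intrinsics(600, 600, 90)
# With fov=90, tan(45 deg) is 1, so multiply vs. divide coincide;
# at, say, fov=60 the two formulas diverge.
assert math.isclose(fx, 300.0)

# Sanity check: a point at the frustum edge (x == z for a 90-degree
# horizontal FOV) should land on the image border.
x, z = 1.0, 1.0
u = fx * x / z + cx
assert math.isclose(u, 600.0)
```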
Hi @ruinianxu, it seems your code needs to run with a local Unity Hub install. Is that right?
@ChongjianGE Yeah.
@ruinianxu Hi, ruinian,
Thank you for your assistance. I have successfully implemented the retrieval of the projection matrix in Unity Editor version 2021.03. However, I have noticed that regardless of how I move the agent/camera, the 4x4 projection matrix I obtain remains unchanged. This seems abnormal to me based on my understanding. Additionally, I would like to ask if you have any recommended methods for conveniently verifying whether the projection matrix is retrieved correctly.
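For what it's worth, Unity's projection matrix is built only from the FOV, aspect ratio, and near/far clip planes, not from the camera's pose, so it should indeed stay constant as the agent moves; the pose lives in the view (extrinsic) matrix instead. One way to sanity-check the retrieved matrix is to recover the pixel focal lengths from its top-left entries and compare with the FOV formula. A sketch with assumed values for a symmetric frustum:

```python
import math

# Assumed setup: 90-degree vertical FOV, 600x600 image (square aspect).
fov_y_deg, width, height = 90.0, 600, 600

# For a symmetric perspective frustum, Unity's projection matrix has
# P[1][1] = 1/tan(fovY/2) and P[0][0] = P[1][1]/aspect.
p11 = 1.0 / math.tan(math.radians(fov_y_deg / 2))
p00 = p11 * height / width

# Recover pixel-space focal lengths and compare with the FOV formula.
fy = 0.5 * height * p11
fx = 0.5 * width * p00
assert math.isclose(fx, 300.0) and math.isclose(fy, 300.0)
```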
I know event.metadata['agent'] can give me the position of the agent, but how can I get the relationship between the camera coordinate system and the agent coordinate system?
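Not an authoritative answer, but a sketch of one common approach: combine the agent's yaw (event.metadata['agent']['rotation']['y']) with the camera pitch (event.metadata['agent']['cameraHorizon'], in degrees, positive meaning looking down) into a camera-to-world rotation. The sign conventions below are assumptions and worth verifying against rendered frames:

```python
import math
import numpy as np

# Untested sketch: camera-to-world rotation from the agent's yaw about
# the y (up) axis followed by the camera's pitch about the x axis.
# Axis and sign conventions are assumptions, not confirmed ai2thor behavior.
def camera_to_world_rotation(yaw_deg, horizon_deg):
    yaw = math.radians(yaw_deg)
    pitch = math.radians(horizon_deg)
    r_yaw = np.array([[math.cos(yaw), 0, math.sin(yaw)],
                      [0, 1, 0],
                      [-math.sin(yaw), 0, math.cos(yaw)]])
    r_pitch = np.array([[1, 0, 0],
                        [0, math.cos(pitch), math.sin(pitch)],
                        [0, -math.sin(pitch), math.cos(pitch)]])
    return r_yaw @ r_pitch

# Facing forward with a level camera gives the identity rotation.
assert np.allclose(camera_to_world_rotation(0, 0), np.eye(3))
```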