ai2thor icon indicating copy to clipboard operation
ai2thor copied to clipboard

How can I get the intrinsic matrix of the agent camera

Open xiaobaishu0097 opened this issue 3 years ago • 10 comments

Hi,

This is probably a very amateur question but is it possible to get the intrinsic matrix of the agent camera OR can I obtain point cloud from RGB-D image captured from ai2thor? I tried https://github.com/allenai/ai2thor/issues/110#issue-412514905, but it seems to have some compiler errors.

Thanks

xiaobaishu0097 avatar Apr 29 '21 13:04 xiaobaishu0097

@xiaobaishu0097 Hi,

I am meeting the same problem as you. I wonder if you find the way to address it. Really appreciate your help.

Thanks

ruinianxu avatar Aug 03 '21 23:08 ruinianxu

Just in case anyone runs into this issue, here is my two cents. I found that it is sufficient to compute the camera intrinsic using just the field of view, which is exposed by the ai2thor API and configurable.

I was originally following #275 and #110, and tried to print out the projection matrix in unity, but it led to nowhere. If you have a method to get camera intrinsic matrix directly from unity, please share.

Below is some code for plotting point cloud given a color and depth image, using Open3D. Ai2thor version is 3.3.4.


import math
import numpy as np
import open3d as o3d
from ai2thor.controller import Controller

def to_rad(th):
    return th*math.pi / 180

def example():
    width, height = 600, 600
    fov = 90
    controller = Controller(scene="FloorPlan1",
                            width=width,
                            height=height,
                            fieldOfView=fov,
                            renderDepthImage=True)
    controller.step(action="RotateLeft", degrees=45)
    event = controller.step(action="Pass")

    # Convert fov to focal length
    focal_length = 0.5 * width * math.tan(to_rad(fov/2))

    # camera intrinsics
    fx, fy, cx, cy = (focal_length, focal_length, width/2, height/2)

    # Obtain point cloud
    color = o3d.geometry.Image(event.frame.astype(np.uint8))
    d = event.depth_frame
    d /= np.max(d)
    depth = o3d.geometry.Image(d)
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(color, depth,
                                                              depth_scale=1.0,
                                                              depth_trunc=0.7,
                                                              convert_rgb_to_intensity=False)
    intrinsic = o3d.camera.PinholeCameraIntrinsic(width, height, fx, fy, cx, cy)
    pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
    pcd.transform([[1, 0, 0, 0],
                   [0, -1, 0, 0],
                   [0, 0, -1, 0],
                   [0, 0, 0, 1]])
    o3d.visualization.draw_geometries([pcd])

if __name__ == "__main__":
    example()

Output looks like: Selection_025

I haven't tested, but it is promising, based on the quality of this point cloud, you can obtain 3D coordinates in the world frame from 2D pixels.

zkytony avatar Aug 22 '21 23:08 zkytony

I successfully modified the source code and let the agent print projection matrix of the camera. I've documented everything in my forked version. Under the test folder, there is a readme file documents everything from installing unity in ubuntu to compling and building the source code. Hope it can help someone who need to get the projection matrix of camera.

ruinianxu avatar Aug 23 '21 03:08 ruinianxu

focal_length = 0.5 * width * math.tan(to_rad(fov/2))

Should it be

focal_length = 0.5 * width / math.tan(to_rad(fov/2))

GuardianWang avatar Apr 29 '22 03:04 GuardianWang

I agree with @GuardianWang. In this case it works because math.tan(to_rad(fov/2)) ~ 1 so multiplying or dividing gives the same result, but the right formula is with dividing

LucBourrat1 avatar Dec 09 '22 10:12 LucBourrat1

hi @ruinianxu, it seems your code should run based on a local unity hub. is that right?

ChongjianGE avatar Jun 10 '23 05:06 ChongjianGE

@ChongjianGE Yeah.

ruinianxu avatar Jun 10 '23 16:06 ruinianxu

@ruinianxu Hi, ruinian,

Thank you for your assistance. I have successfully implemented the retrieval of the projection matrix in Unity Editor version 2021.03. However, I have noticed that regardless of how I move the agent/camera, the 4x4 projection matrix I obtain remains unchanged. This seems abnormal to me based on my understanding. Additionally, I would like to ask if you have any recommended methods for conveniently verifying whether the projection matrix is retrieved correctly.

ChongjianGE avatar Jun 11 '23 01:06 ChongjianGE

i know event.metadata['agent'] can give me the position of the agent,but how can i get the relationship between the camera coordinate system and the agent coordinate system.

lvjihui avatar Jul 17 '23 08:07 lvjihui