
Object pose of HOPE-Video

Open Jing-lun opened this issue 3 years ago • 7 comments

Hi @swtyree @Uio96 @sbirchfield,

Thanks for sharing the HOPE cad models and dataset!

My question is: when I project the object poses from different frames back into the world frame, the resulting world-frame poses do not agree across views, which suggests some error in the poses. Is this error acceptable?

Thanks.

Jing-lun avatar Sep 17 '21 14:09 Jing-lun

Hi @Jing-lun, thanks for spotting this. I think the issue is an error in camera extrinsics for HOPE-Video where the translation units appear to be in meters, while object poses are in cm. I'll confirm this and update the files later today.

swtyree avatar Sep 17 '21 18:09 swtyree

Hi @swtyree, thanks for your prompt reply, and please keep me posted!

I tested again, and even after making the units consistent, the 3D poses still do not match.

I tested the pose of the Mac&Cheese model in the first and last views of scene_0000; my calculation is below.

import numpy as np

# camera_extrinsic1 and pose1 are from hope_video/scene_0000/0000.json
camera_extrinsic1 = np.asarray([
    [-0.9886373,  -0.14978693,  0.012654976,  79.977846],
    [-0.1278811,   0.7938205,  -0.59455484,  -32.258067],
    [ 0.07901077, -0.5894174,  -0.8039555,    23.390512],
    [ 0.0,         0.0,         0.0,           1.0]])
pose1 = np.asarray([
    [-0.20787001630983457, -0.9763291480689646,   0.059761308091495956,   0.13647988469971395],
    [-0.7948878577485222,   0.13300297015968482, -0.5919996280969145,   -21.892505447000136],
    [ 0.5700380023040957,  -0.17056251735382824, -0.8037195436940463,    55.94770750870245],
    [ 0.0,                  0.0,                  0.0,                    1.0]])

# camera_extrinsic2 and pose2 are from hope_video/scene_0000/0364.json
camera_extrinsic2 = np.asarray([
    [-0.8754454,  -0.45876563, -0.15208338,  58.14929],
    [-0.2247508,   0.6649921,  -0.7122307,  -19.330604],
    [ 0.4278812,  -0.58933824, -0.6852723,    6.31446],
    [ 0.0,         0.0,         0.0,          1.0]])
pose2 = np.asarray([
    [ 0.11922886974591602, -0.9869201213152595,   -0.10850370274050081,  -7.430878036322042],
    [-0.709816306693385,   -0.008316097101108606, -0.7043377604400789,  -12.936461380072812],
    [ 0.6942227429692126,   0.1609950503421291,   -0.7015235871501129,   62.88667563186634],
    [ 0.0,                  0.0,                   0.0,                   1.0]])

# my assumption: Tow = Toc * Tcw
world1 = pose1.dot(camera_extrinsic1)
world2 = pose2.dot(camera_extrinsic2)

Jing-lun avatar Sep 17 '21 19:09 Jing-lun

Okay, thanks for the update. Have you confirmed that this issue is only with HOPE-Video and not HOPE-Image?

swtyree avatar Sep 17 '21 19:09 swtyree

> Okay, thanks for the update. Have you confirmed that this issue is only with HOPE-Video and not HOPE-Image?

Well, all of the objects in the HOPE-Image folder stay still, with no translation or rotation between images (I think the only difference in HOPE-Image is the lighting condition), so I cannot use the same method to check whether the object poses in the world frame agree.

Jing-lun avatar Sep 17 '21 19:09 Jing-lun

I think I figured out the issues:

  1. As we already established, the translations in the camera extrinsic matrices are in meters, while the object poses are in centimeters.
  2. The extrinsic matrix is actually world-to-camera, rather than camera-to-world as you expected (and as I also expected until I dug into it). In preview.py#L112, the extrinsic matrix is used to transform the scene reconstruction point cloud from world coordinates to camera coordinates.
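To make the distinction concrete, here is a minimal sketch (not from the dataset; the camera position is made up) of how a world-to-camera matrix acts on a world point:

```python
import numpy as np

# Hypothetical illustration: a camera sitting at world position (0, 0, 5)
# with identity rotation.
T_c2w = np.eye(4)
T_c2w[:3, 3] = [0.0, 0.0, 5.0]   # camera pose in the world (camera-to-world)
T_w2c = np.linalg.inv(T_c2w)     # world-to-camera: the convention the files use

p_world = np.array([0.0, 0.0, 0.0, 1.0])  # the world origin, homogeneous
p_cam = T_w2c @ p_world
print(p_cam[:3])  # [ 0.  0. -5.]: the origin lies 5 units along -z in camera coordinates
```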

To project a pose from camera to world coordinates, use this for now:

extrinsics_w2c[:3, -1] *= 100  # correct translation units from m to cm
pose_world = np.linalg.inv(extrinsics_w2c) @ pose_camera  # invert w2c to get c2w, then apply
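As a sanity check, here is a sketch applying the inversion to the two frames pasted above. (The translations as pasted already look unit-consistent in cm, so the m-to-cm scaling is skipped here; it applies to the raw files.)

```python
import numpy as np

# Matrices copied from the earlier comment (scene_0000, frames 0000 and 0364).
camera_extrinsic1 = np.asarray([
    [-0.9886373,  -0.14978693,  0.012654976,  79.977846],
    [-0.1278811,   0.7938205,  -0.59455484,  -32.258067],
    [ 0.07901077, -0.5894174,  -0.8039555,    23.390512],
    [ 0.0, 0.0, 0.0, 1.0]])
pose1 = np.asarray([
    [-0.20787001630983457, -0.9763291480689646,   0.059761308091495956,   0.13647988469971395],
    [-0.7948878577485222,   0.13300297015968482, -0.5919996280969145,   -21.892505447000136],
    [ 0.5700380023040957,  -0.17056251735382824, -0.8037195436940463,    55.94770750870245],
    [ 0.0, 0.0, 0.0, 1.0]])
camera_extrinsic2 = np.asarray([
    [-0.8754454,  -0.45876563, -0.15208338,  58.14929],
    [-0.2247508,   0.6649921,  -0.7122307,  -19.330604],
    [ 0.4278812,  -0.58933824, -0.6852723,    6.31446],
    [ 0.0, 0.0, 0.0, 1.0]])
pose2 = np.asarray([
    [ 0.11922886974591602, -0.9869201213152595,   -0.10850370274050081,  -7.430878036322042],
    [-0.709816306693385,   -0.008316097101108606, -0.7043377604400789,  -12.936461380072812],
    [ 0.6942227429692126,   0.1609950503421291,   -0.7015235871501129,   62.88667563186634],
    [ 0.0, 0.0, 0.0, 1.0]])

# world pose = inv(T_w2c) @ T_object_in_camera, independently per frame
world1 = np.linalg.inv(camera_extrinsic1) @ pose1
world2 = np.linalg.inv(camera_extrinsic2) @ pose2

# The two frames should recover (nearly) the same world-frame object pose.
print(np.abs(world1 - world2).max())
```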

I'll update the documentation in the README, and I may upload a new version with more explicit key names in the JSON files, but I'll need to do that at a later time.

Thanks again for reaching out with the issue!

swtyree avatar Sep 18 '21 02:09 swtyree

Thanks a lot @swtyree! Now the poses match!

Jing-lun avatar Sep 18 '21 02:09 Jing-lun

Thanks! I'm going to reopen the issue until I can get a new version of the annotations uploaded to Google Drive.

swtyree avatar Sep 18 '21 02:09 swtyree