
Object pose of HOPE-Video

Open Jing-lun opened this issue 3 years ago • 7 comments

Hi @swtyree @Uio96 @sbirchfield,

Thanks for sharing the HOPE cad models and dataset!

My question is: when I project the object poses from different frames back into the world frame, the resulting world-frame poses do not agree across views, which suggests some error in the poses. Is this error acceptable?

Thanks.

Jing-lun avatar Sep 17 '21 14:09 Jing-lun

Hi @Jing-lun, thanks for spotting this. I think the issue is an error in camera extrinsics for HOPE-Video where the translation units appear to be in meters, while object poses are in cm. I'll confirm this and update the files later today.

swtyree avatar Sep 17 '21 18:09 swtyree

Hi @swtyree, thanks for your prompt reply, and please keep me posted!

I tested again, and even after making the units consistent, the 3D poses still do not match.

I tested the pose of the Mac&Cheese model in the first and last views of scene_0000; my calculation is below.

import numpy as np

# camera_extrinsic1 and pose1 are from hope_video/scene_0000/0000.json
camera_extrinsic1 = np.asarray([
    [-0.9886373,  -0.14978693,  0.012654976,  79.977846],
    [-0.1278811,   0.7938205,  -0.59455484,  -32.258067],
    [ 0.07901077, -0.5894174,  -0.8039555,    23.390512],
    [ 0.0,         0.0,         0.0,           1.0]])
pose1 = np.asarray([
    [-0.20787001630983457, -0.9763291480689646,   0.059761308091495956,   0.13647988469971395],
    [-0.7948878577485222,   0.13300297015968482, -0.5919996280969145,   -21.892505447000136],
    [ 0.5700380023040957,  -0.17056251735382824, -0.8037195436940463,    55.94770750870245],
    [ 0.0,                  0.0,                  0.0,                    1.0]])

# camera_extrinsic2 and pose2 are from hope_video/scene_0000/0364.json
camera_extrinsic2 = np.asarray([
    [-0.8754454,  -0.45876563, -0.15208338,  58.14929],
    [-0.2247508,   0.6649921,  -0.7122307,  -19.330604],
    [ 0.4278812,  -0.58933824, -0.6852723,    6.31446],
    [ 0.0,         0.0,         0.0,          1.0]])
pose2 = np.asarray([
    [ 0.11922886974591602, -0.9869201213152595,   -0.10850370274050081,  -7.430878036322042],
    [-0.709816306693385,   -0.008316097101108606, -0.7043377604400789,  -12.936461380072812],
    [ 0.6942227429692126,   0.1609950503421291,   -0.7015235871501129,   62.88667563186634],
    [ 0.0,                  0.0,                   0.0,                   1.0]])

# my assumption: Tow = Toc * Tcw
world1 = pose1.dot(camera_extrinsic1)
world2 = pose2.dot(camera_extrinsic2)

Jing-lun avatar Sep 17 '21 19:09 Jing-lun

Okay, thanks for the update. Have you confirmed that this issue is only with HOPE-Video and not HOPE-Image?

swtyree avatar Sep 17 '21 19:09 swtyree

> Okay, thanks for the update. Have you confirmed that this issue is only with HOPE-Video and not HOPE-Image?

Well, all of the objects in the HOPE-Image folder stay still, with no translation or rotation between images (I think the only difference in HOPE-Image is the lighting condition), so I cannot use the same method to check whether the object poses in the world frame agree.

Jing-lun avatar Sep 17 '21 19:09 Jing-lun

I think I figured out the issues:

  1. As we already established, the translations in the camera extrinsic matrices are in meters, while the object poses are in centimeters.
  2. The extrinsic matrix is actually world-to-camera, rather than camera-to-world as you expected (and as I also expected until I dug into it). In preview.py#L112, the extrinsic matrix is used to transform the scene reconstruction point cloud from world coordinates to camera coordinates.
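To make the distinction concrete, here is a minimal sketch (not from the dataset; the camera position is made up) of how a world-to-camera matrix acts on a world point:

```python
import numpy as np

# Hypothetical illustration: a camera sitting at world position (0, 0, 5)
# with identity rotation.
T_c2w = np.eye(4)
T_c2w[:3, 3] = [0.0, 0.0, 5.0]   # camera pose in the world (camera-to-world)
T_w2c = np.linalg.inv(T_c2w)     # world-to-camera: the convention the files use

p_world = np.array([0.0, 0.0, 0.0, 1.0])  # the world origin, homogeneous
p_cam = T_w2c @ p_world
print(p_cam[:3])  # [ 0.  0. -5.]: the origin lies 5 units along -z in camera coordinates
```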

To project a pose from camera to world coordinates, use this for now:

extrinsics_w2c[:3, -1] *= 100  # correct translation units from m to cm
pose_world = np.linalg.inv(extrinsics_w2c) @ pose_camera  # invert w2c to get c2w, then apply
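As a sanity check, here is a sketch applying the inversion to the two frames pasted above. (The translations as pasted already look unit-consistent in cm, so the m-to-cm scaling is skipped here; it applies to the raw files.)

```python
import numpy as np

# Matrices copied from the earlier comment (scene_0000, frames 0000 and 0364).
camera_extrinsic1 = np.asarray([
    [-0.9886373,  -0.14978693,  0.012654976,  79.977846],
    [-0.1278811,   0.7938205,  -0.59455484,  -32.258067],
    [ 0.07901077, -0.5894174,  -0.8039555,    23.390512],
    [ 0.0, 0.0, 0.0, 1.0]])
pose1 = np.asarray([
    [-0.20787001630983457, -0.9763291480689646,   0.059761308091495956,   0.13647988469971395],
    [-0.7948878577485222,   0.13300297015968482, -0.5919996280969145,   -21.892505447000136],
    [ 0.5700380023040957,  -0.17056251735382824, -0.8037195436940463,    55.94770750870245],
    [ 0.0, 0.0, 0.0, 1.0]])
camera_extrinsic2 = np.asarray([
    [-0.8754454,  -0.45876563, -0.15208338,  58.14929],
    [-0.2247508,   0.6649921,  -0.7122307,  -19.330604],
    [ 0.4278812,  -0.58933824, -0.6852723,    6.31446],
    [ 0.0, 0.0, 0.0, 1.0]])
pose2 = np.asarray([
    [ 0.11922886974591602, -0.9869201213152595,   -0.10850370274050081,  -7.430878036322042],
    [-0.709816306693385,   -0.008316097101108606, -0.7043377604400789,  -12.936461380072812],
    [ 0.6942227429692126,   0.1609950503421291,   -0.7015235871501129,   62.88667563186634],
    [ 0.0, 0.0, 0.0, 1.0]])

# world pose = inv(T_w2c) @ T_object_in_camera, independently per frame
world1 = np.linalg.inv(camera_extrinsic1) @ pose1
world2 = np.linalg.inv(camera_extrinsic2) @ pose2

# The two frames should recover (nearly) the same world-frame object pose.
print(np.abs(world1 - world2).max())
```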

I'll update the documentation in the README, and I may upload a new version with more explicit key names in the JSON files, but I'll need to do that at a later time.

Thanks again for reaching out with the issue!

swtyree avatar Sep 18 '21 02:09 swtyree

Thanks a lot @swtyree! Now the poses match!

Jing-lun avatar Sep 18 '21 02:09 Jing-lun

Thanks! I'm going to reopen the issue until I can get a new version of the annotations uploaded to Google Drive.

swtyree avatar Sep 18 '21 02:09 swtyree