Relation between kpts_3d_pred and pose

Open nviolante25 opened this issue 3 years ago • 2 comments

Hello, thank you for open-sourcing this amazing project!

I have a question about the convention used to transform the 3D box. EgoNet only produces an egocentric pose (i.e. in camera coordinates), which corresponds to the rotation between the 3D box extracted from the keypoints and a template 3D box. We also have a translation corresponding to the first point in kpts_3d_pred, here.

To better understand the coordinate systems involved I'm doing the following experiment:

  1. Create a template 3D bounding box following this, in the canonical pose.
  2. Rotate it with the rotation matrix given by EgoNet, this one

After doing these two steps, I still need one translation to place the 3D box in space (in the camera system). The question is, what translation should I use? Is it the one corresponding to the first point in kpts_3d_pred?

Thank you for your time

nviolante25, Jun 22 '22 07:06
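
The experiment described in the question (build a template box, rotate it, then translate it into the camera frame) can be sketched in NumPy as below. This is only an illustration, not EgoNet's code: the helper names `make_template_box`, `rot_y` and `place_box` are hypothetical, the axis convention (length along x, height along y, width along z) is an assumption, and the layout of nine points with the box center first is also an assumption that the thread does not confirm.

```python
import numpy as np

def make_template_box(length, height, width):
    """Hypothetical helper: (9, 3) array with the box center first, then 8 corners.

    Assumes length along x, height along y, width along z, centered at the origin.
    """
    signs = np.array(
        [[sx, sy, sz] for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)],
        dtype=float,
    )
    corners = 0.5 * signs * np.array([length, height, width])
    return np.vstack([np.zeros((1, 3)), corners])

def rot_y(theta):
    """Rotation by theta radians about the y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def place_box(template, R, t):
    """Apply X_cam = R @ X_template + t to every row of the template."""
    return template @ R.T + t

# Illustrative values only: stand-ins for the predicted rotation and translation.
template = make_template_box(4.0, 1.5, 1.8)
R = rot_y(np.pi / 2)
t = np.array([2.0, 1.0, 20.0])
box_cam = place_box(template, R, t)
```

Because the template is centered at the origin, its first row (the center) lands exactly on `t` after the transform, which is why a translation taken from a center keypoint is a natural candidate under these assumptions.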

Hi, by default the translation of the input 3D box is used for visualization (see https://github.com/Nicholasli1995/EgoNet/blob/13e3758388ab9f8a8c59774b878d6990f9e94042/libs/visualization/egonet_utils.py#L70). When ground truth boxes are specified, the ground truth translation is used instead.

You can also experiment with other ways of estimating the translation. For example, use cv2.solvePnP to solve for the translation from the 2D keypoints predicted by EgoNet (see https://github.com/Nicholasli1995/EgoNet/blob/13e3758388ab9f8a8c59774b878d6990f9e94042/tools/inference_legacy.py#L501).

Nicholasli1995, Jun 22 '22 13:06
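
As one example of an alternative translation estimate, the sketch below keeps the predicted rotation fixed and solves only for the translation by linear least squares on the 2D keypoints. Note that this is not cv2.solvePnP, which estimates rotation and translation together; it is a simpler NumPy-only stand-in. The function name `solve_translation`, the intrinsic matrix `K`, and the synthetic box are all hypothetical and not taken from EgoNet.

```python
import numpy as np

def solve_translation(points_obj, points_2d, K, R):
    """Least-squares translation t such that K (R X + t) projects onto points_2d.

    points_obj: (N, 3) points in the object (template) frame.
    points_2d:  (N, 2) pixel coordinates of the same points.
    K:          (3, 3) camera intrinsics.  R: (3, 3) known rotation.
    """
    p = points_obj @ R.T  # rotated, not yet translated
    homog = np.hstack([points_2d, np.ones((len(points_2d), 1))])
    uv = (np.linalg.inv(K) @ homog.T).T[:, :2]  # normalized image coordinates

    n = len(p)
    A = np.zeros((2 * n, 3))
    b = np.zeros(2 * n)
    # From u = (p_x + t_x) / (p_z + t_z):  t_x - u * t_z = u * p_z - p_x
    A[0::2, 0] = 1.0
    A[0::2, 2] = -uv[:, 0]
    b[0::2] = uv[:, 0] * p[:, 2] - p[:, 0]
    # From v = (p_y + t_y) / (p_z + t_z):  t_y - v * t_z = v * p_z - p_y
    A[1::2, 1] = 1.0
    A[1::2, 2] = -uv[:, 1]
    b[1::2] = uv[:, 1] * p[:, 2] - p[:, 1]

    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t

# Synthetic check with made-up values: project a box, then recover its translation.
signs = np.array([[sx, sy, sz] for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)], dtype=float)
points_obj = 0.5 * signs * np.array([4.0, 1.5, 1.8])
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
K = np.array([[720.0, 0.0, 620.0], [0.0, 720.0, 190.0], [0.0, 0.0, 1.0]])
t_true = np.array([2.0, 1.0, 20.0])

points_cam = points_obj @ R.T + t_true
proj = (K @ points_cam.T).T
points_2d = proj[:, :2] / proj[:, 2:3]

t_est = solve_translation(points_obj, points_2d, K, R)
```

With noise-free projections the recovered `t_est` matches `t_true` up to numerical precision; with real predicted keypoints the result is a least-squares fit and depends on how accurate the rotation and keypoints are.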

Great, thanks for the answer!

nviolante25, Jun 23 '22 15:06