Relation between kpts_3d_pred and pose
Hello, thank you for open-sourcing this amazing project!
I have a question about the convention for the transformation of the 3D box. EgoNet only produces an egocentric pose (i.e., in camera coordinates) corresponding to the rotation between the 3D box extracted from the keypoints and a template 3D box. We also have a translation corresponding to the first point in kpts_3d_pred, here.
To better understand the coordinate systems involved I'm doing the following experiment:
- Create a template 3D bounding box following this, in the canonical pose.
- Rotate it with the rotation matrix given by EgoNet, this one.
After doing these two steps, I still need one translation to place the 3D box in space (in the camera system). The question is, what translation should I use? Is it the one corresponding to the first point in kpts_3d_pred?
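The experiment above can be sketched as follows (a minimal sketch: the box dimensions, rotation, and translation below are made-up stand-ins for EgoNet's actual template and outputs):

```python
import numpy as np

# Hypothetical template box dimensions (length, height, width);
# EgoNet's actual canonical template may differ.
l, h, w = 4.0, 1.5, 1.6
x = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
y = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
z = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
template = np.stack([x, y, z], axis=1)   # (8, 3) box in canonical pose

# R stands in for the egocentric rotation predicted by EgoNet;
# here a 30-degree yaw for illustration.
theta = np.deg2rad(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])

# t is the missing translation this question is about, e.g. the first
# point of kpts_3d_pred (illustrative value here).
t = np.array([1.0, 1.5, 20.0])

# Rotate the template, then translate it into camera coordinates.
box_cam = template @ R.T + t
```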
Thank you for your time
Hi, by default the translation of the input 3D box is used for visualization: https://github.com/Nicholasli1995/EgoNet/blob/13e3758388ab9f8a8c59774b878d6990f9e94042/libs/visualization/egonet_utils.py#L70. When ground-truth boxes are specified, the ground-truth translation is used instead.
You can also experiment with other translation-estimation approaches. For example, use cv2.solvePnP to solve for the translation from the 2D keypoints predicted by EgoNet: https://github.com/Nicholasli1995/EgoNet/blob/13e3758388ab9f8a8c59774b878d6990f9e94042/tools/inference_legacy.py#L501.
Great, thanks for the answer!