
To get cam_intrinsics and cam_extrinsics from .npz files

Open MoyGcc opened this issue 2 years ago • 12 comments

Hi Yu, thanks for your work and such an organized repo!

I'm now using ROMP to get SMPL poses and would like to visualize the meshes via a perspective camera. I usually use a similar way as https://github.com/chungyiweng/humannerf/issues/1 to convert the s, t_x, and t_y along with a human bbox to the pinhole camera parameters and it does work on VIBE output. However, it seems like I cannot easily get the parameters with the output from ROMP .npz outputs (I can get a rough bbox from pj2d_org ). I found that the scaling factor s is quite different between the VIBE and ROMP estimation for the same input image (~1.14 in VIBE and ~0.58 in ROMP). Could you please point out how I can quickly obtain (estimate) the camera intrinsic and extrinsic? Thanks!

MoyGcc avatar Jul 07 '22 13:07 MoyGcc

Hi @MoyGcc, thanks for your kind words! You can use this function to achieve this: https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/simple_romp/romp/utils.py#L331 It takes the estimated 3D joints, the 2D joints pj2d_org, the image size, and the focal length, and estimates the corresponding 3D translation in the camera space defined by these intrinsic parameters. In BEV, we calculate the focal length like this: with FOV = 60 deg, focal length = H/2 * 1/tan(FOV/2) = 512/2. * 1./np.tan(np.radians(30)) = 443.4. BEV takes a square 512 x 512 input, and we assume FOV = 60 degrees.

Arthur151 avatar Jul 08 '22 08:07 Arthur151
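The FOV-to-focal-length conversion described above can be sketched as a small helper (`focal_from_fov` is a hypothetical name for illustration; the formula is the one quoted in the comment):

```python
import numpy as np

def focal_from_fov(image_size_px: float, fov_deg: float) -> float:
    """Pinhole focal length in pixels: f = (size / 2) / tan(FOV / 2)."""
    return (image_size_px / 2.0) / np.tan(np.radians(fov_deg / 2.0))

# BEV's square 512 x 512 input with the assumed FOV of 60 degrees:
print(round(focal_from_fov(512, 60), 1))  # -> 443.4
```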

> Hi Yu, thanks for your work and such an organized repo!
>
> I'm now using ROMP to get SMPL poses and would like to visualize the meshes via a perspective camera. I usually use a similar way as chungyiweng/humannerf#1 to convert the s, t_x, and t_y along with a human bbox to the pinhole camera parameters and it does work on VIBE output. However, it seems like I cannot easily get the parameters with the output from ROMP .npz outputs (I can get a rough bbox from pj2d_org). I found that the scaling factor s is quite different between the VIBE and ROMP estimation for the same input image (~1.14 in VIBE and ~0.58 in ROMP). Could you please point out how I can quickly obtain (estimate) the camera intrinsic and extrinsic? Thanks!

Have you solved this problem? I'm running into the same issue.

hongsiyu avatar Jul 08 '22 09:07 hongsiyu

Hi Yu @Arthur151, thanks so much for the quick reply and for pointing out the correct way to do this. In the end, I followed the approach you used for evaluation on AGORA: https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/simple_romp/evaluation/eval_AGORA.py#L79 and now the projected SMPL mesh aligns well with the image. There is still a slight difference in the projection (below, the one with the normal color is my projected result), but I think it's okay. @hongsiyu, you could probably also refer to the evaluation on the AGORA dataset for doing this.


MoyGcc avatar Jul 08 '22 09:07 MoyGcc
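For reference, the generic pinhole projection behind that kind of alignment check can be sketched as follows (a minimal sketch, not ROMP's exact evaluation code; `project_points` is a hypothetical helper, K is the 3x3 intrinsics, and the points are assumed to already be in camera space, i.e. the estimated translation has been added):

```python
import numpy as np

def project_points(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Perspective-project N x 3 camera-space points to N x 2 pixel coords."""
    rays = points_cam / points_cam[:, 2:3]   # divide by depth z
    return (K @ rays.T).T[:, :2]             # apply intrinsics, drop last row

# Toy check: a point on the optical axis projects to the principal point.
K = np.array([[443.4, 0.0, 256.0],
              [0.0, 443.4, 256.0],
              [0.0, 0.0, 1.0]])
print(project_points(np.array([[0.0, 0.0, 3.0]]), K))  # -> [[256. 256.]]
```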

That's clever. Glad to hear that.

Arthur151 avatar Jul 08 '22 09:07 Arthur151

So the intrinsics are [[443.4, 0, 512//2], [0, 443.4, 512//2], [0, 0, 1]], and extrinsics[:3, 3] = cam_trans, right? @MoyGcc

Andyen512 avatar Jul 08 '22 10:07 Andyen512

> Hi Yu @Arthur151, Thanks so much for the quick reply and for pointing out the correct way to do this. In the end, I followed the way that you applied for evaluation on AGORA: https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/simple_romp/evaluation/eval_AGORA.py#L79 and now the projected SMPL mesh can align well with the image. Though there is still a "slight" difference (below, the one with normal color is my projected result) in terms of the projection. I think it's okay. @hongsiyu, you could probably also refer to the evaluation on the AGORA dataset for doing this.

I followed the approach you mentioned with my own video, but the progress image in humannerf doesn't look correct. Did you succeed in training humannerf with the AGORA dataset?

hongsiyu avatar Jul 08 '22 10:07 hongsiyu

@Andyen512 No, the image size should be the original size of the input image, not BEV's resized input map. It is fine to directly use the camera intrinsics from humannerf when calculating the 3D translation with estimate_translation:

"cam_intrinsics": [
            [23043.9, 0.0, 940.19],
            [0.0, 23043.9, 539.23],
            [0.0, 0.0, 1.0]
]

Arthur151 avatar Jul 08 '22 11:07 Arthur151
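Putting the pieces of this thread together, a minimal sketch of assembling the 3x3 intrinsics and a 4x4 extrinsic matrix from an estimated camera translation might look like this (`build_camera` and the example cam_trans values are assumptions for illustration; the rotation is taken as identity since the translation is already expressed in the camera frame):

```python
import numpy as np

def build_camera(focal: float, cx: float, cy: float, cam_trans):
    """Assemble pinhole intrinsics K (3x3) and extrinsics E (4x4)."""
    K = np.array([[focal, 0.0, cx],
                  [0.0, focal, cy],
                  [0.0, 0.0, 1.0]])
    E = np.eye(4)            # identity rotation
    E[:3, 3] = cam_trans     # estimated 3D translation in camera space
    return K, E

# Using the humannerf-style intrinsics quoted above and a made-up translation:
K, E = build_camera(23043.9, 940.19, 539.23, [0.0, 0.2, 30.0])
```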

> @Andyen512 No, the image size should be the original size of the input image, not BEV's resized input map. It is fine to directly use the camera intrinsics from humannerf when calculating the 3D translation with estimate_translation:
>
> "cam_intrinsics": [
>             [23043.9, 0.0, 940.19],
>             [0.0, 23043.9, 539.23],
>             [0.0, 0.0, 1.0]
> ]

Thank you very much; with that focal length I succeeded in training humannerf.

hongsiyu avatar Jul 08 '22 11:07 hongsiyu

@Arthur151 Sorry, why use the humannerf cam_intrinsics? I was using romp --mode=video --calc_smpl --render_mesh -i=/path/to/video.mp4 -o=/path/to/output/folder/results.mp4 --save_video to run inference on my own video, and I see that args.focal_length in https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/romp/lib/config.py#L60 is 443.4. Also, the original size of the input image is 1920x1080, so why not cam_intrinsics[0][2]=960 and cam_intrinsics[1][2]=540? I'm confused.

Andyen512 avatar Jul 08 '22 15:07 Andyen512

@Andyen512 That focal length (23043.9) and image center (940.19, 539.23) are just for training humannerf with their camera extrinsic matrix.

To run inference on your own video, you can re-calculate the focal length: with FOV = 60 deg, focal length = W/2 * 1/tan(FOV/2) = 1920/2. * 1./np.tan(np.radians(30)) = 1662.768.

Arthur151 avatar Jul 09 '22 03:07 Arthur151
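The recalculation above, as a one-liner (a sketch; 1920 here is the frame width, following the comment):

```python
import numpy as np

# Focal length for a 1920-wide frame at an assumed FOV of 60 degrees.
focal = 1920 / 2.0 / np.tan(np.radians(60 / 2.0))
print(focal)  # approximately 1662.768
```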

ok thx, I'll try

Andyen512 avatar Jul 17 '22 08:07 Andyen512


Hi @hongsiyu, can you tell me how to use ROMP to obtain the 3x3 cam_intrinsics and 4x4 cam_extrinsics? Thanks.

mch0dmin avatar May 11 '23 07:05 mch0dmin