
To get cam_intrinsics and cam_extrinsics from .npz files

Open MoyGcc opened this issue 2 years ago • 12 comments

Hi Yu, thanks for your work and such an organized repo!

I'm now using ROMP to get SMPL poses and would like to visualize the meshes via a perspective camera. I usually use a similar way as https://github.com/chungyiweng/humannerf/issues/1 to convert the s, t_x, and t_y along with a human bbox to the pinhole camera parameters and it does work on VIBE output. However, it seems like I cannot easily get the parameters with the output from ROMP .npz outputs (I can get a rough bbox from pj2d_org ). I found that the scaling factor s is quite different between the VIBE and ROMP estimation for the same input image (~1.14 in VIBE and ~0.58 in ROMP). Could you please point out how I can quickly obtain (estimate) the camera intrinsic and extrinsic? Thanks!

MoyGcc avatar Jul 07 '22 13:07 MoyGcc

Hi @MoyGcc, thanks for your kind words! You can use this function to achieve this: https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/simple_romp/romp/utils.py#L331 It takes the estimated 3D joints, the 2D joints pj2d_org, the image size, and the focal length, and estimates the corresponding 3D translation in the camera space defined by these intrinsic parameters. In BEV, we calculate the focal length like this: with FOV = 60 deg, focal length = H/2 * 1/tan(FOV/2) = 512/2. * 1./np.tan(np.radians(30)) = 443.4. BEV takes a square 512 x 512 input, and we assume FOV = 60 degrees.

Arthur151 avatar Jul 08 '22 08:07 Arthur151
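The FOV-to-focal-length conversion described above can be sketched as a small helper (`focal_from_fov` is a hypothetical name for illustration; the formula is the one quoted in the comment):

```python
import numpy as np

def focal_from_fov(image_size_px: float, fov_deg: float) -> float:
    """Pinhole focal length in pixels: f = (size / 2) / tan(FOV / 2)."""
    return (image_size_px / 2.0) / np.tan(np.radians(fov_deg / 2.0))

# BEV's square 512 x 512 input with the assumed FOV of 60 degrees:
print(round(focal_from_fov(512, 60), 1))  # -> 443.4
```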

> Hi Yu, thanks for your work and such an organized repo!
>
> I'm now using ROMP to get SMPL poses and would like to visualize the meshes via a perspective camera. I usually use a similar way as chungyiweng/humannerf#1 to convert the s, t_x, and t_y along with a human bbox to the pinhole camera parameters and it does work on VIBE output. However, it seems like I cannot easily get the parameters with the output from ROMP .npz outputs (I can get a rough bbox from pj2d_org). I found that the scaling factor s is quite different between the VIBE and ROMP estimation for the same input image (~1.14 in VIBE and ~0.58 in ROMP). Could you please point out how I can quickly obtain (estimate) the camera intrinsic and extrinsic? Thanks!

Have you solved this problem? I'm running into the same issue.

hongsiyu avatar Jul 08 '22 09:07 hongsiyu

Hi Yu @Arthur151, thanks so much for the quick reply and for pointing out the correct way to do this. In the end, I followed the approach you used for evaluation on AGORA: https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/simple_romp/evaluation/eval_AGORA.py#L79 and now the projected SMPL mesh aligns well with the image. There is still a slight difference in the projection (below, the one with the normal color is my projected result), but I think it's okay. @hongsiyu, you could probably also refer to the evaluation on the AGORA dataset for doing this.


MoyGcc avatar Jul 08 '22 09:07 MoyGcc
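For reference, the generic pinhole projection behind that kind of alignment check can be sketched as follows (a minimal sketch, not ROMP's exact evaluation code; `project_points` is a hypothetical helper, K is the 3x3 intrinsics, and the points are assumed to already be in camera space, i.e. the estimated translation has been added):

```python
import numpy as np

def project_points(points_cam: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Perspective-project N x 3 camera-space points to N x 2 pixel coords."""
    rays = points_cam / points_cam[:, 2:3]   # divide by depth z
    return (K @ rays.T).T[:, :2]             # apply intrinsics, drop last row

# Toy check: a point on the optical axis projects to the principal point.
K = np.array([[443.4, 0.0, 256.0],
              [0.0, 443.4, 256.0],
              [0.0, 0.0, 1.0]])
print(project_points(np.array([[0.0, 0.0, 3.0]]), K))  # -> [[256. 256.]]
```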

That's clever. Glad to hear that.

Arthur151 avatar Jul 08 '22 09:07 Arthur151

So the intrinsics are [[443.4, 0, 512//2], [0, 443.4, 512//2], [0, 0, 1]], and extrinsics[:3, 3] = cam_trans, right? @MoyGcc

Andyen512 avatar Jul 08 '22 10:07 Andyen512

> Hi Yu @Arthur151, Thanks so much for the quick reply and for pointing out the correct way to do this. In the end, I followed the way that you applied for evaluation on AGORA: https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/simple_romp/evaluation/eval_AGORA.py#L79 and now the projected SMPL mesh can align well with the image. Though there is still a "slight" difference (below, the one with normal color is my projected result) in terms of the projection. I think it's okay. @hongsiyu, you could probably also refer to the evaluation on the AGORA dataset for doing this.

I followed the approach you mentioned with my own video, but the progress image in humannerf doesn't look correct. Did you succeed in training humannerf with the AGORA dataset?

hongsiyu avatar Jul 08 '22 10:07 hongsiyu

@Andyen512 No, the image size should be the original size of the input image, not BEV's resized input map. It is fine to directly use the camera intrinsics from humannerf when calculating the 3D translation with estimate_translation:

"cam_intrinsics": [
            [23043.9, 0.0, 940.19],
            [0.0, 23043.9, 539.23],
            [0.0, 0.0, 1.0]
]

Arthur151 avatar Jul 08 '22 11:07 Arthur151
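Putting the pieces of this thread together, a minimal sketch of assembling the 3x3 intrinsics and a 4x4 extrinsic matrix from an estimated camera translation might look like this (`build_camera` and the example cam_trans values are assumptions for illustration; the rotation is taken as identity since the translation is already expressed in the camera frame):

```python
import numpy as np

def build_camera(focal: float, cx: float, cy: float, cam_trans):
    """Assemble pinhole intrinsics K (3x3) and extrinsics E (4x4)."""
    K = np.array([[focal, 0.0, cx],
                  [0.0, focal, cy],
                  [0.0, 0.0, 1.0]])
    E = np.eye(4)            # identity rotation
    E[:3, 3] = cam_trans     # estimated 3D translation in camera space
    return K, E

# Using the humannerf-style intrinsics quoted above and a made-up translation:
K, E = build_camera(23043.9, 940.19, 539.23, [0.0, 0.2, 30.0])
```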

> @Andyen512 No, the image size should be the original size of the input image, not BEV's resized input map. It is fine to directly use the camera intrinsics from humannerf when calculating the 3D translation with estimate_translation:
>
> "cam_intrinsics": [
>             [23043.9, 0.0, 940.19],
>             [0.0, 23043.9, 539.23],
>             [0.0, 0.0, 1.0]
> ]

Thank you very much; with that focal length I succeeded in training humannerf.

hongsiyu avatar Jul 08 '22 11:07 hongsiyu

@Arthur151 Sorry, why use the humannerf cam_intrinsics? I was using romp --mode=video --calc_smpl --render_mesh -i=/path/to/video.mp4 -o=/path/to/output/folder/results.mp4 --save_video to run inference on my own video, and I see that args.focal_length in https://github.com/Arthur151/ROMP/blob/91dac0172c4dc0685b97f96eda9a3a53c626da47/romp/lib/config.py#L60 is 443.4. Also, the original size of the input image is 1920x1080, so why not cam_intrinsics[0][2]=960 and cam_intrinsics[1][2]=540? I'm confused.

Andyen512 avatar Jul 08 '22 15:07 Andyen512

@Andyen512 That focal length (23043.9) and image center (940.19, 539.23) are just for training humannerf with their camera extrinsic matrix.

To run inference on your own video, you can re-calculate the focal length: with FOV = 60 deg, focal length = W/2 * 1/tan(FOV/2) = 1920/2. * 1./np.tan(np.radians(30)) = 1662.768.

Arthur151 avatar Jul 09 '22 03:07 Arthur151
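The recalculation above, as a one-liner (a sketch; 1920 here is the frame width, following the comment):

```python
import numpy as np

# Focal length for a 1920-wide frame at an assumed FOV of 60 degrees.
focal = 1920 / 2.0 / np.tan(np.radians(60 / 2.0))
print(focal)  # approximately 1662.768
```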

ok thx, I'll try

Andyen512 avatar Jul 17 '22 08:07 Andyen512


Hi @hongsiyu, can you tell me how to use ROMP to obtain the 3x3 cam_intrinsics and 4x4 cam_extrinsics? Thanks.

mch0dmin avatar May 11 '23 07:05 mch0dmin