
How to generate targets for Body Center heatmap C_m, Camera map A_m, and SMPL map S_m

tqtrunghnvn opened this issue 4 years ago · 11 comments

Hi authors,

First of all, I would like to thank you so much for your great work ROMP!

I am facing issues related to training. Could you share the code for generating the ground truths for training the models?

Thank you in advance!

tqtrunghnvn avatar Jun 30 '21 02:06 tqtrunghnvn

Thank you for your interest in our work. We have prepared a commit of the training code. However, ROMP is under submission, and we are still waiting for the final decision before releasing it. About the ground truth, please refer to the matching part of the code: we don't generate ground-truth Camera/SMPL maps. The parameters are supervised via a center-guided matching process.
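For illustration, here is a minimal sketch of what such center-guided supervision can look like: the predicted parameter maps are read out only at the matched ground-truth body-center locations, so no dense ground-truth Camera/SMPL map is ever built. All names and shapes here are illustrative, not ROMP's actual code.

```python
import torch

def sample_params_at_centers(param_maps, centers):
    """param_maps: (B, C, H, W) predicted Camera/SMPL parameter maps.
    centers: per-image (N_i, 2) long tensors of matched (y, x) body centers."""
    sampled = []
    for b, yx in enumerate(centers):
        # Read the C-dim parameter vector at each matched center location.
        sampled.append(param_maps[b, :, yx[:, 0], yx[:, 1]].t())  # (N_i, C)
    # All matched people in the batch; these vectors are then compared
    # against the per-person ground truth, not against a dense target map.
    return torch.cat(sampled, dim=0)
```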

Arthur151 avatar Jun 30 '21 02:06 Arthur151

Thank you for your quick reply!

Before delving into the code, I have a question. In Section 3.5, you describe the Mesh Parameter Loss, which includes, among others, a loss on the pose parameters and a loss on the shape parameters. Both are simply L2 losses. How did you obtain the ground-truth pose and shape parameters for every image?
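For concreteness, a minimal sketch of two plain L2 terms of the kind described above (the function name and equal weighting are illustrative, not ROMP's actual implementation):

```python
import torch.nn.functional as F

def mesh_parameter_loss(pred_pose, gt_pose, pred_shape, gt_shape):
    # Plain L2 (MSE) between predicted and ground-truth SMPL parameters:
    # the pose theta is typically 72-dim (24 joints x 3), the shape beta 10-dim.
    loss_pose = F.mse_loss(pred_pose, gt_pose)
    loss_shape = F.mse_loss(pred_shape, gt_shape)
    return loss_pose + loss_shape
```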

Please excuse me if I have misunderstood anything; I am new to this domain.

tqtrunghnvn avatar Jun 30 '21 04:06 tqtrunghnvn

The ground-truth SMPL parameters are either

  1. fitted from MoCap data using the MoSh algorithm (e.g., for the Human3.6M dataset), or
  2. downloaded from EFT.

Arthur151 avatar Jun 30 '21 09:06 Arthur151

Oh. Great! Thank you so much for your answer!

I have already read the EFT paper, and it is clear how the pseudo-3D labels are obtained. However, I am still confused about the 3D pose datasets, such as Human3.6M and MPI-INF-3DHP, used in your paper. I downloaded the Human3.6M dataset, and it has a different format from the ground-truth labels in your paper. The Human3.6M data I downloaded contain:

action <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "action": shape (109867,), type "<i8">
bbox <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "bbox": shape (109867, 4), type "<u2">
camera <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "camera": shape (109867,), type "<i8">
id <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "id": shape (109867,), type "<i8">
joint_2d <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "joint_2d": shape (109867, 16, 2), type "<f8">
joint_3d_mono <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "joint_3d_mono": shape (109867, 16, 3), type "<f8">
subaction <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "subaction": shape (109867,), type "<i8">
subject <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "subject": shape (109867,), type "<i8">
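For reference, a minimal h5py snippet to print these fields and load the 3D joints (the filename is a placeholder):

```python
import h5py

with h5py.File('h36m_annot.h5', 'r') as f:   # placeholder filename
    for name, dset in f.items():
        print(name, dset.shape, dset.dtype)
    joints_3d = f['joint_3d_mono'][:]        # (109867, 16, 3) 3D joints
```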

My understanding is that there are 109867 person instances here. The camera field has shape (109867,), i.e. a single value per instance, while the camera map in your paper contains 3 elements (scale s, translation t_x, and translation t_y). Am I misunderstanding something? Could you please give me some details?

Thank you so much for your help!

tqtrunghnvn avatar Jul 01 '21 04:07 tqtrunghnvn

ROMP doesn't supervise the camera parameters directly. What we supervise are the projected 2D joints, which are computed from the estimated camera parameters and the 3D pose. Learning to match the 2D joints helps the model learn the camera map automatically.
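For illustration, a minimal sketch of this indirect supervision via weak-perspective projection, assuming a per-person camera vector (s, t_x, t_y); names and shapes are illustrative, not ROMP's code:

```python
import torch

def project_weak_perspective(joints_3d, cam):
    """joints_3d: (N, K, 3) estimated 3D joints; cam: (N, 3) = (s, tx, ty)."""
    s = cam[:, 0].view(-1, 1, 1)        # scale
    t = cam[:, 1:3].unsqueeze(1)        # (N, 1, 2) image-plane translation
    return s * joints_3d[..., :2] + t   # (N, K, 2) projected 2D joints

# A 2D joint loss, e.g. F.mse_loss(projected, gt_2d), backpropagates through
# this projection and thereby supervises the camera map indirectly.
```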

Arthur151 avatar Jul 01 '21 13:07 Arthur151

Oh, I see that point.

Thank you so much!

tqtrunghnvn avatar Jul 05 '21 02:07 tqtrunghnvn

> ROMP doesn't supervise the camera parameters directly. What we supervise are the projected 2D joints, which are computed from the estimated camera parameters and the 3D pose. Learning to match the 2D joints helps the model learn the camera map automatically.

I have a question related to the training process. From my understanding, the outputs of your model are the SMPL parameters (pose and shape). These parameters are fed into the SMPL model to obtain a mesh, and from the mesh you get the 3D keypoints via a sparse joint-regression matrix. The 3D keypoints are also used to compute the loss.
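For reference, a minimal sketch of the vertex-to-joint step described above: a fixed, sparse joint regressor maps the SMPL mesh vertices to 3D keypoints (the random values here are illustrative; 6890 vertices and 24 joints follow the SMPL convention):

```python
import torch

vertices = torch.randn(6890, 3)       # mesh vertices from the SMPL layer
J_regressor = torch.rand(24, 6890)    # in practice a fixed, sparse matrix
J_regressor /= J_regressor.sum(dim=1, keepdim=True)  # rows sum to 1
joints_3d = J_regressor @ vertices    # (24, 3) regressed 3D keypoints
```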

My question is: is the SMPL model trained or frozen during the training process?

tqtrunghnvn avatar Jul 29 '21 10:07 tqtrunghnvn

Yes, we freeze the SMPL model during training. The statistical parameters of SMPL are not supposed to change during training.

Arthur151 avatar Jul 29 '21 10:07 Arthur151

If the SMPL model is frozen, the predicted 3D keypoints will not contribute to updating the model. Is that right?

tqtrunghnvn avatar Jul 29 '21 10:07 tqtrunghnvn

No, the gradients are still computed and flow through the SMPL layer back to the network; we just don't use them to update the SMPL parameters themselves. In other words, no optimizer is defined for the SMPL parameter update.
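For illustration, a minimal PyTorch sketch of this setup with stand-in modules: the SMPL-like layer is frozen and excluded from the optimizer, yet gradients still flow through it to the network being trained.

```python
import torch
import torch.nn as nn

backbone = nn.Linear(512, 82)          # stand-in network: 72 pose + 10 shape
smpl_layer = nn.Linear(82, 6890 * 3)   # stand-in for the differentiable SMPL layer
for p in smpl_layer.parameters():
    p.requires_grad_(False)            # freeze the SMPL statistical parameters

# Only the network's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

verts = smpl_layer(backbone(torch.randn(4, 512)))
loss = verts.pow(2).mean()
loss.backward()        # gradients flow through the frozen layer to backbone
optimizer.step()       # smpl_layer's weights remain unchanged
```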

Arthur151 avatar Jul 29 '21 10:07 Arthur151

Clear. I see. Thank you so much!

tqtrunghnvn avatar Jul 29 '21 10:07 tqtrunghnvn