How to generate targets for Body Center heatmap C_m, Camera map A_m, and SMPL map S_m
Hi authors,
First of all, I would like to thank you so much for your great work ROMP!
I am facing issues related to training. Could you share the code for generating the ground truths for training the models?
Thank you in advance!
Thank you for your interest in our work. We have prepared a commit of the training code; however, ROMP is under submission, and we are still waiting for the final decision before releasing it. Regarding the ground truth, please refer to the matching part: we don't generate ground-truth Camera/SMPL maps. The parameters are supervised via a center-guided matching process.
Thank you for your quick reply!
Before delving into the code, I have a question. In Section 3.5, you mention the Mesh Parameter Loss, which includes, for example, the loss on the pose parameters and the loss on the shape parameters. These two losses are simply L2 losses. So how did you obtain the ground-truth pose and shape parameters for every image?
Please excuse me if there is any misunderstanding! I am new in this domain.
The ground-truth SMPL parameters are either:
- parsed from MoCap data using the MoSh algorithm (e.g., for the Human3.6M dataset), or
- downloaded from EFT.
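Given such ground-truth parameters, the Mesh Parameter Loss from Section 3.5 (simple L2 losses on pose and shape) could be sketched as follows. This is an illustrative sketch only; the function name and the equal weighting of the two terms are assumptions, not ROMP's actual implementation:

```python
import torch

def mesh_parameter_loss(pred_pose, gt_pose, pred_shape, gt_shape):
    # L2 loss on SMPL pose (theta) and shape (beta) parameters.
    # Hypothetical sketch; the real code may weight the terms differently.
    loss_pose = torch.mean((pred_pose - gt_pose) ** 2)
    loss_shape = torch.mean((pred_shape - gt_shape) ** 2)
    return loss_pose + loss_shape
```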
Oh. Great! Thank you so much for your answer!
I have already read EFT, and this is clear for the pseudo-3D labels. However, I am still confused about the 3D pose datasets used in your paper, such as Human3.6M and MPI-INF-3DHP. I have already downloaded the Human3.6M dataset, and it has a different format from the ground-truth labels in your paper. The Human3.6M dataset I downloaded has:
action <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "action": shape (109867,), type "<i8">
bbox <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "bbox": shape (109867, 4), type "<u2">
camera <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "camera": shape (109867,), type "<i8">
id <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "id": shape (109867,), type "<i8">
joint_2d <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "joint_2d": shape (109867, 16, 2), type "<f8">
joint_3d_mono <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "joint_3d_mono": shape (109867, 16, 3), type "<f8">
subaction <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "subaction": shape (109867,), type "<i8">
subject <class 'h5py._hl.dataset.Dataset'> <HDF5 dataset "subject": shape (109867,), type "<i8">
My understanding is that there are 109867 person instances here. The camera field has the shape (109867,), while the camera map in your paper contains 3 elements (scale s, translation t_x, and translation t_y).
Am I understanding incorrectly?
Or could you please give me some details?
Thank you so much for your help!
ROMP doesn't supervise the camera parameters. What we supervise are the projected 2D joints, which are calculated from the estimated camera parameters and the 3D pose. Learning the 2D joints helps the model learn the camera map automatically.
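As a rough illustration, the weak-perspective projection that ties the 3-element camera map (s, t_x, t_y) to the 2D joint loss could look like this. The function name and the exact camera convention (scale applied after translation) are assumptions for the sketch, not necessarily ROMP's code:

```python
import torch

def project_weak_perspective(joints_3d, cam):
    # joints_3d: (N, 3) 3D joints regressed from the SMPL mesh
    # cam:       (3,)   [scale s, translation t_x, translation t_y]
    # Returns (N, 2) projected 2D joints; supervising these with an
    # L2 loss against ground-truth 2D joints sends gradients into the
    # camera map without any camera ground truth.
    s, tx, ty = cam[0], cam[1], cam[2]
    return s * (joints_3d[:, :2] + torch.stack([tx, ty]))
```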
Oh, I see that point.
Thank you so much!
I have a question related to the training process. From my understanding, the outputs of your model are the SMPL parameters (pose and shape). Those SMPL parameters are then fed into the SMPL model to obtain the mesh, and the 3D keypoints are regressed from the mesh using a sparse matrix. The 3D keypoints are also used to compute the loss.
My question is: Is the SMPL model trained or frozen during the training process?
Yes, we freeze the SMPL parameters during training. The statistical parameters of SMPL are not supposed to change during training.
If you freeze the SMPL model, the predicted 3D keypoints will not contribute to updating the model. Is that right?
No, the gradients will still be calculated, but we don't use them to update the SMPL parameters. In other words, no optimizer is defined for the SMPL parameter update.
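A minimal sketch of this setup, using a toy linear layer in place of the real SMPL model (all names here are illustrative): gradients flow through the frozen layer back to the trainable regressor, but the frozen layer's own parameters receive no gradient and no optimizer references them.

```python
import torch

# Frozen "SMPL-like" module: its parameters are never updated.
smpl_like = torch.nn.Linear(10, 6890 * 3)
for p in smpl_like.parameters():
    p.requires_grad_(False)  # freeze the statistical model

# Trainable regressor head; only its parameters go to the optimizer.
regressor = torch.nn.Linear(512, 10)
optimizer = torch.optim.SGD(regressor.parameters(), lr=1e-3)

feat = torch.randn(1, 512)
params = regressor(feat)   # predicted "shape" parameters
verts = smpl_like(params)  # mesh from the frozen model
loss = verts.pow(2).mean()
loss.backward()            # gradient still reaches regressor.weight
```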
Clear. I see. Thank you so much!