Pose2Mesh_RELEASE

Training and use of PoseNet

longbowzhang opened this issue on Aug 24 '20 · 4 comments

Hi @hongsukchoi, superb work and thanks for sharing! I have a few questions about the training of PoseNet.

[1] According to Fig. 9 of the supplement, H3.6M and COCO define different joint sets. I am wondering how you combine these two datasets (as shown in Table 9) to train PoseNet?

[2] Besides, when employing an off-the-shelf 2D pose detector, how do you make sure that the input of PoseNet is consistent with the output of the 2D pose detector?

Best

longbowzhang · Aug 24 '20 02:08

Hi @longbowzhang, thanks for your compliment!

[1] To train the PoseNet that lifts COCO-defined joints, we use 2D-3D joint pairs regressed from the SMPL fits of each dataset. The SMPL fits are pseudo-ground-truth meshes that we obtained with the fitting framework; this is discussed in Section 6.1 of the main manuscript and Section 12 of the supplement.

To train the PoseNet that lifts H36M-defined joints, we used 2D-3D joint pairs regressed from the SMPL fits of each dataset, except for H36M, for which we use the dataset annotations. You can check the __getitem__ function in each dataset.py.
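
For concreteness, here is a minimal sketch of how such pseudo-ground-truth 2D-3D pairs can be regressed from a SMPL fit. The function, regressor matrix, and camera-parameter names are hypothetical placeholders, not the repo's actual identifiers:

```python
import numpy as np

def regress_pseudo_gt(smpl_vertices, joint_regressor, cam_f, cam_c):
    """Regress a 2D-3D joint pair from a pseudo-GT SMPL mesh.

    smpl_vertices:   [6890, 3] fitted SMPL mesh (camera coordinates)
    joint_regressor: [J, 6890] matrix mapping mesh vertices to the target
                     joint set (e.g., J=17 for COCO); name is an assumption
    cam_f, cam_c:    focal lengths and principal point (pixels)
    """
    joints_3d = joint_regressor @ smpl_vertices            # [J, 3] 3D GT
    # Perspective projection gives the paired 2D GT
    x = joints_3d[:, 0] / joints_3d[:, 2] * cam_f[0] + cam_c[0]
    y = joints_3d[:, 1] / joints_3d[:, 2] * cam_f[1] + cam_c[1]
    joints_2d = np.stack([x, y], axis=1)                   # [J, 2] 2D GT
    return joints_2d, joints_3d
```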

[2] The input of PoseNet is a synthesized 2D pose, following 'AbsPoseLifter' and 'PoseFix'. The synthesized input mimics the error distribution of multiple 2D detectors, so PoseNet is model-agnostic with respect to the 2D pose detector. You could also use the output of a specific 2D pose detector as the training input, but that would make PoseNet dependent on that specific detector.
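
As a rough illustration of the synthesis idea (a simplified sketch only; the actual error model from AbsPoseLifter/PoseFix is pose- and joint-dependent, and these noise parameters are made up):

```python
import numpy as np

def synthesize_detector_like_input(gt_2d, jitter_std=5.0, miss_prob=0.05):
    """Perturb ground-truth 2D joints [J, 2] (pixels) to mimic detector error."""
    noisy = gt_2d + np.random.normal(0.0, jitter_std, gt_2d.shape)
    # Occasionally zero out a joint, as detectors do under occlusion
    missed = np.random.rand(len(gt_2d)) < miss_prob
    noisy[missed] = 0.0
    return noisy
```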

hongsukchoi · Aug 24 '20 14:08

Hi @hongsukchoi, thanks very much for sharing your amazing work! I am also a bit confused about the pretrained models' names. My current understanding is:

  1. For "pose2mesh_human36J_gt_train_human36", the input 2D pose and ground-truth 3D pose for PoseNet come from the Human3.6M dataset annotations (the 2D pose is projected from the 3D pose using the camera parameters in the annotations);
  2. For "pose2mesh_human36J_train_human36", the input 2D pose and ground-truth 3D pose for PoseNet are regressed from SMPLify-X fits.

Do I understand it correctly?

Szy-Young · Dec 23 '20 02:12

Hi @Szy-Young, for both "pose2mesh_human36J_gt_train_human36" and "pose2mesh_human36J_train_human36", the input 2D pose and ground-truth 3D pose for PoseNet come from the Human3.6M dataset annotations.

xxJ (in your case, human36J) indicates the joint set defined by the dataset, as you can check here.

The dataset name(s) after train (in your case, human36) indicate the training dataset(s).

The difference between "pose2mesh_human36J_gt_train_human36" and "pose2mesh_human36J_train_human36" is the input 2D pose. If gt is not included, the ground-truth 2D pose is synthesized with certain errors before being used as the model input during training, as described in the paper. If gt is included, the ground-truth 2D pose is used directly as the model input during training.
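
Put differently, the gt token only toggles which 2D pose is fed to the model at train time. A hedged pseudocode sketch (the flag and function names are hypothetical, reusing the synthesis sketch above):

```python
def get_train_input(gt_2d, use_gt_input):
    # use_gt_input=True  ~ "..._gt_train_..." checkpoints: feed GT 2D directly
    # use_gt_input=False ~ "..._train_..." checkpoints: feed error-synthesized 2D
    return gt_2d if use_gt_input else synthesize_detector_like_input(gt_2d)
```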

If the joint set is defined by COCO and Human3.6M is used for training (e.g., "posenet_cocoJ_train_human36_coco_muco"), the input 2D pose and ground-truth 3D pose for Human3.6M are regressed from SMPLify-X fits, because the Human3.6M dataset does not have annotations for the COCO-defined joint set.

hongsukchoi · Dec 25 '20 14:12

@hongsukchoi Thanks for the patient answers. Everything is clear now!

Szy-Young · Dec 26 '20 07:12