Difference between results from inference and the paper

Open unoShin opened this issue 3 years ago • 25 comments

First, thanks for your great work.

I trained the model using the script 'python train.py --gpu 0-1 --backbone LPSKI' with the Human3.6M and MPII datasets. The protocol is 1 and I trained for 25 epochs.

I tested the model with test.py and got the result below:

Protocol 1 error (PA MPJPE) >> tot: 42.72 Directions: 37.63 Discussion: 39.01 Eating: 45.51 Greeting: 43.06 Phoning: 41.33 Posing: 41.10 Purchases: 35.78 Sitting: 43.50 SittingDown: 57.36 Smoking: 47.08 Photo: 51.04 Waiting: 38.32 Walking: 30.94 WalkDog: 46.21 WalkTogether: 38.39

The average Protocol 1 MPJPE reported in the paper is 35.2, which differs from my result. Did I miss something needed to reproduce it, such as other settings in config.py?

Also, my training time was 16 hours on an RTX 2080, while the paper reports 3 days on two RTX Titans, so I also wonder what causes the difference in training time.

unoShin avatar Aug 25 '21 08:08 unoShin

Did you check whether config.py is still set to the extra-small model (I reported three different model sizes: small, large, and extra-small)? The 16-hour training time also suggests you used the extra-small model. Please let me know if you have further questions.

SangbumChoi avatar Aug 25 '21 09:08 SangbumChoi

Did you mean changing embedding_size in config.py to switch model types? It is 2048, and I checked that the large model uses 2048 embedding channels in the paper.

The other settings in config.py are as follows:

class Config:

## dataset
# training set
# 3D: Human36M, MuCo
# 2D: MSCOCO, MPII 
trainset_3d = ['Human36M']
trainset_2d = ['MPII']

# testing set
# Human36M, MuPoTS, MSCOCO
testset = 'Human36M'

## directory
cur_dir = osp.dirname(os.path.abspath(__file__))
root_dir = osp.join(cur_dir, '..')
data_dir = osp.join(root_dir, 'data')
output_dir = osp.join(root_dir, 'output')
model_dir = osp.join(output_dir, 'model_dump')
pretrain_dir = osp.join(output_dir, 'pre_train')
vis_dir = osp.join(output_dir, 'vis')
log_dir = osp.join(output_dir, 'log')
result_dir = osp.join(output_dir, 'result')

## input, output
input_shape = (256, 256) 
output_shape = (input_shape[0]//8, input_shape[1]//8)
width_multiplier = 1.0
depth_dim = 32
bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
pixel_mean = (0.485, 0.456, 0.406)
pixel_std = (0.229, 0.224, 0.225)

## training config
embedding_size = 2048
lr_dec_epoch = [17, 21]
end_epoch = 25
lr = 1e-3
lr_dec_factor = 10
batch_size = 64

## testing config
test_batch_size = 1
flip_test = True
use_gt_info = True

## others
num_thread = 20
gpu_ids = '0'
num_gpus = 1
continue_train = False

unoShin avatar Aug 25 '21 09:08 unoShin

No, you should try changing depth_dim from 32 to 64, and if an error shows up, adjust output_shape as well. The correct settings should look like this:

input_shape = (256, 256) 
output_shape = (input_shape[0]//4, input_shape[1]//4)
width_multiplier = 1.0
depth_dim = 64
bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
pixel_mean = (0.485, 0.456, 0.406)
pixel_std = (0.229, 0.224, 0.225)
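
For context, here is a minimal sketch (my own illustration, not code from this repo; the joint count of 18 is an assumption) of why depth_dim and output_shape have to move together: soft_argmax in main/model.py reshapes the heatmaps to (-1, joint_num, depth_dim * output_shape[0] * output_shape[1]), so the volume the backbone actually produces must match these config values.

joint_num = 18                       # assumption, for illustration only
input_shape = (256, 256)

# large model: output stride 4 and 64 depth bins -> a 64 x 64 x 64 volume per joint
large_output_shape = (input_shape[0] // 4, input_shape[1] // 4)   # (64, 64)
large_depth_dim = 64
large_elems = large_depth_dim * large_output_shape[0] * large_output_shape[1]   # 262144

# extra-small default: output stride 8 and 32 depth bins -> a 32 x 32 x 32 volume per joint
xs_output_shape = (input_shape[0] // 8, input_shape[1] // 8)      # (32, 32)
xs_depth_dim = 32
xs_elems = xs_depth_dim * xs_output_shape[0] * xs_output_shape[1]                # 32768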

SangbumChoi avatar Aug 25 '21 09:08 SangbumChoi

It actually produced an error like this:

File "/home/unolab/Yoonho/Pose/MobileHumanPose/main/model.py", line 67, in forward
    loss_coord = torch.abs(coord - target_coord) * target_vis
RuntimeError: The size of tensor a (8) must match the size of tensor b (64) at non-singleton dimension 0

I changed the output shape to try to solve the error: output_shape = (input_shape[0]//(8*math.sqrt(2)), input_shape[1]//(8*math.sqrt(2)))

And that produced another error:

File "/home/unolab/Yoonho/Pose/MobileHumanPose/main/model.py", line 29, in soft_argmax
    heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim*cfg.output_shape[0]*cfg.output_shape[1]))
TypeError: reshape(): argument 'shape' must be tuple of ints, but found element of type float at pos 3
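
For what it's worth, the second error is just Python typing: floor division with a float operand returns a float, and reshape() only accepts ints. A minimal sketch (my illustration, not repo code):

import math

input_shape = (256, 256)

# 256 // (8 * math.sqrt(2)) evaluates to 22.0 (a float), so
# reshape((-1, joint_num, depth_dim * 22.0 * 22.0)) raises the TypeError above.
bad_output_shape = (input_shape[0] // (8 * math.sqrt(2)),
                    input_shape[1] // (8 * math.sqrt(2)))
print(type(bad_output_shape[0]))   # <class 'float'>

# The suggested large-model setting keeps everything an int:
output_shape = (input_shape[0] // 4, input_shape[1] // 4)    # (64, 64)
print(type(output_shape[0]))       # <class 'int'>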

unoShin avatar Aug 25 '21 10:08 unoShin

@unoShin I found that there was a slight mismatch in the large model, so I uploaded a fix to the large branch for the skip-concat case. Please let me know if this doesn't work.

SangbumChoi avatar Aug 25 '21 11:08 SangbumChoi

@SangbumChoi Thank you :)

unoShin avatar Aug 25 '21 11:08 unoShin

I checked that it works for training, and it will take about 50 hours on an RTX 2080. Thank you for your support!

unoShin avatar Aug 26 '21 06:08 unoShin

@unoShin Sounds great, please close this issue once the score is similar to the paper's (otherwise I will shortly) :)

SangbumChoi avatar Aug 26 '21 11:08 SangbumChoi

Protocol 1 error (PA MPJPE) >> tot: 40.13 Directions: 34.42 Discussion: 35.80 Eating: 44.37 Greeting: 41.69 Phoning: 38.69 Posing: 37.26 Purchases: 34.61 Sitting: 42.46 SittingDown: 54.30 Smoking: 42.58 Photo: 48.37 Waiting: 35.72 Walking: 29.49 WalkDog: 43.04 WalkTogether: 35.15

I trained the model with the new branch (large, 25 epochs) and got the result above. There is still a difference between 40.13 and 35.2 (paper) in MPJPE.

config.py :
trainset_3d = ['Human36M']
trainset_2d = ['MPII']

# testing set
# Human36M, MuPoTS, MSCOCO
testset = 'Human36M'

## directory
cur_dir = osp.dirname(os.path.abspath(__file__))
root_dir = osp.join(cur_dir, '..')
data_dir = osp.join(root_dir, 'data')
output_dir = osp.join(root_dir, 'output')
model_dir = osp.join(output_dir, 'model_dump')
pretrain_dir = osp.join(output_dir, 'pre_train')
vis_dir = osp.join(output_dir, 'vis')
log_dir = osp.join(output_dir, 'log')
result_dir = osp.join(output_dir, 'result')

## input, output
input_shape = (256, 256) 
output_shape = (input_shape[0]//4, input_shape[1]//4)
width_multiplier = 1.0
depth_dim = 64
bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
pixel_mean = (0.485, 0.456, 0.406)
pixel_std = (0.229, 0.224, 0.225)

## training config
embedding_size = 2048
lr_dec_epoch = [17, 21]
end_epoch = 25
lr = 1e-3
lr_dec_factor = 10
batch_size = 16

## testing config
test_batch_size = 16
flip_test = True
use_gt_info = True

## others
num_thread = 20
gpu_ids = '0'
num_gpus = 1
continue_train = False

The protocol is 1 and the bbox root file is from Subject 11 (trained on subjects 1, 5, 6, 7, 8, 9). Did I do something wrong to get the wrong result?

unoShin avatar Aug 30 '21 02:08 unoShin


Sorry for the inconvenience. I (perhaps foolishly) committed every intermediate step to GitHub, so I just need to find the right past commit. I will find the appropriate large-model code for everyone. The one thing that might be a concern is the batch size, which depends on individual GPU circumstances.

One thing you can check right now is whether the extra-small model scores the same as in the paper.

I will let you know if I find one.

Thanks

SangbumChoi avatar Aug 30 '21 04:08 SangbumChoi

@unoShin Can you try https://github.com/SangbumChoi/MobileHumanPose/blob/70baeafff0d57ab74a72abedb30c12e739da18ec/common/backbone/lpnet_ski_concat.py (commit 70baeafff0d57ab74a72abedb30c12e739da18ec)?

SangbumChoi avatar Aug 30 '21 08:08 SangbumChoi

@SangbumChoi I will try and let you know. Thank you.

unoShin avatar Aug 30 '21 08:08 unoShin

@SangbumChoi Is the only difference in line 130 of lpnet_ski_concat.py?

unoShin avatar Aug 30 '21 09:08 unoShin

09-02 09:52:46 Protocol 1 error (PA MPJPE) >> tot: 40.21 Directions: 36.56 Discussion: 37.00 Eating: 42.51 Greeting: 41.39 Phoning: 38.17 Posing: 36.55 Purchases: 36.60 Sitting: 42.26 SittingDown: 55.09 Smoking: 41.85 Photo: 48.03 Waiting: 36.51 Walking: 29.55 WalkDog: 43.62 WalkTogether: 35.50

Using that commit, the result still differs somewhat from the paper's.

unoShin avatar Sep 02 '21 00:09 unoShin


@unoShin Hi, I have two questions for you:

  1. Did you use exactly the same commit that I pointed you to?
  2. What was your batch size, and which 2D dataset did you use alongside Human3.6M?

If both answers seem reasonable, then I will re-train my code and announce the result. It might take more than one week.

SangbumChoi avatar Sep 02 '21 04:09 SangbumChoi

@SangbumChoi Hi,

  1. Yes, I used this version: https://github.com/SangbumChoi/MobileHumanPose/commit/70baeafff0d57ab74a72abedb30c12e739da18ec That is why I asked whether the only difference between the large branch and 70baeaf is line 130 of lpnet_ski_concat.py.

  2. The batch size is 8 and the 2D dataset is MPII. My training time is 2.21 hours/epoch on an RTX 2080, and the total number of epochs is 25.

Thanks!

unoShin avatar Sep 02 '21 04:09 unoShin

@unoShin I'm a little concerned that your batch size differs from the original paper and code, but let me re-check and share the result with you. Again, this might take some time.
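
As a side note, one common heuristic (not something stated in this repo or the paper) is to scale the learning rate with the batch size when it differs from the reference run; a minimal sketch using the default config values above:

# Linear LR scaling heuristic, NOT from this repo -- treat it only as a rough starting point.
reference_batch_size = 64    # default batch_size in config.py above
reference_lr = 1e-3          # default lr in config.py above
my_batch_size = 8
scaled_lr = reference_lr * my_batch_size / reference_batch_size
print(f"suggested lr for batch {my_batch_size}: {scaled_lr:.2e}")   # 1.25e-04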

SangbumChoi avatar Sep 02 '21 05:09 SangbumChoi

@SangbumChoi Thanks for your support :)

unoShin avatar Sep 02 '21 05:09 unoShin

Hi, I trained with Human3.6M and MPII but get an error of 400+, even though the visualized 2D output doesn't look bad. I did not build a bbox root file and used the GT bbox instead. I want to figure out why I get such a large error; how do you generate the bbox root file?

ggfresh avatar Sep 09 '21 08:09 ggfresh

@ggfresh An error of more than 400 might be caused by using an old branch (see this issue). Also, if the image files are already cropped, you don't actually have to build GT and root bboxes. However, you can generate a bbox root file with an object detector or with RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE).
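
If you do need to generate the file, the rough flow is to run a person detector or RootNet on the test images and dump one record per image with the box and the estimated root position. Below is a minimal sketch; the output file name and the field names ("image_id", "bbox", "root_cam") are assumptions, so check the Human36M dataset loader in this repo for the exact keys it expects.

import json

# Hypothetical sketch: my_detections stands in for your own per-image detector/RootNet outputs.
my_detections = {
    1: ((100.0, 50.0, 200.0, 400.0), (0.0, 0.0, 5000.0)),   # placeholder values
}

results = []
for image_id, (bbox, root_cam) in my_detections.items():
    results.append({
        "image_id": image_id,                    # assumed key
        "bbox": list(map(float, bbox)),          # (x, y, w, h) in pixels, assumed key
        "root_cam": list(map(float, root_cam)),  # root joint (x, y, z) in camera space (mm), assumed key
    })

with open("bbox_root_human36m_output.json", "w") as f:   # file name is an assumption
    json.dump(results, f)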

SangbumChoi avatar Sep 09 '21 08:09 SangbumChoi


thanks for the reply, which issue?

ggfresh avatar Sep 09 '21 09:09 ggfresh

@SangbumChoi When I test epoch 24, the error is large. [image]

I have the same problem, though my training loss looks normal, I think. [train_log_tmp]

And I used epochs 24-24.

[train_result]

ggfresh avatar Sep 09 '21 09:09 ggfresh

@ggfresh That is odd, since the currently open issue reports at most around 40 MPJPE. Your description lacks the information needed to debug and find the error. As you said, training seems normal, so you might want to actually display the jpg files to check them.
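
A minimal way to do that sanity check (my sketch, not code from this repo): load one of your test images, draw the bbox your annotation or root file claims, and write it out to inspect.

import cv2

# Placeholder path and bbox -- substitute one of your own samples.
img = cv2.imread("path/to/one_test_image.jpg")
x, y, w, h = 100, 50, 200, 400
cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
cv2.imwrite("debug_bbox.jpg", img)
# If the box does not frame the person, the image/annotation pairing is off,
# which would explain a very large (400+) MPJPE.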

SangbumChoi avatar Sep 09 '21 09:09 SangbumChoi

Sorry, after checking, I found that my own data was inconsistent.

ggfresh avatar Sep 09 '21 09:09 ggfresh

@unoShin I found that there was a slight mismatch in the large model, so I uploaded a fix to the large branch for the skip-concat case. Please let me know if this doesn't work.

Where is the large branch that was uploaded for the skip-concat case? I couldn't find it in the code.

junhee98 avatar Jan 24 '23 21:01 junhee98