MobileHumanPose

MPJPE too high for protocol 2

Open baishali1986 opened this issue 2 years ago • 4 comments

Hi

Thank you for providing the training, eval, and data preparation scripts. I followed the README and set up the data and all the scripts in the correct locations as indicated by the file structure in the README. I kept exactly the same configuration as config.py and ran the training script for the same number of epochs. The train datasets were Human3.6M and MPII and the test set was Human3.6M. The protocol followed was the default in the code, which is 2.

When I used the saved checkpoints to run inference I get: Protocol 2 error (MPJPE) >> tot: 67.92 Directions: 60.23 Discussion: 72.39 Eating: 60.08 Greeting: 62.74 Phoning: 66.70 Posing: 59.33 Purchases: 61.72 Sitting: 81.54 SittingDown: 90.77 Smoking: 67.13 Photo: 80.52 Waiting: 63.39 Walking: 51.68 WalkDog: 71.55 WalkTogether: 58.84
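For context, the metric being compared here is root-relative MPJPE (mean per-joint position error). A minimal sketch of the standard definition, with hypothetical inputs (the repo's own eval code may differ in details such as the root joint index):

```python
import numpy as np

def mpjpe(pred, gt, root_idx=0):
    """Root-relative MPJPE in mm: mean Euclidean distance between
    predicted and ground-truth joints after aligning both at the root.
    pred, gt: (num_joints, 3) arrays of joint coordinates."""
    pred = pred - pred[root_idx]  # subtract root joint from every joint
    gt = gt - gt[root_idx]
    return np.linalg.norm(pred - gt, axis=1).mean()

# Tiny example: root matches, second joint off by a 3-4-5 triangle.
gt = np.zeros((2, 3))
pred = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
err = mpjpe(pred, gt)  # per-joint errors are [0, 5], so the mean is 2.5
```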

This is obviously much higher than expected. As a note, I was getting an error while using broadcast in model.py: module 'torch.nn.parallel' has no attribute 'comm'. So I changed the original script from

accu_x = accu_x * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[1]+1).type(torch.cuda.FloatTensor), devices=[accu_x.device.index])[0]
accu_y = accu_y * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[0]+1).type(torch.cuda.FloatTensor), devices=[accu_y.device.index])[0]
accu_z = accu_z * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.depth_dim+1).type(torch.cuda.FloatTensor), devices=[accu_z.device.index])[0]

TO:

accu_x = accu_x * torch.arange(1, cfg.output_shape[1] + 1).type(torch.cuda.FloatTensor)
accu_y = accu_y * torch.arange(1, cfg.output_shape[0] + 1).type(torch.cuda.FloatTensor)
accu_z = accu_z * torch.arange(1, cfg.depth_dim + 1).type(torch.cuda.FloatTensor)
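This edit should indeed not change the numbers on a single GPU: broadcasting a tensor to the one device it already targets is just a copy (in current PyTorch the helper lives at `torch.cuda.comm.broadcast`, which is likely what the original line intended). A minimal CPU sketch of the soft-argmax expectation step these lines compute, with illustrative shapes; `width` stands in for cfg.output_shape[1]:

```python
import torch

batch, joints, width = 2, 18, 8
# accu_x plays the role of a per-joint 1D probability mass over x-bins.
accu_x = torch.softmax(torch.randn(batch, joints, width), dim=2)

# Creating the index range directly on accu_x's device is numerically
# identical to broadcasting it there first, and also stays correct under
# DataParallel where accu_x may live on a non-default GPU.
idx = torch.arange(1, width + 1, dtype=accu_x.dtype, device=accu_x.device)
coord_x = (accu_x * idx).sum(dim=2, keepdim=True)  # expected x per joint
```

Since each row of `accu_x` sums to 1 and the indices run from 1 to `width`, every expected coordinate lands inside [1, width].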

I did not change anything else, including the batch size, and again used the same config file. If you can help me root-cause the issue it would be very helpful. Thanks

baishali1986 avatar Mar 28 '22 23:03 baishali1986

I had the same question. My settings are:

trainset_3d = ['Human36M']
trainset_2d = ['MSCOCO', 'MPII']

The test results against Human3.6M are: Directions: 62.02 Discussion: 73.63 Eating: 62.07 Greeting: 63.97 Phoning: 68.02 Posing: 63.70 Purchases: 65.63 Sitting: 81.91 SittingDown: 90.56 Smoking: 69.21 Photo: 79.28 Waiting: 66.57 Walking: 54.77 WalkDog: 73.78 WalkTogether: 62.65

liamsun2019 avatar Mar 29 '22 01:03 liamsun2019

@baishali1986 I also think that the broadcast function would not affect the result. https://github.com/SangbumChoi/MobileHumanPose/issues/10 I will check again; it may be my mistake of changing the master branch to the XS model for super-fast inference.

SangbumChoi avatar Mar 30 '22 11:03 SangbumChoi

@SangbumChoi

Based on the latest code, I made the following changes:

  1. output_shape = (input_shape[0]//4, input_shape[1]//4)
  2. depth_dim = 64
  3. [1, 64, 1, 2] ==> [1, 64, 1, 1] inverted_residual_setting
  4. out_channels = joint_num * 32 ==> out_channels = joint_num * 64 for self.final_layer
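For what it's worth, changes 2 and 4 have to agree: the final 1x1 conv must emit joint_num * depth_dim channels so its output can be reshaped into a per-joint 3D heatmap volume before the soft-argmax. A hypothetical shape check (joint_num, in_channels, and the 8x8 feature size are illustrative, not the repo's actual values):

```python
import torch

joint_num, depth_dim = 18, 64   # depth_dim = 64 per change 2
h, w = 8, 8                     # stands in for output_shape = input_shape // 4

# change 4: out_channels must be joint_num * depth_dim (18 * 64 = 1152)
final_layer = torch.nn.Conv2d(in_channels=256,
                              out_channels=joint_num * depth_dim,
                              kernel_size=1)

feat = torch.randn(1, 256, h, w)
# Reshape the channel axis into (joint, depth) to get a volume per joint.
vol = final_layer(feat).view(1, joint_num, depth_dim, h, w)
```

If out_channels were left at joint_num * 32 while depth_dim is 64, this view would fail, which is presumably why the two changes go together.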

The test result against Human3.6M is below: 03-31 09:15:37 Protocol 2 error (MPJPE) >> tot: 60.82 Directions: 53.71 Discussion: 62.10 Eating: 56.48 Greeting: 56.14 Phoning: 61.77 Posing: 51.03 Purchases: 58.49 Sitting: 73.98 SittingDown: 81.49 Smoking: 60.56 Photo: 68.96 Waiting: 55.13 Walking: 46.71 WalkDog: 64.67 WalkTogether: 53.24

This is some improvement compared to the previous settings, but it still cannot match the numbers in your paper.

liamsun2019 avatar Mar 31 '22 01:03 liamsun2019

@SangbumChoi when will the training code be released?

akk-123 avatar Apr 22 '22 09:04 akk-123