3DCrowdNet_RELEASE

Cannot reproduce without pre-trained ResNet-50 weights of xiao2018simple

mimiliaogo opened this issue 2 years ago • 9 comments

Hi, I tried to reproduce Table 8 without the pre-trained ResNet-50 weights of xiao2018simple. My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml and the config file is:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.00025 #0.001/4
lr_backbone: 0.0001
lr_dec_factor: 10

However, I got very strange results on 3DPW, as shown in the attached image (I evaluate every epoch).

Do you have any idea about this? Thank you!

mimiliaogo avatar Sep 07 '22 14:09 mimiliaogo

Hi,

If you are not using the pretrained backbone, please set ‘lr’ and ‘lr_backbone’ to the same value.
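In case it helps, here is a minimal PyTorch sketch of the idea (not the actual 3DCrowdNet optimizer code; the backbone and head modules are stand-ins): the backbone and the rest of the network sit in separate parameter groups, and without the xiao2018simple weights both groups should use the same learning rate.

import torch
import torch.nn as nn

# Stand-ins for the ResNet-50 backbone and the regression head.
backbone = nn.Sequential(nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU())
head = nn.Linear(64, 24 * 3)

lr = 0.0005
lr_backbone = 0.0005  # keep equal to lr when the backbone is trained from scratch

optimizer = torch.optim.Adam([
    {'params': backbone.parameters(), 'lr': lr_backbone},
    {'params': head.parameters(), 'lr': lr},
])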

hongsukchoi avatar Sep 07 '22 21:09 hongsukchoi

I changed my config file as below:

trainset_3d: ['Human36M', 'MuCo']
trainset_2d: ['MSCOCO', 'MPII']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.0005 
lr_backbone: 0.0005
lr_dec_factor: 10

# modify batch size
train_batch_size: 128
test_batch_size: 128

However, the results were still weird, as shown in the attached image.

mimiliaogo avatar Sep 09 '22 14:09 mimiliaogo

Hi,

Yes, the results seem weird.

  1. Are you evaluating on 3DPW-Crowd?

  2. How can you train that fast? I don't remember exactly, but it took more than 12 hours to train for 6 epochs (roughly 2 hours per epoch), and you are training for 40 epochs with half the batch size. Two days would not be enough.

hongsukchoi avatar Sep 09 '22 16:09 hongsukchoi

  1. I evaluate on 3DPW, not 3DPW-Crowd.
  2. I used an RTX 3090 with batch size 128. The training time is 0.91 h/epoch (rough arithmetic below).
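For a rough sanity check (just arithmetic on the numbers reported in this thread, nothing repo-specific):

epochs = 40
print(epochs * 0.91)  # ~36 hours total on the RTX 3090 at 0.91 h/epoch
print(epochs * 2.0)   # ~80 hours implied by ">12 hours for 6 epochs" on the RTX 2080 Ti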

mimiliaogo avatar Sep 10 '22 03:09 mimiliaogo

Wow, I didn't know the RTX 3090 was that much faster than the RTX 2080 Ti.

I thought you were testing on 3DPW-Crowd, since you are using 3dpw_crowd.yml.

My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw_crowd.yml

Can you share your full code via a GitHub repo? Some of the information is confusing, and the increasing errors seem really weird.

hongsukchoi avatar Sep 10 '22 03:09 hongsukchoi

So sorry, I pasted the wrong command. My training command is python train.py --amp --gpu 0 --cfg ../assets/yaml/3dpw.yml and my full code is here: https://github.com/mimiliaogo/3DCrowdNet-Mimi. Thank you so much!

mimiliaogo avatar Sep 10 '22 03:09 mimiliaogo

Thanks for sharing the code. I can't find a critical bug...

Here are a few suggestions.

  1. Could you try testing with test.py? Because of the per-epoch evaluation, the test data could be unintentionally overwritten during the process.

  2. Could you visualize the training data? Draw the GT joints and meshes on the images; the data could have been corrupted during downloading (see the visualization sketch after the config below). Also, is there any change in the MPII.py code?

  3. Could you train with the config below and see the result? It shouldn't take long, and it will show which dataset is causing the increasing error.

trainset_3d: []
trainset_2d: ['MSCOCO']
testset: 'PW3D'

lr_dec_epoch: [30]
end_epoch: 40
lr: 0.001
lr_backbone: 0.001
lr_dec_factor: 10

# modify batch size
train_batch_size: 128
test_batch_size: 128
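For suggestion 2 above, here is a minimal OpenCV sketch for drawing ground-truth 2D joints on a training image to check for corrupted data. It is a generic example rather than the repo's own visualization code; the image path and joint coordinates are placeholders.

import cv2
import numpy as np

# Placeholder path and ground-truth (x, y) pixel coordinates.
img = cv2.imread('example_train_image.jpg')
if img is None:
    img = np.zeros((256, 256, 3), dtype=np.uint8)  # blank canvas fallback
joints_2d = np.array([[120.0, 80.0], [130.0, 150.0]])

for x, y in joints_2d:
    cv2.circle(img, (int(x), int(y)), 3, (0, 0, 255), -1)  # one red dot per joint
cv2.imwrite('vis_check.jpg', img)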

hongsukchoi avatar Sep 10 '22 04:09 hongsukchoi

Hi, I tried your config from suggestion 3, and the results seem normal (image attached). So maybe the problem comes from the training data; I will try to visualize it. BTW, there is no change in the MPII.py code.

mimiliaogo avatar Sep 11 '22 13:09 mimiliaogo

@hongsukchoi, when I train your model with Human3.6M and MuCo separately, both show increasing errors. I visualized the GT keypoints and joints, and the results seem normal (maybe a little inaccurate, but mostly right). However, I still don't know why these two datasets lead to increasing errors... (images attached)
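Since download corruption was raised earlier in the thread, one generic way to rule it out (beyond visual inspection) is to hash the downloaded annotation files and compare them against any checksums published by the dataset providers. This is a plain hashlib sketch, not code from the 3DCrowdNet repo, and the file path is a placeholder.

import hashlib

def md5sum(path, chunk_size=1 << 20):
    # Stream the file in 1 MB chunks so large annotation files fit in memory.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

print(md5sum('data/MuCo/annotations.json'))  # placeholder path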

mimiliaogo avatar Sep 12 '22 09:09 mimiliaogo