
Error Loading pretrained weights

Open is-cs opened this issue 5 years ago • 6 comments

Hi. Thank you for posting the code and weights. I get errors loading your weights. Setup: Python 3.6, PyTorch 0.3.1, CUDA 9.0.

I downloaded the NTU RGB+D dataset from the official site and preprocessed it using your scripts ntu_gendata.py followed by gen_bone_data.py.

When I run the test script as follows: python main.py --config ./config/nturgbd-cross-view/test_bone.yaml --weights pretrained_weights/ntu_cv_agcn_bone-49-29400.pt --save-score 1 --device 0 1

I get the following error: KeyError: 'unexpected key "l1.gcn1.conv_res.0.weight" in state_dict'

Similarly, on running: python main.py --config ./config/nturgbd-cross-subject/test_bone.yaml --weights pretrained_weights/ntu_cs_agcn_bone-49-31300.pt --save-score 1 --device 0 1 2

I get the following error: KeyError: 'unexpected key "l1a.PA" in state_dict'

Could you advise whether this is a problem with the weights or with my setup?

Thank you
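To see every mismatched name at once rather than failing on the first KeyError, one can diff the checkpoint's keys against the model's expected keys. The helper below is a sketch: plain lists stand in for the real inputs, which in practice would be `torch.load(weights_path).keys()` and `model.state_dict().keys()`; the example names are taken from the errors in this thread.

```python
def diff_keys(ckpt_keys, model_keys):
    """Return (unexpected, missing) parameter names.

    unexpected: names present in the checkpoint but absent from the model.
    missing:    names the model expects but the checkpoint lacks.
    """
    unexpected = sorted(set(ckpt_keys) - set(model_keys))
    missing = sorted(set(model_keys) - set(ckpt_keys))
    return unexpected, missing

# Illustrative names based on the errors reported above; the model-side
# names are assumptions about what the current code defines.
ckpt_keys = ['l1.gcn1.conv_res.0.weight', 'l1a.PA']
model_keys = ['l1.gcn1.down.0.weight', 'l1.PA']
unexpected, missing = diff_keys(ckpt_keys, model_keys)
print('unexpected in checkpoint:', unexpected)
print('missing from checkpoint:', missing)
```

Seeing both lists side by side makes the rename pattern (e.g. `conv_res` vs `down`, `l1a` vs `l1`) obvious at a glance.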

is-cs avatar Dec 04 '19 15:12 is-cs

For the first issue, try changing the name of self.down to self.conv_res in https://github.com/lshiwjx/2s-AGCN/blob/d4880e1cfc0822cc86beb2c9dd7463f904dd40ea/model/agcn.py#L72

For the second issue, try changing the name of self.l1 to self.l1a in https://github.com/lshiwjx/2s-AGCN/blob/d4880e1cfc0822cc86beb2c9dd7463f904dd40ea/model/agcn.py#L145

Or just re-train the model. Sorry for this mistake; it has been a long time and I have forgotten some details.
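An alternative to editing agcn.py is to rename the keys in the checkpoint itself before calling load_state_dict. The sketch below applies the inverse of the renames suggested above (checkpoint names rewritten to match the current code); the substring pairs are assumptions based on the two errors in this thread and may need adjusting for other checkpoints.

```python
def remap_keys(state_dict, renames):
    """Return a new state_dict with substring renames applied to each key."""
    out = {}
    for k, v in state_dict.items():
        for old, new in renames.items():
            k = k.replace(old, new)
        out[k] = v
    return out

# Assumed rename pairs, derived from the two KeyErrors reported above:
renames = {'conv_res': 'down',  # bone checkpoint -> current code
           'l1a': 'l1'}         # cross-subject checkpoint -> current code

# Usage with torch (not run here):
#   ckpt = torch.load(weights_path, map_location='cpu')
#   model.load_state_dict(remap_keys(ckpt, renames))

# Tiny demonstration with placeholder values instead of tensors:
demo = {'l1.gcn1.conv_res.0.weight': 0, 'l1a.PA': 1}
print(remap_keys(demo, renames))
```

This keeps the model code untouched, which avoids breaking compatibility with checkpoints that already use the newer names.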

lshiwjx avatar Dec 10 '19 02:12 lshiwjx

Hi. Thank you for replying. There were other issues similar to these, so I am now retraining. It would be great if you could release a working set of weights when you get time. Thanks again!

is-cs avatar Dec 10 '19 19:12 is-cs

Thank you for the source code. When I run it, the following error occurs: ValueError: num_samples should be a positive integer value, but got num_samples=0

I ran python data_gen/ntu_gendata.py beforehand, and it generated these files: train_data_joint.npy, train_label.pkl, val_data_joint.npy, val_label.pkl

However, they are all only 1 KB in size.

How should I fix this? Any guidance would be appreciated.

thanks
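A ~1 KB .npy typically holds only a header with an empty array, which means ntu_gendata.py matched zero skeleton files, usually because the path to the raw NTU data was wrong. A quick sanity check (a sketch; the file path is whatever your generation step produced):

```python
import os
import numpy as np

def check_npy(path):
    """Print size/shape and report whether the file holds any samples."""
    size_kb = os.path.getsize(path) / 1024.0
    arr = np.load(path)
    print('%s: %.1f KB, shape %s' % (path, size_kb, arr.shape))
    # The first dimension is the sample count; if it is 0, the DataLoader
    # raises the num_samples=0 error seen above.
    return arr.shape[0] > 0
```

If this returns False, re-run ntu_gendata.py after verifying that its data-path argument points at the directory containing the raw .skeleton files.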

xuanshibin avatar Feb 06 '20 09:02 xuanshibin


I have the same problem. It seems that only testing joints on NTU cross-view works; otherwise the same model-mismatch error appears even after I try the modifications above. Could you please help with this when you get time? Thanks

erinchen824 avatar Oct 24 '20 17:10 erinchen824

[ Sun Oct 25 02:03:25 2020 ] Load weights from ./runs/ntu_cv_agcn_joint-49-29400.pt.
[ Sun Oct 25 02:03:25 2020 ] using warm up, epoch: 0
[ Sun Oct 25 02:03:25 2020 ] Model: model.agcn.Model.
[ Sun Oct 25 02:03:25 2020 ] Weights: ./runs/ntu_cv_agcn_joint-49-29400.pt.
[ Sun Oct 25 02:03:25 2020 ] Eval epoch: 1
main.py:460: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead. volatile=True)
main.py:464: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead. volatile=True)
100%|███████████████████████████████████████████| 74/74 [03:43<00:00, 3.03s/it]
Accuracy: 0.012465666596239171 model: ./runs/ntu_cv_agcn_test_joint
[ Sun Oct 25 02:07:09 2020 ] Mean test loss of 74 batches: 16.15059603871526.
[ Sun Oct 25 02:07:09 2020 ] Top1: 1.25%
[ Sun Oct 25 02:07:09 2020 ] Top5: 8.16%
[ Sun Oct 25 02:07:10 2020 ] Done.

There might be some problem with the provided model. I tried NTU cross-view on joint data with the provided weights, and the result is far too poor. Is something wrong with the model, or did I do something wrong? Thanks

erinchen824 avatar Oct 24 '20 18:10 erinchen824


I changed the names of the mismatched layers and reloaded the model; however, none of the provided models work well, and performance remains poor. I am starting training from scratch.

erinchen824 avatar Oct 26 '20 07:10 erinchen824