ContrastiveLosses4VRD

Runtime error during evaluation

Open · stevehuanghe opened this issue 6 years ago • 14 comments

Dear Ji,

I ran into this runtime error when trying to evaluate the model with the pretrained checkpoints:

python ./tools/test_net_rel.py --dataset vg --cfg configs/vg/e2e_faster_rcnn_VGG16_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_no_spt.yaml --load_ckpt trained_models/vg_VGG16/model_step62722.pth --output_dir Outputs/vg_VGG16 --multi-gpu-testing --do_val

RuntimeError: Error(s) in loading state_dict for Generalized_RCNN: size mismatch for RelDN.prd_cls_feats.0.weight: copying a param of torch.Size([6144, 12288]) from checkpoint, where the shape is torch.Size([4096, 12288]) in current model. size mismatch for RelDN.prd_cls_feats.0.bias: copying a param of torch.Size([6144]) from checkpoint, where the shape is torch.Size([4096]) in current model.
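To see every mismatched parameter at once instead of reading them out of the exception, one can diff the shapes stored in the checkpoint against the shapes the freshly built model expects. The sketch below works on plain `{name: shape}` dictionaries; in PyTorch such a dictionary can be built with `{k: tuple(v.shape) for k, v in sd.items()}`, where `sd` comes from `torch.load(path, map_location="cpu")` or `model.state_dict()`. The helper name `diff_shapes` is hypothetical, and where the state dict sits inside this repo's checkpoint file (top level or under some key) is an assumption you would need to check.

```python
def diff_shapes(ckpt_shapes, model_shapes):
    """Compare parameter shapes from a checkpoint with those of a model.

    Hypothetical helper. Both arguments map parameter names to shape tuples,
    e.g. {"RelDN.prd_cls_feats.0.weight": (6144, 12288)}.
    Returns (mismatched, missing_in_ckpt, unexpected_in_ckpt).
    """
    mismatched = {
        name: (tuple(ckpt_shapes[name]), tuple(model_shapes[name]))
        for name in ckpt_shapes.keys() & model_shapes.keys()
        if tuple(ckpt_shapes[name]) != tuple(model_shapes[name])
    }
    # Parameters the model needs but the checkpoint lacks, and vice versa.
    missing = sorted(model_shapes.keys() - ckpt_shapes.keys())
    unexpected = sorted(ckpt_shapes.keys() - model_shapes.keys())
    return mismatched, missing, unexpected


# Shapes copied from the error message above.
ckpt = {
    "RelDN.prd_cls_feats.0.weight": (6144, 12288),
    "RelDN.prd_cls_feats.0.bias": (6144,),
}
model = {
    "RelDN.prd_cls_feats.0.weight": (4096, 12288),
    "RelDN.prd_cls_feats.0.bias": (4096,),
}
mismatched, missing, unexpected = diff_shapes(ckpt, model)
for name, (got, want) in sorted(mismatched.items()):
    print(f"{name}: checkpoint {got} vs model {want}")
```

Since `torch.nn.Linear` stores its weight as `(out_features, in_features)`, this mismatch says the checkpoint's layer produces 6144 output features while the layer built from the current config produces 4096; the 12288-wide input agrees.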

Would you please help me with this issue? Thank you very much.

stevehuanghe · Aug 08 '19 04:08

I also faced the same problem. I tried changing the size of RelDN.prd_cls_feats.0.weight in lib\modeling_rel\reldn_heads.py from (6144, 12288) to (4096, 12288), but I can't get the same evaluation results as the paper. Did you find a solution for the issue?

heygrandpa · Aug 12 '19 09:08

Yes, I faced the same problem. I also changed (6144, 12288) to (4096, 12288), and my SGDET results are 16.01 for R@20, 23.32 for R@50 and 29.53 for R@100. That's quite far from the paper's results.
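For readers comparing these numbers: in the usual scene graph evaluation protocol, R@K is the percentage of ground-truth (subject, predicate, object) triplets that appear among an image's K highest-scoring predicted triplets, averaged over images. The sketch below is a simplification under stated assumptions: a hit requires an exact triplet match, whereas the real SGDET protocol also requires the predicted subject and object boxes to overlap their ground-truth boxes (commonly IoU ≥ 0.5), and graph-constraint settings are not modeled. The function names and data layout are hypothetical, not this repo's evaluator.

```python
def recall_at_k(gt_triplets, pred_triplets, pred_scores, k):
    """Per-image Recall@K over (subject, predicate, object) triplets.

    Simplification: a prediction is a hit only if the whole triplet equals a
    ground-truth triplet; box-overlap matching is not modeled.
    """
    gt = set(gt_triplets)
    if not gt:
        return None  # images without ground-truth relations are skipped
    # Rank predictions by score, highest first, and keep the top K.
    ranked = sorted(zip(pred_scores, pred_triplets), key=lambda p: p[0], reverse=True)
    top_k = {t for _, t in ranked[:k]}
    hits = sum(1 for t in gt if t in top_k)
    return hits / len(gt)


def average_recall_at_k(images, k):
    """Average per-image Recall@K (as a percentage) over (gt, preds, scores) tuples."""
    values = [recall_at_k(*img, k) for img in images]
    values = [v for v in values if v is not None]
    return 100.0 * sum(values) / len(values) if values else 0.0


gt = [("man", "riding", "horse"), ("man", "wearing", "hat")]
preds = [("man", "riding", "horse"), ("man", "on", "horse"), ("man", "wearing", "hat")]
scores = [0.9, 0.8, 0.3]
print(average_recall_at_k([(gt, preds, scores)], 2))  # 1 of 2 triplets in the top 2
```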

simonJJJ · Aug 25 '19 05:08

@simonJJJ @heygrandpa @stevehuanghe @jz462 Did anyone find a solution to this or is it the fault in the pre-trained model itself?

sandeep-ipk · Sep 02 '19 11:09

Hi everyone,

Sorry for the late reply. I've updated the link, which now points to a compatible VGG16 model that gives results on par with the paper. You can also download it here. Please let me know if it does not work or if you have further questions.

Ji

jz462 · Sep 03 '19 04:09

Hi Ji, thanks for the great work! The error still occurs with the updated models, but the ResNeXt model works well.

tfzhou · Sep 05 '19 04:09

I evaluated the trained VGG16 model on the SGDet task on Visual Genome and got the following results:

R@20: 20.74 
R@50: 29.36
R@100: 35.95 

These results differ somewhat from those reported in the paper. Has anyone gotten the same results?

cao-nv · Sep 05 '19 06:09

Another problem: when I enable --multi-gpu-testing for inference, this error occurs: AssertionError: Range subprocess failed (exit code: 1). Could you recommend a way to solve it?

cao-nv · Sep 05 '19 06:09

Hi @cao-nv, yes, I confirm that these are valid reproduced results. A small suggestion: if you want to compare with our method, these results are definitely OK; if you plan to use our method to obtain scene graphs as features for downstream tasks, you don't need to struggle with the VGG16 backbone. ResNeXt is clearly better for that purpose.

About your multi-GPU issue: you need to make sure that the value of CUDA_VISIBLE_DEVICES matches the GPUs actually available on your machine, because our code determines the GPUs by looking only at CUDA_VISIBLE_DEVICES.
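As a rough illustration of what "determines the GPUs by looking only at CUDA_VISIBLE_DEVICES" implies, a launcher of this kind might derive its GPU list as sketched below. `gpu_ids_from_env` is a hypothetical helper, not the repo's code; the point is that the count comes from the variable rather than from querying the driver, so a value that does not match the real hardware could lead to failing worker processes.

```python
import os


def gpu_ids_from_env(env=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES as a list of strings.

    Hypothetical helper: it trusts the variable completely and never asks the
    driver how many devices exist.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "").strip()
    if not value:
        # Unset or empty: CUDA itself would expose all GPUs when the variable
        # is unset, but this helper simply reports none.
        return []
    return [tok.strip() for tok in value.split(",") if tok.strip()]


ids = gpu_ids_from_env({"CUDA_VISIBLE_DEVICES": "2,3"})
print(len(ids), ids)  # a launcher like this would start one worker per listed id
```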

Ji

jz462 · Sep 07 '19 20:09

Thanks @jz462. Regarding the multi-GPU issue: I share a server with 7 working GPUs with other people, so I often set the number of visible GPUs to 2 or 4. Is that OK, or must the number of visible GPUs be 7?

cao-nv · Sep 09 '19 06:09

@cao-nv It should be OK if you run export CUDA_VISIBLE_DEVICES=<g1,g2,...>, where g1, g2, ... are the indices of the GPUs you want to use; you can list as many of them as you like.
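In a shell this is simply `export CUDA_VISIBLE_DEVICES=4,5` before running the test command. The same thing can be done from Python by passing a modified environment to the child process, as sketched below; the command and flags are copied from the original report, while the helper name `build_test_env` and the GPU indices 4 and 5 are illustrative assumptions.

```python
import os
import subprocess  # needed only if the launch line at the bottom is uncommented
import sys


def build_test_env(gpu_ids, base_env=None):
    """Copy the environment and restrict CUDA to the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


# Same command as in the original report; only the visible GPUs change.
cmd = [
    sys.executable, "./tools/test_net_rel.py",
    "--dataset", "vg",
    "--cfg", "configs/vg/e2e_faster_rcnn_VGG16_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_no_spt.yaml",
    "--load_ckpt", "trained_models/vg_VGG16/model_step62722.pth",
    "--output_dir", "Outputs/vg_VGG16",
    "--multi-gpu-testing", "--do_val",
]
env = build_test_env([4, 5])
# subprocess.run(cmd, env=env, check=True)  # uncomment to actually launch
```

Copying the parent environment rather than building a fresh one keeps `PATH` and other variables the test script may rely on.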

jz462 · Sep 14 '19 04:09

I get this annoying error every time the number of visible GPUs is not 1 and --multi-gpu-testing is enabled. Perhaps there is a problem with the subprocess: its return code is 1, but 0 is expected.
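An assertion that only reports the exit code hides the worker's own traceback, which is usually where the real cause is. One way to debug is to launch a single worker so that its stderr is captured and shown on failure. The sketch below uses only the standard `subprocess` module; `run_worker` is a hypothetical helper, not this repo's launcher, and the failing command is a stand-in that merely simulates a crash.

```python
import subprocess
import sys


def run_worker(cmd, env=None):
    """Run one worker process; on failure, raise with its stderr attached."""
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"worker failed (exit code: {result.returncode})\n"
            f"--- stderr ---\n{result.stderr}"
        )
    return result.stdout


# Stand-in for a crashing worker: writes a message to stderr and exits with 1.
failing = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('simulated worker traceback\\n'); sys.exit(1)",
]
try:
    run_worker(failing)
except RuntimeError as err:
    print(err)  # shows the exit code together with the worker's stderr
```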

cao-nv · Sep 15 '19 07:09

Hi, did you solve this problem? I ran into the same error. Any suggestions?

ByZ0e · Aug 03 '21 13:08

Unfortunately, I didn't find any solution for the issue, so I just moved to another scene graph generation model.

cao-nv · Aug 04 '21 15:08

Hi Ji, do your newly trained models at https://drive.google.com/file/d/15w0q3Nuye2ieu_aUNdTS_FNvoVzM4RMF/view use the same detection model as the previously trained models?

luckyyy00 · Dec 02 '21 08:12