ContrastiveLosses4VRD Runtime error during evaluation

Dear Ji,

I ran into this runtime error when trying to evaluate the model with the pretrained checkpoints:

python ./tools/test_net_rel.py --dataset vg --cfg configs/vg/e2e_faster_rcnn_VGG16_8_epochs_vg_v3_default_node_contrastive_loss_w_so_p_aware_margin_point2_so_weight_point5_no_spt.yaml --load_ckpt trained_models/vg_VGG16/model_step62722.pth --output_dir Outputs/vg_VGG16 --multi-gpu-testing --do_val

RuntimeError: Error(s) in loading state_dict for Generalized_RCNN: size mismatch for RelDN.prd_cls_feats.0.weight: copying a param of torch.Size([6144, 12288]) from checkpoint, where the shape is torch.Size([4096, 12288]) in current model. size mismatch for RelDN.prd_cls_feats.0.bias: copying a param of torch.Size([6144]) from checkpoint, where the shape is torch.Size([4096]) in current model.

Would you please help me with this issue? Thank you very much.

Aug 08 '19 04:08 stevehuanghe
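For anyone debugging a mismatch like the one above, it can help to list every parameter whose shape differs between the checkpoint and the current model before calling load_state_dict. The following is a minimal sketch, not code from the repository: find_shape_mismatches is a hypothetical helper, and the two dictionaries are filled in by hand from the error message. With PyTorch, such name-to-shape dictionaries could be built with {k: tuple(v.shape) for k, v in state_dict.items()}; how the weights are nested inside the checkpoint file is not shown in this thread, so that part is left out.

```python
def find_shape_mismatches(ckpt_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters
    that exist in both mappings but have different shapes."""
    mismatches = {}
    for name, ckpt_shape in ckpt_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the RuntimeError reported above.
ckpt_shapes = {
    "RelDN.prd_cls_feats.0.weight": (6144, 12288),
    "RelDN.prd_cls_feats.0.bias": (6144,),
}
model_shapes = {
    "RelDN.prd_cls_feats.0.weight": (4096, 12288),
    "RelDN.prd_cls_feats.0.bias": (4096,),
}

for name, (ckpt, model) in find_shape_mismatches(ckpt_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

A mismatch in the first dimension only, as here, suggests that the checkpoint was trained with a different layer width than the config being loaded, rather than that the file is corrupted.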

I also faced the same problem. I tried changing the size of RelDN.prd_cls_feats.0.weight in lib\modeling_rel\reldn_heads.py from (6144, 12288) to (4096, 12288), but I can't get the same evaluation results as the paper. Did you find a solution to this issue?

Aug 12 '19 09:08 heygrandpa

Yes, I faced the same problem. I also changed (6144, 12288) to (4096, 12288), and my SGDET results are 16.01 for R@20, 23.32 for R@50 and 29.53 for R@100. That's far from the paper's results.

Aug 25 '19 05:08 simonJJJ

@simonJJJ @heygrandpa @stevehuanghe @jz462 Did anyone find a solution to this, or is the fault in the pre-trained model itself?

Sep 02 '19 11:09 sandeep-ipk

Hi everyone,

Sorry for the late reply. I've updated the link, which now contains a compatible VGG16 model that gives results on par with the paper. You can also download it here. Please let me know if it does not work or if you have further questions.

Ji

Sep 03 '19 04:09 jz462

Hi Ji, thanks for the great work! The error still occurs with the updated models, but the ResNeXt model works well.

Sep 05 '19 04:09 tfzhou

I evaluated the trained VGG16 model on the SGDET task on Visual Genome and got the following results:

R@20: 20.74
R@50: 29.36
R@100: 35.95

These results are somewhat different from those in the paper. Did anyone get the same results?

Sep 05 '19 06:09 cao-nv

Another problem: when I enable --multi-gpu-testing for inference, I get this error: AssertionError: Range subprocess failed (exit code: 1). Could you recommend a way to solve this?

Sep 05 '19 06:09 cao-nv

Hi @cao-nv, yes, I confirm that these are valid reproduced results. A small suggestion of mine: if you want to compare with our method, these results are definitely OK; if you plan to use our method to obtain scene graphs as features for downstream tasks, you don't have to struggle with the VGG16 backbone. ResNeXt is clearly better for your needs.

About your multi-gpu issue, you need to make sure the value of CUDA_VISIBLE_DEVICES matches the actual GPUs you have on your machine, because our code determines the GPUs by looking only at CUDA_VISIBLE_DEVICES.

Ji

Sep 07 '19 20:09 jz462

Thanks @jz462. About the multi-gpu issue: I share a server with 7 working GPUs with other users, so I often set the number of visible GPUs to 2 or 4. Is that OK, or must the number of visible GPUs be 7?

Sep 09 '19 06:09 cao-nv

@cao-nv It should be OK if you do export CUDA_VISIBLE_DEVICES=<g1,g2,...> where g1,g2 are the indices of the GPUs you want to use. You can list as many of them as you want.

Sep 14 '19 04:09 jz462
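Since the GPU list is reportedly taken from CUDA_VISIBLE_DEVICES alone, a quick check of what that variable contains before launching the test script can rule out a mismatch. This is a minimal sketch under that assumption; visible_gpu_ids is a hypothetical helper written for illustration and is not the repository's own parsing code.

```python
import os


def visible_gpu_ids(env=None):
    """Split CUDA_VISIBLE_DEVICES into a list of GPU index strings.

    Returns an empty list when the variable is unset or empty.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


ids = visible_gpu_ids()
print(f"CUDA_VISIBLE_DEVICES lists {len(ids)} GPU(s): {ids}")
```

If the printed list is empty, or names an index that does not exist on the machine, a launcher that relies only on this variable would be working from the wrong set of devices.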

I get this annoying error every time the number of visible GPUs is not 1 and --multi-gpu-testing is enabled. Perhaps there is a problem with the subprocess: the return code is 1, but 0 is expected.

Sep 15 '19 07:09 cao-nv
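An exit code of 1 only says that a child process failed, not why. One general way to find the underlying error is to capture the child's stderr and print it. The sketch below uses only the Python standard library; run_and_report is a hypothetical helper, and the child command is a stand-in that fails on purpose, not the repository's worker process.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd, capture its output, and return (exit_code, stderr_text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured bytes to str
    )
    return proc.returncode, proc.stderr


# Stand-in child: sys.exit with a string writes it to stderr and exits with code 1.
child = [sys.executable, "-c", "import sys; sys.exit('simulated worker failure')"]

code, err = run_and_report(child)
print("exit code:", code)
print("stderr:", err.strip())
```

Running the failing per-GPU command by hand in the same way, or running the evaluation on a single GPU without --multi-gpu-testing, may surface the real traceback that the assertion hides.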

Hi, did you solve this problem? I ran into the same error as you. Any suggestions?

Aug 03 '21 13:08 ByZ0e

Unfortunately, I didn't find a solution to the issue, so I just moved to another scene graph generation model.

Aug 04 '21 15:08 cao-nv

Hi Ji, do your newly trained models at https://drive.google.com/file/d/15w0q3Nuye2ieu_aUNdTS_FNvoVzM4RMF/view use the same detection model as the previously trained models?

Dec 02 '21 08:12 luckyyy00
