py-R-FCN

psroi_pooling_layer.cu:108 invalid configuration argument

Open harrycrossincode opened this issue 8 years ago • 9 comments

Hi,

I hit this error during ResNet-50 training: psroi_pooling_layer.cu:108] Check failed: error == cudaSuccess (9 vs. 0) invalid configuration argument

Ubuntu 14.04 + one 1080 GPU card.

Any idea on the issue? Thanks.

harrycrossincode Nov 30 '16 03:11
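
For readers hitting the same message: CUDA reports "invalid configuration argument" when a kernel is launched with an invalid grid or block size, and a grid of zero blocks is one such case. The following is a minimal sketch of the arithmetic, assuming the launch at psroi_pooling_layer.cu:108 sizes its grid with Caffe's usual ceil-division helper and 512 threads per block. Both points are assumptions and are not confirmed in this thread, and the function names are hypothetical.

```python
# Assumed threads-per-block value; some Caffe builds use a different number.
THREADS_PER_BLOCK = 512


def get_blocks(n, threads=THREADS_PER_BLOCK):
    """Ceil-divide n work items into blocks, as Caffe-style launch helpers do."""
    return (n + threads - 1) // threads


def psroi_output_count(num_rois, output_dim, pooled_h, pooled_w):
    """Number of output elements a PSROI pooling layer would have to fill."""
    return num_rois * output_dim * pooled_h * pooled_w


def launch_is_valid(n):
    """A CUDA launch needs at least one block; a zero-block grid is rejected."""
    return get_blocks(n) >= 1


if __name__ == "__main__":
    # Hypothetical shapes: 21 output channels, 7x7 pooled grid.
    for num_rois in (0, 1, 300):
        count = psroi_output_count(num_rois, 21, 7, 7)
        print(num_rois, count, get_blocks(count), launch_is_valid(count))
```

Under these assumptions, a batch with zero ROIs gives zero output elements and therefore a zero-block launch, so an empty ROI set may be worth ruling out first.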

Hi @harrycrossincode, did you solve your issue? I have the same issue with the same configuration while testing. I have two 1080 cards, but I guess we can't use both of them because of the Python layers. Please let me know here if you have solved it. If I solve it first, I will update here. Thanks.

ravikantb Dec 14 '16 06:12

Check your annotation information and make sure that every ROI is valid. Once the invalid ones are fixed, training runs fine.

haihaoshen Dec 15 '16 00:12
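
The comment above does not say what counts as a valid ROI. As one possible reading, here is a small sketch that flags boxes that are mis-ordered or fall outside the image. The function name, the `(x1, y1, x2, y2)` layout, and the 0-based inclusive pixel convention are all assumptions made for illustration, not something taken from py-R-FCN.

```python
def find_invalid_boxes(boxes, width, height):
    """Return indices of boxes that are mis-ordered or outside the image.

    Each box is (x1, y1, x2, y2) in 0-based, inclusive pixel coordinates
    (an assumed convention for this sketch).
    """
    bad = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        inside = x1 >= 0 and y1 >= 0 and x2 < width and y2 < height
        ordered = x1 <= x2 and y1 <= y2
        if not (inside and ordered):
            bad.append(i)
    return bad


if __name__ == "__main__":
    # Hypothetical annotations for a 100x100 image.
    boxes = [(10, 10, 50, 50), (60, 20, 30, 40), (-1, 0, 5, 5), (0, 0, 100, 99)]
    print(find_invalid_boxes(boxes, width=100, height=100))
```

Running a check like this over every annotation file before training is a cheap way to find the images worth inspecting by hand.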

@haihaoshen Are you implying that the ROIs for test images (which, as I understand it, are stored in 'stage3_rpn_final_proposals.pkl') are invalid? If yes, then I wonder what could cause that. Is my training flawed, or is it something else? Please correct me if I misunderstood what you are trying to convey. Thanks!

ravikantb Dec 15 '16 06:12

I meant the training set. Please use end2end mode, as it is simpler.

haihaoshen Dec 15 '16 09:12

Actually, I was able to train the model with the alternating training approach using the provided script (py-R-FCN/experiments/scripts/rfcn_alt_opt_5stage_ohem.sh) without any problem. But at the end of this script it tries to test the trained model on the test set, and that step fails with the above error.

My understanding of the training and testing phases is that ROIs for both the training and test sets are computed during the training steps only, and once training is over the script calculates mAP on the test set using these ROIs. This belief comes from the fact that 'rfcn_test.pt', which is used for testing, does not have RPN layers, and the 'HAS_RPN' flag is set to False during testing, yet the ROIs have to come from somewhere. The script did not work for me, but I inserted RPN layers in 'rfcn_test.pt' and then tested the model on single images, and it worked (though not as well as I would have liked). I am training ResNet-101 right now and hope it will work.

On a side note, since your configuration is similar to mine, would you be interested in helping me with some more observations about my setup?

ravikantb Dec 15 '16 10:12

@ravikantb I just ran into this problem too. I turned on the debug mode of ProposalTargetLayer and found that it sampled 0 fg and 0 bg. After modifying the valid image criteria in fast_rcnn.train.filter_roidb, I am able to train my own model without OHEM.

stanstarks Dec 21 '16 10:12
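
The comment above does not say which change was made to filter_roidb, so the following is an illustration only, not the actual modification. It sketches a validity test that keeps an image only if at least one of its ROIs qualifies as foreground or background under overlap thresholds. The function names, the threshold defaults, and the simplified entry layout (a dict with a plain list under `max_overlaps`) are assumptions for this sketch.

```python
def count_fg_bg(max_overlaps, fg_thresh=0.5, bg_hi=0.5, bg_lo=0.0):
    """Count ROIs that would qualify as foreground or background.

    max_overlaps holds, per ROI, its best IoU with any ground-truth box.
    Threshold defaults are illustrative assumptions.
    """
    fg = sum(1 for o in max_overlaps if o >= fg_thresh)
    bg = sum(1 for o in max_overlaps if bg_lo <= o < bg_hi)
    return fg, bg


def filter_entries(roidb, **thresholds):
    """Keep only entries that offer at least one fg or bg ROI to sample from."""
    kept = []
    for entry in roidb:
        fg, bg = count_fg_bg(entry["max_overlaps"], **thresholds)
        if fg > 0 or bg > 0:
            kept.append(entry)
    return kept


if __name__ == "__main__":
    # Hypothetical entries: one usable, one with no ROIs, one with only a low-overlap ROI.
    roidb = [
        {"max_overlaps": [0.9, 0.2]},
        {"max_overlaps": []},
        {"max_overlaps": [0.05]},
    ]
    print(len(filter_entries(roidb)), len(filter_entries(roidb, bg_lo=0.1)))
```

An entry that yields zero foreground and zero background ROIs is the kind of image that would leave a sampler with nothing to pick, which matches the "0 fg and 0 bg" symptom. Printing the counts per image on a custom dataset can show which entries are responsible.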

@stanstarks I have run into the same problem. What exactly did you change in the valid image criteria?

ivansong1988 Mar 15 '17 07:03

Has anyone found out what @stanstarks means by modifying the valid image criteria in filter_roidb? I ran into the same issue when trying to use my own dataset (fg num: 0 and bg num: 0).

Timonzimm Mar 31 '17 14:03

Looking forward to the reply!

junx1992 Feb 12 '18 03:02