faster_rcnn
Matlab system error when running script_faster_rcnn_VOC2007_ZF.m
I tried to run script_faster_rcnn_VOC2007_ZF.m to train my own dataset, and MATLAB crashed with the following crash report:

fast_rcnn startup done
GPU 1: free memory 2066997248
Use GPU 1
imdb (voc_2007_trainval): 9/20
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
imdb (voc_2007_test): 1/10
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
Cleared 0 solvers and 1 stand-alone nets
stage one proposal
conf: batch_size: 256 bg_thresh_hi: 0.3000 bg_thresh_lo: 0 bg_weight: 1 drop_boxes_runoff_image: 1 feat_stride: 16 fg_fraction: 0.5000 fg_thresh: 0.7000 image_means: [224x224x3 single] ims_per_batch: 1 max_size: 1000 rng_seed: 6 scales: 600 target_only_gt: 1 test_binary: 0 test_drop_boxes_runoff_image: 0 test_max_size: 1000 test_min_box_size: 16 test_nms: 0.3000 test_scales: 600 use_flipped: 1 use_gpu: 1 anchors: [9x4 double] output_width_map: [901x1 containers.Map] output_height_map: [901x1 containers.Map]
opts: cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn' conf: [1x1 struct] do_val: 1 imdb_train: {[1x1 struct]} imdb_val: [1x1 struct] net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel' roidb_train: {[1x1 struct]} roidb_val: [1x1 struct] snapshot_interval: 10000 solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\solver_60k80k.prototxt' val_interval: 2000 val_iters: 1
Preparing training data...Starting parallel pool (parpool) using the 'local' profile ... connected to 2 workers. Done. Preparing validation data...Done. Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\iter_2000 Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\final Cleared 1 solvers and 0 stand-alone nets opts: cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn' conf: [1x1 struct] imdb: [1x1 struct] net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt' net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...' suffix: ''
conf: batch_size: 256 bg_thresh_hi: 0.3000 bg_thresh_lo: 0 bg_weight: 1 drop_boxes_runoff_image: 1 feat_stride: 16 fg_fraction: 0.5000 fg_thresh: 0.7000 image_means: [224x224x3 single] ims_per_batch: 1 max_size: 1000 rng_seed: 6 scales: 600 target_only_gt: 1 test_binary: 0 test_drop_boxes_runoff_image: 0 test_max_size: 1000 test_min_box_size: 16 test_nms: 0.3000 test_scales: 600 use_flipped: 1 use_gpu: 1 anchors: [9x4 double] output_width_map: [901x1 containers.Map] output_height_map: [901x1 containers.Map]
faster_rcnn-master: test (voc_2007_trainval) 1/20 time: 1.234s faster_rcnn-master: test (voc_2007_trainval) 2/20 time: 0.830s faster_rcnn-master: test (voc_2007_trainval) 3/20 time: 0.672s faster_rcnn-master: test (voc_2007_trainval) 4/20 time: 0.665s faster_rcnn-master: test (voc_2007_trainval) 5/20 time: 0.739s faster_rcnn-master: test (voc_2007_trainval) 6/20 time: 0.740s faster_rcnn-master: test (voc_2007_trainval) 7/20 time: 0.666s faster_rcnn-master: test (voc_2007_trainval) 8/20 time: 0.666s faster_rcnn-master: test (voc_2007_trainval) 9/20 time: 0.756s faster_rcnn-master: test (voc_2007_trainval) 10/20 time: 0.755s faster_rcnn-master: test (voc_2007_trainval) 11/20 time: 0.727s faster_rcnn-master: test (voc_2007_trainval) 12/20 time: 0.725s faster_rcnn-master: test (voc_2007_trainval) 13/20 time: 0.740s faster_rcnn-master: test (voc_2007_trainval) 14/20 time: 0.739s faster_rcnn-master: test (voc_2007_trainval) 15/20 time: 0.724s faster_rcnn-master: test (voc_2007_trainval) 16/20 time: 0.723s faster_rcnn-master: test (voc_2007_trainval) 17/20 time: 0.767s faster_rcnn-master: test (voc_2007_trainval) 18/20 time: 0.767s faster_rcnn-master: test (voc_2007_trainval) 19/20 time: 0.669s faster_rcnn-master: test (voc_2007_trainval) 20/20 time: 0.669s Cleared 0 solvers and 1 stand-alone nets aver_boxes_num = 2731, select top 2000 opts: cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn' conf: [1x1 struct] imdb: [1x1 struct] net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt' net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...' suffix: ''
conf: batch_size: 256 bg_thresh_hi: 0.3000 bg_thresh_lo: 0 bg_weight: 1 drop_boxes_runoff_image: 1 feat_stride: 16 fg_fraction: 0.5000 fg_thresh: 0.7000 image_means: [224x224x3 single] ims_per_batch: 1 max_size: 1000 rng_seed: 6 scales: 600 target_only_gt: 1 test_binary: 0 test_drop_boxes_runoff_image: 0 test_max_size: 1000 test_min_box_size: 16 test_nms: 0.3000 test_scales: 600 use_flipped: 1 use_gpu: 1 anchors: [9x4 double] output_width_map: [901x1 containers.Map] output_height_map: [901x1 containers.Map]
faster_rcnn-master: test (voc_2007_test) 1/10 time: 0.866s faster_rcnn-master: test (voc_2007_test) 2/10 time: 0.961s faster_rcnn-master: test (voc_2007_test) 3/10 time: 0.664s faster_rcnn-master: test (voc_2007_test) 4/10 time: 0.659s faster_rcnn-master: test (voc_2007_test) 5/10 time: 0.753s faster_rcnn-master: test (voc_2007_test) 6/10 time: 0.899s faster_rcnn-master: test (voc_2007_test) 7/10 time: 0.738s faster_rcnn-master: test (voc_2007_test) 8/10 time: 0.750s faster_rcnn-master: test (voc_2007_test) 9/10 time: 0.850s faster_rcnn-master: test (voc_2007_test) 10/10 time: 0.821s Cleared 0 solvers and 1 stand-alone nets aver_boxes_num = 2695, select top 2000
stage one fast rcnn
conf: batch_size: 128 bbox_thresh: 0.5000 bg_thresh_hi: 0.5000 bg_thresh_lo: 0.1000 fg_fraction: 0.2500 fg_thresh: 0.5000 image_means: [224x224x3 single] ims_per_batch: 2 max_size: 1000 rng_seed: 6 scales: 600 test_binary: 0 test_max_size: 1000 test_nms: 0.3000 test_scales: 600 use_flipped: 1 use_gpu: 1
opts: cache_name: 'faster_rcnn_VOC2007_ZF_top-1_nms0_7_top2000_stage1_fast_rcnn' conf: [1x1 struct] do_val: 1 imdb_train: {[1x1 struct]} imdb_val: [1x1 struct] net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel' roidb_train: {[1x1 struct]} roidb_val: [1x1 struct] snapshot_interval: 10000 solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\fast_rcnn_prototxts\ZF\solver_30k40k.prototxt' val_interval: 2000 val_iters: 1
Preparing training data...Done. Preparing validation data...Done.
Error using caffe_
glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56)
caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216)
caffe_solver.step(1);

Error in fast_rcnn_train (line 89)
check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7)
model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 64)
model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);
IdleTimeout has been reached. Parallel pool using the 'local' profile is shutting down.
Thanks for your help!
@BUAAkong You need to show your log, which is in /output
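For reference, here is a minimal sketch of how one might locate the newest caffe_log file under the output folder and clear the stale MEX state that the error message mentions. The path and the recursive dir call are assumptions (dir with '**' needs MATLAB R2016b or newer); adjust to your checkout.

% Locate the newest caffe_log file under the output folder printed in the
% report above (path is an assumption; adjust to your checkout).
output_dir = 'D:\Faster_RCNN\faster_rcnn-master\output';
logs = dir(fullfile(output_dir, '**', '*caffe_log*'));   % recursive search, R2016b+
[~, newest] = max([logs.datenum]);
disp(fullfile(logs(newest).folder, logs(newest).name));

% "please check log and clear mex": release the solvers/nets held by
% matcaffe and unload the MEX files before re-running the script.
caffe.reset_all();
clear mex;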
@assess09 I have sent an e-mail to you with an attachment.
@BUAAkong I didn't receive your email. And I'm not sure I can solve your problem even if I check your log file.
@assess09 I made a mistake with the e-mail... Thanks for your attention and help!
@BUAAkong We are facing the same error as you. Have you solved it? Thanks!
@xzabg Maybe it's because the GPU's computing capability is too weak. Please read here: https://github.com/ShaoqingRen/faster_rcnn#requirements-software
@BUAAkong So, did you change to another GPU (or GPUs) with stronger capability, and can the code run normally now?
@xzabg No, I am just going to change it. I heard that from a friend, and he ran the code successfully after upgrading his GPU. And have you read the page I shared with you? The code may need at least 3GB of GPU memory for the ZF net and 8GB for the VGG-16 net.
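As a quick sanity check against those numbers, here is a minimal sketch using MATLAB's Parallel Computing Toolbox; this is just an illustration, the repo itself does its own check in check_gpu_memory as shown in the stack trace above.

% Query the active GPU and compare with the rough requirements quoted
% above (>= 3 GB for ZF, >= 8 GB for VGG-16).
d = gpuDevice;                                    % Parallel Computing Toolbox
fprintf('GPU: %s (compute capability %s)\n', d.Name, d.ComputeCapability);
fprintf('Total memory:     %.2f GB\n', d.TotalMemory / 2^30);
fprintf('Available memory: %.2f GB\n', d.AvailableMemory / 2^30);
if d.AvailableMemory < 3 * 2^30
    warning('Less than 3 GB free: training even the ZF net will likely run out of memory.');
end

For comparison, the original report above shows GPU 1 with about 2 GB free (2066997248 bytes), which is below the 3 GB quoted for ZF.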
@BUAAkong Yes, I saw it. My current configuration is a GTX 1060 with CUDA 8.0; how about yours?
@BUAAkong After you upgrade your GPU, if it is convenient, would you please tell me the result?
@xzabg OK, but it now seems the workstation in our laboratory will not be built for over a month, and with no GPU there is no training. Since I have never trained the net to completion, I am not sure whether the issue really comes from the GPU's weak capability or not. Furthermore, I think a GTX 1060 is capable enough to run Faster R-CNN (enough for ZF, though not for VGG).
@BUAAkong Yes, I also think a GTX 1060 is enough for training ZF, but from the information in the caffe_log it seems that something is wrong on the GPU side. Part of the caffe_log:

I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer relu1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer norm1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer pool1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer relu2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer norm2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer pool2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer conv3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer relu3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer conv4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer relu4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer conv5
I1229 10:09:26.351356 6356 net.cpp:746] Copying source layer relu5
I1229 10:09:26.351356 6356 net.cpp:743] Ignoring source layer pool5_spm6
I1229 10:09:26.352356 6356 net.cpp:743] Ignoring source layer pool5_spm6_flatten
I1229 10:09:26.352356 6356 net.cpp:746] Copying source layer fc6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer relu6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer drop6
I1229 10:09:26.389463 6356 net.cpp:746] Copying source layer fc7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer relu7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer drop7
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer fc8
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer prob
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
@xzabg Sorry, I cannot explain it either. Something else must be wrong.
@BUAAkong That's fine.

I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer fc8
I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer prob
I1229 11:17:57.716583 13420 solver.cpp:214] Iteration 0, loss = 3.04357
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #0: accuarcy = 0
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #1: loss_bbox = 0 (* 1 = 0 loss)
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #2: loss_cls = 3.04357 (* 1 = 3.04357 loss)
I1229 11:17:57.716583 13420 solver.cpp:486] Iteration 0, lr = 0.001
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
It seems that the training code can run, but the memory is not enough, so I'll try changing some parameters. Let's keep in touch and maybe we'll find something else.
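For anyone who wants to try the same thing, here is a sketch of the kind of parameter changes meant here, based on the conf struct printed earlier in this thread. The reduced values are only examples, not tested settings.

% Shrink what the stage-1 fast rcnn step feeds to the GPU. Field names come
% from the conf dump above; the smaller values are guesses that trade
% accuracy for memory.
conf_fast_rcnn.scales     = 400;   % printed value was 600 (shorter image side)
conf_fast_rcnn.max_size   = 700;   % printed value was 1000 (cap on the longer side)
conf_fast_rcnn.batch_size = 64;    % printed value was 128 ROIs per mini-batch

% Then rerun the failing call exactly as it appears in the stack trace:
model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, ...
    dataset, model.stage1_fast_rcnn, opts.do_val);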
@xzabg With pleasure. Email: [email protected]
@xzabg @BUAAkong Did you guys solve this problem in the end? I met the same one as you. Did changing the parameters work?
Using a GTX 1080 the error also occurs, with the following status:

Preparing training data...Done. Preparing validation data...Done.
Error using caffe_
glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56)
caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216)
caffe_solver.step(1);

Error in fast_rcnn_train (line 89)
check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7)
model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 53)
model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);
@LEXUSAPI Are you using CUDA 7.5 or CUDA 8.0?
@BUAAkong I have solved the problem; it was all caused by the Caffe version!
@LEXUSAPI How did you solve your problem? I have the same one. Can you explain how to change the Caffe version?
Did you solve this problem? I have the same problem, please help.
@qwertyDvo What are your GPU and CUDA versions?
@qwertyDvo My email: [email protected]
My GPU is a GTX 1070 (8GB) and I use CUDA 6.5 for Faster R-CNN.
What shall I send you?
@qwertyDvo Maybe you can update CUDA to version 8.0 and try it again. The email is so that I can always receive your replies without delay.
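If it helps, a small sketch for checking which CUDA toolkit and driver are actually visible from MATLAB before updating; nvcc and nvidia-smi are standard CUDA/driver tools, not part of this repo.

% Print the CUDA toolkit version of the compiler on the PATH and the driver
% version reported by the GPU; both run as external commands from MATLAB.
system('nvcc --version');   % toolkit the MEX files would be compiled against
system('nvidia-smi');       % driver version and current GPU memory usage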
OK, thank you. Did you solve this problem by using CUDA 8.0?
@qwertyDvo Actually I cannot be sure that is what fixed it, but since I have been using the combination of a GTX 1080 GPU and CUDA 8.0, this issue has never appeared.
OK, thank you, I will try.
@BUAAkong Once, when I tried to use CUDA 9.1, I got this error: Missing dependent shared libraries: 'cudart64_91.dll' required by nms_gpu_mex.mexw64.
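That error usually means the MEX files were built against a CUDA runtime that is not on the PATH. A sketch of how one might check and rebuild (the 'where' command is Windows-only; faster_rcnn_build.m is the build script shipped at the root of this repo):

% Check whether the CUDA 9.1 runtime DLL that nms_gpu_mex.mexw64 asks for
% can be found on the PATH (Windows-only).
system('where cudart64_91.dll');

% If it is missing, either add the CUDA 9.1 bin folder to the PATH or
% rebuild the MEX files against the CUDA version that is installed:
faster_rcnn_build;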