
Matlab system error when running script_faster_rcnn_VOC2007_ZF.m

Open BUAAkong opened this issue 8 years ago • 33 comments

I tried to run script_faster_rcnn_VOC2007_ZF.m to train on my own dataset, and MATLAB crashed with the following crash report:

fast_rcnn startup done
GPU 1: free memory 2066997248
Use GPU 1
imdb (voc_2007_trainval): 9/20
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
imdb (voc_2007_test): 1/10
Saving imdb to cache...done
Loading region proposals...done
Warrning: no windows proposal is loaded !
Saving roidb to cache...done
Cleared 0 solvers and 1 stand-alone nets


stage one proposal


conf:
    batch_size: 256
    bg_thresh_hi: 0.3000
    bg_thresh_lo: 0
    bg_weight: 1
    drop_boxes_runoff_image: 1
    feat_stride: 16
    fg_fraction: 0.5000
    fg_thresh: 0.7000
    image_means: [224x224x3 single]
    ims_per_batch: 1
    max_size: 1000
    rng_seed: 6
    scales: 600
    target_only_gt: 1
    test_binary: 0
    test_drop_boxes_runoff_image: 0
    test_max_size: 1000
    test_min_box_size: 16
    test_nms: 0.3000
    test_scales: 600
    use_flipped: 1
    use_gpu: 1
    anchors: [9x4 double]
    output_width_map: [901x1 containers.Map]
    output_height_map: [901x1 containers.Map]

opts:
    cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
    conf: [1x1 struct]
    do_val: 1
    imdb_train: {[1x1 struct]}
    imdb_val: [1x1 struct]
    net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel'
    roidb_train: {[1x1 struct]}
    roidb_val: [1x1 struct]
    snapshot_interval: 10000
    solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\solver_60k80k.prototxt'
    val_interval: 2000
    val_iters: 1

Preparing training data...Starting parallel pool (parpool) using the 'local' profile ... connected to 2 workers. Done.
Preparing validation data...Done.
Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\iter_2000
Saved as D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_2007_trainval\final
Cleared 1 solvers and 0 stand-alone nets
opts:
    cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
    conf: [1x1 struct]
    imdb: [1x1 struct]
    net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt'
    net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...'
    suffix: ''

conf:
    batch_size: 256
    bg_thresh_hi: 0.3000
    bg_thresh_lo: 0
    bg_weight: 1
    drop_boxes_runoff_image: 1
    feat_stride: 16
    fg_fraction: 0.5000
    fg_thresh: 0.7000
    image_means: [224x224x3 single]
    ims_per_batch: 1
    max_size: 1000
    rng_seed: 6
    scales: 600
    target_only_gt: 1
    test_binary: 0
    test_drop_boxes_runoff_image: 0
    test_max_size: 1000
    test_min_box_size: 16
    test_nms: 0.3000
    test_scales: 600
    use_flipped: 1
    use_gpu: 1
    anchors: [9x4 double]
    output_width_map: [901x1 containers.Map]
    output_height_map: [901x1 containers.Map]

faster_rcnn-master: test (voc_2007_trainval) 1/20 time: 1.234s
faster_rcnn-master: test (voc_2007_trainval) 2/20 time: 0.830s
faster_rcnn-master: test (voc_2007_trainval) 3/20 time: 0.672s
faster_rcnn-master: test (voc_2007_trainval) 4/20 time: 0.665s
faster_rcnn-master: test (voc_2007_trainval) 5/20 time: 0.739s
faster_rcnn-master: test (voc_2007_trainval) 6/20 time: 0.740s
faster_rcnn-master: test (voc_2007_trainval) 7/20 time: 0.666s
faster_rcnn-master: test (voc_2007_trainval) 8/20 time: 0.666s
faster_rcnn-master: test (voc_2007_trainval) 9/20 time: 0.756s
faster_rcnn-master: test (voc_2007_trainval) 10/20 time: 0.755s
faster_rcnn-master: test (voc_2007_trainval) 11/20 time: 0.727s
faster_rcnn-master: test (voc_2007_trainval) 12/20 time: 0.725s
faster_rcnn-master: test (voc_2007_trainval) 13/20 time: 0.740s
faster_rcnn-master: test (voc_2007_trainval) 14/20 time: 0.739s
faster_rcnn-master: test (voc_2007_trainval) 15/20 time: 0.724s
faster_rcnn-master: test (voc_2007_trainval) 16/20 time: 0.723s
faster_rcnn-master: test (voc_2007_trainval) 17/20 time: 0.767s
faster_rcnn-master: test (voc_2007_trainval) 18/20 time: 0.767s
faster_rcnn-master: test (voc_2007_trainval) 19/20 time: 0.669s
faster_rcnn-master: test (voc_2007_trainval) 20/20 time: 0.669s
Cleared 0 solvers and 1 stand-alone nets
aver_boxes_num = 2731, select top 2000
opts:
    cache_name: 'faster_rcnn_VOC2007_ZF_stage1_rpn'
    conf: [1x1 struct]
    imdb: [1x1 struct]
    net_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\rpn_prototxts\ZF\test.prototxt'
    net_file: 'D:\Faster_RCNN\faster_rcnn-master\output\rpn_cachedir\faster_rcnn_VOC2007_ZF_stage1_rpn\voc_20...'
    suffix: ''

conf:
    batch_size: 256
    bg_thresh_hi: 0.3000
    bg_thresh_lo: 0
    bg_weight: 1
    drop_boxes_runoff_image: 1
    feat_stride: 16
    fg_fraction: 0.5000
    fg_thresh: 0.7000
    image_means: [224x224x3 single]
    ims_per_batch: 1
    max_size: 1000
    rng_seed: 6
    scales: 600
    target_only_gt: 1
    test_binary: 0
    test_drop_boxes_runoff_image: 0
    test_max_size: 1000
    test_min_box_size: 16
    test_nms: 0.3000
    test_scales: 600
    use_flipped: 1
    use_gpu: 1
    anchors: [9x4 double]
    output_width_map: [901x1 containers.Map]
    output_height_map: [901x1 containers.Map]

faster_rcnn-master: test (voc_2007_test) 1/10 time: 0.866s
faster_rcnn-master: test (voc_2007_test) 2/10 time: 0.961s
faster_rcnn-master: test (voc_2007_test) 3/10 time: 0.664s
faster_rcnn-master: test (voc_2007_test) 4/10 time: 0.659s
faster_rcnn-master: test (voc_2007_test) 5/10 time: 0.753s
faster_rcnn-master: test (voc_2007_test) 6/10 time: 0.899s
faster_rcnn-master: test (voc_2007_test) 7/10 time: 0.738s
faster_rcnn-master: test (voc_2007_test) 8/10 time: 0.750s
faster_rcnn-master: test (voc_2007_test) 9/10 time: 0.850s
faster_rcnn-master: test (voc_2007_test) 10/10 time: 0.821s
Cleared 0 solvers and 1 stand-alone nets
aver_boxes_num = 2695, select top 2000


stage one fast rcnn


conf:
    batch_size: 128
    bbox_thresh: 0.5000
    bg_thresh_hi: 0.5000
    bg_thresh_lo: 0.1000
    fg_fraction: 0.2500
    fg_thresh: 0.5000
    image_means: [224x224x3 single]
    ims_per_batch: 2
    max_size: 1000
    rng_seed: 6
    scales: 600
    test_binary: 0
    test_max_size: 1000
    test_nms: 0.3000
    test_scales: 600
    use_flipped: 1
    use_gpu: 1

opts:
    cache_name: 'faster_rcnn_VOC2007_ZF_top-1_nms0_7_top2000_stage1_fast_rcnn'
    conf: [1x1 struct]
    do_val: 1
    imdb_train: {[1x1 struct]}
    imdb_val: [1x1 struct]
    net_file: 'D:\Faster_RCNN\faster_rcnn-master\models\pre_trained_models\ZF\ZF.caffemodel'
    roidb_train: {[1x1 struct]}
    roidb_val: [1x1 struct]
    snapshot_interval: 10000
    solver_def_file: 'D:\Faster_RCNN\faster_rcnn-master\models\fast_rcnn_prototxts\ZF\solver_30k40k.prototxt'
    val_interval: 2000
    val_iters: 1

Preparing training data...Done. Preparing validation data...Done.
Error using caffe_: glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56): caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216): caffe_solver.step(1);

Error in fast_rcnn_train (line 89): check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7): model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 64): model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);

IdleTimeout has been reached. Parallel pool using the 'local' profile is shutting down.

Thanks for your help!

BUAAkong avatar Dec 07 '16 07:12 BUAAkong

@BUAAkong You need to show your log, which is in /output

oneQuery avatar Dec 08 '16 15:12 oneQuery
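The error message itself points at the recovery steps: the glog output lands under output/, and the crashed mex keeps stale state in the MATLAB session. A minimal recovery sketch (caffe.reset_all is part of the matcaffe interface bundled with this repo; the exact log location depends on your cache settings):

% After a "glog check error", read the caffe log under output/, then
% release caffe's state before re-running, per the "clear mex" hint.
caffe.reset_all();   % free any solvers/nets still held by caffe_
clear mex;           % unload caffe_.mexw64 so it reinitializes cleanly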

@assess09 I have sent an e-mail to you with an attachment.

BUAAkong avatar Dec 09 '16 14:12 BUAAkong

@BUAAkong I didn't receive your email. And I'm not sure I can solve your problem even if I check your log file.

oneQuery avatar Dec 10 '16 21:12 oneQuery

@assess09 I made a mistake with the e-mail... Thanks for your attention and help!

BUAAkong avatar Dec 11 '16 00:12 BUAAkong

@BUAAkong We are hitting the same error as you; have you solved it? Thx!

xzabg avatar Dec 27 '16 11:12 xzabg

@xzabg Maybe it's because the GPU's compute capability is too weak. Please read the requirements here: https://github.com/ShaoqingRen/faster_rcnn#requirements-software

BUAAkong avatar Dec 27 '16 11:12 BUAAkong

@BUAAkong So did you switch to a more capable GPU? And does the code then run normally?

xzabg avatar Dec 28 '16 02:12 xzabg

@xzabg No, I am just about to change it. I heard that from a friend, who ran the code successfully after upgrading his GPU. And have you read the page I shared with you? The code needs at least 3GB of GPU memory for the ZF net and 8GB for the VGG-16 net.

BUAAkong avatar Dec 28 '16 03:12 BUAAkong
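Worth noting: the crash report at the top shows only 2066997248 bytes (about 1.9 GB) free on GPU 1, which is already below the 3GB quoted for ZF. A quick way to check this from MATLAB (a minimal sketch; requires the Parallel Computing Toolbox, and the free-memory property is FreeMemory on older releases, AvailableMemory on newer ones):

% Inspect the GPU the scripts will use (GPU 1, as in the log above).
g = gpuDevice(1);
fprintf('GPU: %s (compute capability %s)\n', g.Name, g.ComputeCapability);
fprintf('Free GPU memory: %.2f GB of %.2f GB\n', ...
    g.FreeMemory / 2^30, g.TotalMemory / 2^30);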

@BUAAkong Yes, I saw it. My current configuration is a GTX 1060 with CUDA 8.0; how about you?

xzabg avatar Dec 28 '16 06:12 xzabg

@BUAAkong After you update your GPU, would you please tell me the result, if convenient?

xzabg avatar Dec 28 '16 07:12 xzabg

@xzabg OK, but it seems the workstation in our laboratory won't be set up for over a month, and with no GPU there is no training. Since I have never trained the net to completion, I am not sure whether the issue really comes from weak GPU capability. That said, I think a GTX 1060 is capable enough to run faster rcnn (enough for ZF, but not for VGG).

BUAAkong avatar Dec 28 '16 08:12 BUAAkong

@BUAAkong Yes, I also think a GTX 1060 is enough for training ZF, but the information in caffe_log suggests something is wrong on the GPU side. Part of the caffe_log:

I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer relu1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer norm1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer pool1
I1229 10:09:26.347323 6356 net.cpp:746] Copying source layer conv2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer relu2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer norm2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer pool2
I1229 10:09:26.348325 6356 net.cpp:746] Copying source layer conv3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer relu3
I1229 10:09:26.349350 6356 net.cpp:746] Copying source layer conv4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer relu4
I1229 10:09:26.350352 6356 net.cpp:746] Copying source layer conv5
I1229 10:09:26.351356 6356 net.cpp:746] Copying source layer relu5
I1229 10:09:26.351356 6356 net.cpp:743] Ignoring source layer pool5_spm6
I1229 10:09:26.352356 6356 net.cpp:743] Ignoring source layer pool5_spm6_flatten
I1229 10:09:26.352356 6356 net.cpp:746] Copying source layer fc6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer relu6
I1229 10:09:26.388463 6356 net.cpp:746] Copying source layer drop6
I1229 10:09:26.389463 6356 net.cpp:746] Copying source layer fc7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer relu7
I1229 10:09:26.405477 6356 net.cpp:746] Copying source layer drop7
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer fc8
I1229 10:09:26.405477 6356 net.cpp:743] Ignoring source layer prob
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 10:09:59.980269 6356 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory

xzabg avatar Dec 29 '16 02:12 xzabg

@xzabg Sorry, I cannot explain it either. Something else must be wrong.

BUAAkong avatar Dec 29 '16 02:12 BUAAkong

@BUAAkong That's fine.

I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer fc8
I1229 11:17:24.730571 13420 net.cpp:743] Ignoring source layer prob
I1229 11:17:57.716583 13420 solver.cpp:214] Iteration 0, loss = 3.04357
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #0: accuarcy = 0
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #1: loss_bbox = 0 (* 1 = 0 loss)
I1229 11:17:57.716583 13420 solver.cpp:229] Train net output #2: loss_cls = 3.04357 (* 1 = 3.04357 loss)
I1229 11:17:57.716583 13420 solver.cpp:486] Iteration 0, lr = 0.001
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
F1229 11:17:57.719590 13420 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory

It seems the training code can run, but there isn't enough memory, so I'll try changing some parameters. Let's keep in touch; maybe we'll find something else.

xzabg avatar Dec 29 '16 03:12 xzabg
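For anyone taking the parameter route: the knobs are the conf fields dumped above (scales, max_size, ims_per_batch, batch_size). A minimal sketch, assuming the stock script builds its conf through fast_rcnn_config name/value overrides; the values below are illustrative, not tuned:

% In script_faster_rcnn_VOC2007_ZF.m, trade accuracy for GPU memory by
% shrinking input images and minibatches when building the fast-rcnn conf.
conf_fast_rcnn = fast_rcnn_config('image_means', model.mean_image, ...
    'scales',        400, ...  % shorter-side target, down from 600
    'max_size',      667, ...  % longer-side cap, down from 1000
    'ims_per_batch', 1,   ...  % images per minibatch, down from 2
    'batch_size',    64);      % ROIs per minibatch, down from 128

Smaller scales and max_size shrink every conv feature map, which is usually where most of the GPU memory goes.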

@xzabg With pleasure. Email: [email protected]

BUAAkong avatar Dec 29 '16 03:12 BUAAkong

@xzabg @BUAAkong Did you guys solve this problem in the end? I've hit the same one. Did changing the parameters work?

YilunYang avatar Feb 17 '17 00:02 YilunYang

Using a GTX 1080, I also get the error, with the following output:

Preparing training data...Done. Preparing validation data...Done.
Error using caffe_: glog check error, please check log and clear mex

Error in caffe.Solver/step (line 56): caffe_('solver_step', self.hSolver_self, iters);

Error in fast_rcnn_train>check_gpu_memory (line 216): caffe_solver.step(1);

Error in fast_rcnn_train (line 89): check_gpu_memory(conf, caffe_solver, num_classes, opts.do_val);

Error in Faster_RCNN_Train.do_fast_rcnn_train (line 7): model_stage.output_model_file = fast_rcnn_train(conf, dataset.imdb_train, dataset.roidb_train, ...

Error in script_faster_rcnn_VOC2007_ZF (line 53): model.stage1_fast_rcnn = Faster_RCNN_Train.do_fast_rcnn_train(conf_fast_rcnn, dataset, model.stage1_fast_rcnn, opts.do_val);

LEXUSAPI avatar Apr 14 '17 07:04 LEXUSAPI

@LEXUSAPI Are you using CUDA 7.5 or CUDA 8.0?

BUAAkong avatar Apr 14 '17 12:04 BUAAkong

@BUAAkong I have solved the problem; it was all caused by the Caffe version!

LEXUSAPI avatar Apr 27 '17 02:04 LEXUSAPI

@LEXUSAPI How did you solve it? I have the same problem. Can you explain how to change the Caffe version?

ggghh avatar Jul 02 '18 15:07 ggghh

Did you solve this problem? I have the same problem; please help.

qwertyDvo avatar Oct 13 '18 02:10 qwertyDvo

@qwertyDvo What are your GPU model and CUDA version?

BUAAkong avatar Oct 13 '18 02:10 BUAAkong
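(A quick way to answer that from MATLAB, as a sketch: gpuDevice reports the card and the CUDA toolkit version MATLAB itself was built against, while nvcc reports the system toolkit, which is the one the caffe mex cares about.)

% GPU model, compute capability, and MATLAB's CUDA toolkit version.
g = gpuDevice;
fprintf('%s | compute %s | MATLAB CUDA toolkit %.1f\n', ...
    g.Name, g.ComputeCapability, g.ToolkitVersion);
% System CUDA toolkit, if nvcc is on the PATH:
!nvcc --version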

@qwertyDvo My email: [email protected]

BUAAkong avatar Oct 13 '18 02:10 BUAAkong

My GPU is a GTX 1070 8GB, and I use CUDA 6.5 for faster rcnn.

qwertyDvo avatar Oct 13 '18 02:10 qwertyDvo

What shall I send you?

qwertyDvo avatar Oct 13 '18 02:10 qwertyDvo

@qwertyDvo Maybe you can update CUDA to 8.0 and try again. The email is because I cannot always see your replies here without delay.

BUAAkong avatar Oct 13 '18 03:10 BUAAkong

OK, thank you. Did you solve this problem by using CUDA 8.0?

qwertyDvo avatar Oct 13 '18 03:10 qwertyDvo

@qwertyDvo Actually, I cannot be sure it was the fix, but since I switched to the combination of a GTX 1080 GPU and CUDA 8.0, this issue has never appeared.

BUAAkong avatar Oct 13 '18 03:10 BUAAkong

Ok thank you I will try

qwertyDvo avatar Oct 13 '18 03:10 qwertyDvo

@BUAAkong When I tried CUDA 9.1, I got this error: Missing dependent shared libraries: 'cudart64_91.dll' required by nms_gpu_mex.mexw64.

qwertyDvo avatar Oct 13 '18 13:10 qwertyDvo
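The dll name in that last error means nms_gpu_mex.mexw64 was linked against the CUDA 9.1 runtime, so either cudart64_91.dll must be on the system PATH or the mex needs rebuilding against the installed toolkit. A sketch of the rebuild, assuming the repo's standard build script and a C++ compiler already configured via "mex -setup":

% Recompile the repo's mex files (nms_gpu_mex among them) so they link
% against the locally installed CUDA toolkit; needs nvcc on the PATH.
cd('D:\Faster_RCNN\faster_rcnn-master');   % repo root, as in the logs above
faster_rcnn_build();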