faster_rcnn
Error when rerunning script
When I run script_faster_rcnn_demo for the first time after starting Matlab, things work fine. But if I re-run the same script after the first run, I get the following error:
fast_rcnn startup done
GPU 1: free memory 11945885696
GPU 2: free memory 813449216
Use GPU 1
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
Caught "std::exception" Exception message is:
CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
I get a similar error. Matlab simply shuts down when re-running the matlab demo. Often a reboot is required to get it to run again.
Yup, reboot is the only way for me to get it working again.
@varun-nagaraja @KapSteR
I can't reproduce this bug on Windows. Ross also hasn't reported this bug on Ubuntu.
At the head of script_faster_rcnn_demo, we clear the caffe mex (mexLock() is commented out), so caffe should not throw any error on the second call.
I think we should make sure that the mex is cleared on your machine as expected.
I can reproduce the error in linux. It's low priority since it just affects the demo script and not training or testing. To clarify comments in the thread: a "reboot" of the computer is not required, just a restart of matlab.
So... It seems to me that there is somehow a GPU memory leak. The GPU memory usage grows linearly with every iteration of the main loop, until MATLAB crashes.
Is it wrong to assume that GPU memory usage is relatively constant with each forward pass, after "warm-up"?
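One way to test the "constant after warm-up" assumption is to log the free GPU memory once per iteration and look at the average drop after the first few passes. The sketch below is a minimal illustration, not code from faster_rcnn: the function name, the `warmup` parameter, and the readings are all hypothetical.

```python
def memory_drop_per_iteration(free_bytes, warmup=1):
    """Average drop in free GPU memory per iteration, ignoring warm-up readings.

    free_bytes: free-memory readings in bytes, one per iteration, in order.
    warmup: number of leading readings to discard (hypothetical default).
    """
    steady = free_bytes[warmup:]
    if len(steady) < 2:
        raise ValueError("need at least two readings after warm-up")
    return (steady[0] - steady[-1]) / (len(steady) - 1)


# Hypothetical readings: a large one-time allocation during warm-up,
# then a steady loss on every later iteration.
readings = [4_000_000_000, 3_500_000_000, 3_499_000_000,
            3_498_000_000, 3_497_000_000]
drop = memory_drop_per_iteration(readings, warmup=1)
print(f"average drop per iteration after warm-up: {drop:.0f} bytes")
```

A value close to zero would support the assumption; a steady positive value, as in these made-up readings, would point to memory that is not being released between iterations.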
So is it a problem that mex doesn't clean up after itself correctly after all?
- For me on Linux, the free GPU memory before the 2nd run (4205486080) is about 1 MB (1,097,728 bytes) less than before the first run (4206583808). That looks like a leak indeed.
- I also get a protobuf issue on the second run (Linux):
fast_rcnn startup done
GPU 1: free memory 4205486080
Use GPU 1
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
------------------------------------------------------------------------
std::terminate() detected at Mon Oct 12 13:07:26 2015
------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled
Crash Mode : continue (default)
Current Graphics Driver: Unknown software
Current Visual : None
Default Encoding : UTF-8
GNU C Library : 2.19 stable
Host Name : ip-172-31-21-65
MATLAB Architecture : glnxa64
MATLAB Root : /usr/local/MATLAB/R2015a
MATLAB Version : 8.5.0.197613 (R2015a)
OpenGL : software
Operating System : Linux 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64
Processor ID : x86 Family 6 Model 45 Stepping 7, GenuineIntel
Virtual Machine : Java 1.7.0_60-b19 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
Window System : No active display
Fault Count: 1
...
Stack Trace (captured):
[ 0] 0x00007f53a6b6570e /usr/local/MATLAB/R2015a/bin/glnxa64/libmwfl.so+00988942 _ZN2fl4diag5linux6x86_6412context_base12capture_dataEv+00000030
...
[ 12] 0x00007f52bd507c12 /usr/local/MATLAB/R2015a/bin/glnxa64/libprotobuf.so.8+00433170 _ZN6google8protobuf14DescriptorPool24InternalAddGeneratedFileEPKvi+00000194
[ 13] 0x00007f52bdc6c37c /home/ubuntu/src/faster_rcnn/external/caffe/matlab/+caffe/private/caffe_.mexa64+00443260
...
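In the stack trace above, frame 12 is inside MATLAB's own libprotobuf.so.8 and is called from caffe_.mexa64. One plausible reading, which is an assumption and not something confirmed in this thread, is that the registry of generated .proto files lives in that shared library and outlives the mex file: when the mex is cleared and loaded again, its generated code tries to register caffe.proto a second time. The toy model below only illustrates that pattern; the class and function names are hypothetical and none of this is protobuf's actual code.

```python
class ToyDescriptorDatabase:
    """Toy stand-in for a process-wide registry of .proto file names."""

    def __init__(self):
        self._files = set()

    def add(self, filename):
        """Register a file; return False if it is already registered."""
        if filename in self._files:
            return False
        self._files.add(filename)
        return True


def load_generated_code(database, filename="caffe.proto"):
    """Mimic the registration step that runs when generated code is loaded."""
    if not database.add(filename):
        raise RuntimeError(f"File already exists in database: {filename}")


# Registry that stays alive for the whole process (shared-library case).
shared_db = ToyDescriptorDatabase()
load_generated_code(shared_db)          # first run: registration succeeds
try:
    load_generated_code(shared_db)      # second run: same registry, same file
    second_run = "ok"
except RuntimeError as exc:
    second_run = str(exc)
print(second_run)

# If every load brings its own fresh registry, the second load succeeds.
load_generated_code(ToyDescriptorDatabase())
load_generated_code(ToyDescriptorDatabase())
```

Under this reading, restarting MATLAB works because it throws away the process-wide registry along with everything else.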
This bug does not just affect the demo script, but also training and testing on Ubuntu.
When I re-run 'script_faster_rcnn_VOC2007_ZF.m', it happened too.
When I run script_faster_rcnn_demo, I get this error in caffe_log:
F1028 15:47:12.852134 2204 syncedmem.cpp:51] Check failed: error == cudaSuccess (4 vs. 0) unspecified launch failure
I can reproduce this problem. When I re-run script_faster_rcnn_demo.m, Matlab crashes:
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1018] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
Caught "std::exception" Exception message is:
CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
Is this the same problem as https://github.com/BVLC/caffe/issues/1917?
So how can I solve this problem? I don't really understand. Thanks.
How can I solve this problem? I hit the bug on Ubuntu 14.04:
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
I have solved the last issue ...
THE BUG:
Bug on Ubuntu 14.04:
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
The first time I run the training or testing phase, everything works fine,
but if Matlab is still open and I try to run it again, the bug occurs.
SOLUTION: it seems to be related to the clear mex issue.
- I commented out the clear mex in the .m file
- In the mex file, I commented out the mexLock() call.
It seems to work OK. I would like to know why clear mex is used at all.
I have encountered the same bug, and solved it by re-compiling opencv without the dnn module. I found that caffe, protobuf, and opencv-dnn couldn't work together. It seems to be a bug in either protobuf or opencv.
There are two solutions:
- statically link to protobuf (i.e., link to protobuf.a, NOT protobuf.so)
OR
- remove opencv_contrib/modules/cnn, and re-compile opencv
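To see whether a binary still links to protobuf dynamically, its `ldd` output can be filtered for libprotobuf. The sketch below is a rough illustration: the helper names are hypothetical, the sample `ldd` text is made up for the example, and the libraries MATLAB actually resolves at run time may differ from what `ldd` reports in a plain shell.

```python
import subprocess


def protobuf_dependencies(ldd_output):
    """Return the lines of `ldd` output that mention libprotobuf."""
    return [line.strip() for line in ldd_output.splitlines()
            if "libprotobuf" in line]


def run_ldd(path):
    """Run `ldd` on a binary and return its text output (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True,
                            text=True, check=True)
    return result.stdout


# Hypothetical ldd output, used here so the example runs anywhere.
sample = (
    "\tlinux-vdso.so.1 (0x00007ffd00000000)\n"
    "\tlibprotobuf.so.8 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000400000)\n"
)
deps = protobuf_dependencies(sample)
print(deps if deps else "no dynamic libprotobuf dependency found")
```

On a real system, `protobuf_dependencies(run_ldd(path))` could be called on the caffe mex file and on the opencv libraries. An empty result for a binary suggests that it does not pull in protobuf.so, which is what the static-linking option aims for.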
Problem solved: https://github.com/ShaoqingRen/faster_rcnn/issues/112#issuecomment-273279959