Error when rerunning script

Open varun-nagaraja opened this issue 9 years ago • 15 comments

When I run script_faster_rcnn_demo for the first time after starting Matlab, things work fine. But if I re-run the same script after the first run, I get the following error:

fast_rcnn startup done
GPU 1: free memory 11945885696
GPU 2: free memory 813449216
Use GPU 1
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
Caught "std::exception" Exception message is:
CHECK failed: generated_database_->Add(encoded_file_descriptor, size):

varun-nagaraja avatar Sep 22 '15 23:09 varun-nagaraja

I get a similar error. Matlab simply shuts down when re-running the matlab demo. Often a reboot is required to get it to run again.

KapSteR avatar Sep 22 '15 23:09 KapSteR

Yup, reboot is the only way for me to get it working again.

varun-nagaraja avatar Sep 23 '15 03:09 varun-nagaraja

@varun-nagaraja @KapSteR

I can't reproduce this bug on Windows. Ross also hasn't reported this bug on Ubuntu.

At the top of script_faster_rcnn_demo, we clear the caffe mex (mexLock() is commented out), so there should not be any error thrown by caffe on the second call.

I think we should make sure that the mex is cleared on your machine as expected.

ShaoqingRen avatar Sep 23 '15 07:09 ShaoqingRen

I can reproduce the error in linux. It's low priority since it just affects the demo script and not training or testing. To clarify comments in the thread: a "reboot" of the computer is not required, just a restart of matlab.

rbgirshick avatar Sep 23 '15 13:09 rbgirshick

So... It seems to me that there is somehow a GPU memory leak. The GPU memory usage grows linearly with every iteration of the main loop, until MATLAB crashes.

Is it wrong to assume that GPU memory usage is relatively constant with each forward pass, after "warm-up" ?

KapSteR avatar Oct 06 '15 15:10 KapSteR

So is it a problem that mex doesn't clean up after itself correctly after all?

  1. For me on Linux, the free GPU memory before the 2nd run (4205486080) is about 1 MB less than before the first run (4206583808). That looks like a leak indeed.
  2. I also get a protobuf issue on the second run (Linux):
fast_rcnn startup done
GPU 1: free memory 4205486080
Use GPU 1

[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size): 

------------------------------------------------------------------------
          std::terminate() detected at Mon Oct 12 13:07:26 2015
------------------------------------------------------------------------

Configuration:
  Crash Decoding      : Disabled
  Crash Mode          : continue (default)
  Current Graphics Driver: Unknown software 
  Current Visual      : None
  Default Encoding    : UTF-8
  GNU C Library       : 2.19 stable
  Host Name           : ip-172-31-21-65
  MATLAB Architecture : glnxa64
  MATLAB Root         : /usr/local/MATLAB/R2015a
  MATLAB Version      : 8.5.0.197613 (R2015a)
  OpenGL              : software
  Operating System    : Linux 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64
  Processor ID        : x86 Family 6 Model 45 Stepping 7, GenuineIntel
  Virtual Machine     : Java 1.7.0_60-b19 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
  Window System       : No active display

Fault Count: 1

...
Stack Trace (captured):
[  0] 0x00007f53a6b6570e    /usr/local/MATLAB/R2015a/bin/glnxa64/libmwfl.so+00988942 _ZN2fl4diag5linux6x86_6412context_base12capture_dataEv+00000030
...
[ 12] 0x00007f52bd507c12 /usr/local/MATLAB/R2015a/bin/glnxa64/libprotobuf.so.8+00433170 _ZN6google8protobuf14DescriptorPool24InternalAddGeneratedFileEPKvi+00000194
[ 13] 0x00007f52bdc6c37c /home/ubuntu/src/faster_rcnn/external/caffe/matlab/+caffe/private/caffe_.mexa64+00443260
...
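
The two free-memory readings quoted in item 1 can be compared directly. A minimal sketch, using Python purely as a calculator; the variable names are illustrative and not taken from the repository:

```python
# Free GPU memory in bytes, as reported before the first and second runs.
free_before_first = 4206583808
free_before_second = 4205486080

# Memory that was not returned between the two runs.
lost_bytes = free_before_first - free_before_second
lost_mib = lost_bytes / 2**20  # 1 MiB = 1,048,576 bytes

print(lost_bytes)  # 1097728
print(lost_mib)    # 1.046875
```

That is roughly 1.05 MiB missing between the two runs.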

kukuruza avatar Oct 12 '15 13:10 kukuruza

This bug does not just affect the demo script, but also training and testing on Ubuntu.

It also happened when I re-ran 'script_faster_rcnn_VOC2007_ZF.m'.

BlueCrow1991 avatar Oct 17 '15 14:10 BlueCrow1991

When I run script_faster_rcnn_demo, this error appears in caffe_log:
F1028 15:47:12.852134 2204 syncedmem.cpp:51] Check failed: error == cudaSuccess (4 vs. 0) unspecified launch failure

YingjieYin avatar Oct 28 '15 07:10 YingjieYin

I can reproduce this problem. When I re-run script_faster_rcnn_demo.m, Matlab crashes:

[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1018] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
Caught "std::exception" Exception message is:
CHECK failed: generated_database_->Add(encoded_file_descriptor, size):

fengyuxi55 avatar Dec 11 '15 08:12 fengyuxi55

Is this the same problem as https://github.com/BVLC/caffe/issues/1917?

corganhejijun avatar Dec 29 '15 16:12 corganhejijun

so how could I solve this problem? I don't really understand. thx

roytseng-tw avatar Dec 30 '15 00:12 roytseng-tw

How can this problem be solved? I hit the bug on Ubuntu 14.04:
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):

gjyin avatar Jul 18 '16 07:07 gjyin

I have solved this issue.

THE BUG: on Ubuntu 14.04,
[libprotobuf ERROR google/protobuf/descriptor_database.cc:57] File already exists in database: caffe.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:954] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
The first time I run the training or testing phase, everything works fine. But if Matlab is still open and I try to run it again, the bug occurs.

SOLUTION: it seems to be related to the clear mex issue.

  • I commented out the clear mex call in the .m file.
  • In the mex file, I commented out the mexLock() call.

It seems to work OK. I would like to know why clear mex is used at all.
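
One way to picture why skipping clear mex can avoid the crash is a toy model, written in Python only for illustration. It assumes that caffe.proto is registered in a protobuf library that stays loaded for the whole Matlab session (the stack trace earlier in the thread shows libprotobuf.so.8 coming from the MATLAB directory), and that clearing the mex file causes it to register the file again on the next call. `GeneratedRegistry`, `Session` and `try_run` are hypothetical names, not part of caffe, protobuf or Matlab.

```python
class GeneratedRegistry:
    """Toy stand-in for a session-wide table of generated .proto files."""

    def __init__(self):
        self._files = set()

    def add(self, name):
        # Like the failing check in the log, a second add of the same name is fatal.
        if name in self._files:
            raise RuntimeError("File already exists in database: " + name)
        self._files.add(name)


class Session:
    """Hypothetical model of one Matlab session that loads a caffe mex file."""

    def __init__(self):
        self.registry = GeneratedRegistry()  # outlives the mex file
        self.mex_loaded = False

    def clear_mex(self):
        # Unloads the mex file but leaves the registry untouched.
        self.mex_loaded = False

    def run_script(self, clear_first):
        if clear_first:
            self.clear_mex()
        if not self.mex_loaded:
            # Loading the mex file registers caffe.proto.
            self.registry.add("caffe.proto")
            self.mex_loaded = True
        return "ok"


def try_run(session, clear_first):
    try:
        return session.run_script(clear_first)
    except RuntimeError as exc:
        return "crash: " + str(exc)


# Script that clears the mex file at the top of every run.
with_clear = Session()
first_run = try_run(with_clear, clear_first=True)
second_run_with_clear = try_run(with_clear, clear_first=True)

# Script with the clear step commented out.
without_clear = Session()
try_run(without_clear, clear_first=False)
second_run_without_clear = try_run(without_clear, clear_first=False)

print(first_run)                 # ok
print(second_run_with_clear)     # crash: File already exists in database: caffe.proto
print(second_run_without_clear)  # ok
```

Under these assumptions the mex file is loaded only once per session when it is never cleared, so the registration happens only once.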

esason avatar Sep 01 '16 17:09 esason

I have encountered the same bug, and solved it by re-compiling opencv without the dnn module. I found that caffe, protobuf, and opencv-dnn couldn't work together. It seems to be a bug in either protobuf or opencv.

There are two solutions:

  1. statically link to protobuf (i.e., link to protobuf.a, NOT protobuf.so)

OR

  2. remove opencv_contrib/modules/cnn, and re-compile opencv
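
The difference between the two options can be sketched with a toy model, again in Python for illustration only. It assumes that two libraries in the same process each carry a generated copy of caffe.proto, and that with a shared protobuf library both register into one database, while with static linking each library has its own private copy. `DescriptorDatabase` and `load_library` are hypothetical names, not protobuf APIs.

```python
class DescriptorDatabase:
    """Toy stand-in for a table of registered .proto file names."""

    def __init__(self):
        self._files = set()

    def add(self, name):
        if name in self._files:
            raise RuntimeError("File already exists in database: " + name)
        self._files.add(name)


def load_library(database, proto_files):
    """Hypothetical model of a library registering its generated files on load."""
    for name in proto_files:
        database.add(name)
    return "ok"


def try_load(database, proto_files):
    try:
        return load_library(database, proto_files)
    except RuntimeError as exc:
        return "crash: " + str(exc)


# Shared protobuf library: both libraries register into one database.
shared_db = DescriptorDatabase()
try_load(shared_db, ["caffe.proto"])                  # first library
shared_result = try_load(shared_db, ["caffe.proto"])  # second library, same file

# Static linking: each library carries its own private database.
first_private_db = DescriptorDatabase()
second_private_db = DescriptorDatabase()
try_load(first_private_db, ["caffe.proto"])
static_result = try_load(second_private_db, ["caffe.proto"])

print(shared_result)  # crash: File already exists in database: caffe.proto
print(static_result)  # ok
```

Removing the second library's copy of caffe.proto (option 2) avoids the clash in the shared case for the same reason: only one registration remains.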

ZiangYan avatar Sep 05 '16 08:09 ZiangYan

Problem solved: https://github.com/ShaoqingRen/faster_rcnn/issues/112#issuecomment-273279959

hongkaiyu2012 avatar Jan 17 '17 22:01 hongkaiyu2012