train-CRF-RNN

How to reduce the graphics memory occupancy

Open wuhang opened this issue 8 years ago • 8 comments

Hi all, my graphics card is a GTX 660 with 2 GB, so when I run your example it fails with: Check failed: error == cudaSuccess (2 vs. 0) out of memory. I have reduced the training images to 256 px, but it still does not work. Can you tell me how to reduce the graphics memory occupancy? Thanks

wuhang avatar Apr 08 '16 11:04 wuhang

Hi wuhang,

2 GB isn't really enough to store the whole network, so you should keep decreasing the image dimensions, but I am not sure how much they would have to be resized. To the best of my knowledge, there isn't any other easy way to use the network (CRF-RNN) if you don't have enough memory on your GPU.

Cheers,

Martin

martinkersner avatar Apr 19 '16 02:04 martinkersner
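
To see why shrinking the images helps, here is a minimal sketch of how the memory of a single activation blob scales with image size. It assumes float32 storage and, for training, a gradient buffer of the same size next to the data. The function name `blob_bytes` and the 64-channel layer are hypothetical, chosen only for illustration; real usage also includes weights and the buffers of every other layer, so this is not a prediction of total GPU memory.

```python
# Rough, illustrative estimate only: it covers ONE blob, not the whole network.
BYTES_PER_FLOAT32 = 4


def blob_bytes(num, channels, height, width, with_diff=True):
    """Bytes for one float32 blob of shape (num, channels, height, width).

    with_diff=True counts a second, same-sized buffer for gradients,
    as needed during training (an assumption of this sketch).
    """
    count = num * channels * height * width
    copies = 2 if with_diff else 1
    return count * BYTES_PER_FLOAT32 * copies


# Hypothetical layer: 64 channels at full input resolution, batch size 1.
for side in (500, 384, 256, 128):
    mib = blob_bytes(1, 64, side, side) / 2**20
    print(f"{side:>3} px: {mib:7.1f} MiB")
```

Because the cost grows with height times width, halving the side length cuts this blob to a quarter of its size.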

@martinkersner thanks, I think I need a Titan X

wuhang avatar May 11 '16 10:05 wuhang

I got a similar error with a 4 GB GPU and wonder if it is due to my Caffe installation. I have the Caffe bittnt version and Cuda v2. When I turned on the GPU I got the following error, while CPU mode ran without any problem.

[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 537968303
I0612 03:49:53.658771 66 upgrade_proto.cpp:620] Attempting to upgrade input file specified using deprecated V1LayerParameter: TVG_CRFRNN_COCO_VOC.caffemodel
I0612 03:49:54.213007 66 upgrade_proto.cpp:628] Successfully upgraded file specified using deprecated V1LayerParameter
F0612 03:49:54.858664 66 syncedmem.cpp:58] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted

Any help please?

ghost avatar Jun 12 '16 03:06 ghost

Hi @baoqiangcao,

A 4 GB GPU isn't sufficient.

Cheers,

Martin

martinkersner avatar Jun 13 '16 09:06 martinkersner

Hi @martinkersner @baoqiangcao, I get the same problem. My GPU is a GeForce GTX Titan Black with 6 GB, and it does not seem to be sufficient either.

anguszxd avatar Jun 21 '16 02:06 anguszxd

I got an error when I ran "python solve.py 2>&1 | tee train.log":

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 5:9: Expected string.
F0908 09:40:32.894371 23828 upgrade_proto.cpp:932] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: TVG_CRFRNN_COCO_VOC_TRAIN_3_CLASSES.prototxt

Why? I built my Caffe just last week, so protobuf should be the newest. Thanks for any help!

XiangChen1994 avatar Sep 08 '16 01:09 XiangChen1994

Hi @windforever118,

You are using a prototxt file that was made for an old version of CRF as RNN, and because you didn't post the whole error output here I cannot point you to an exact answer. However, you are not the first to run into this problem, so please take a look at the other issues and you will find a solution.

Cheers,

Martin

martinkersner avatar Sep 08 '16 07:09 martinkersner
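
The "5:9" in the parser message above is a line:column position inside the prototxt file. As a minimal sketch, assuming both numbers are 1-based, the helper below prints the offending line with a caret under the reported column. The name `show_position` and the sample fragment are hypothetical; the fragment is not the real file from this thread.

```python
def show_position(text, line, col):
    """Return the reported line plus a caret under the reported column.

    Assumes line and col are 1-based, as in the "5:9" of the error message.
    """
    lines = text.splitlines()
    if not 1 <= line <= len(lines):
        raise ValueError(f"line {line} is outside the file (1..{len(lines)})")
    return lines[line - 1] + "\n" + " " * (col - 1) + "^"


# Hypothetical prototxt fragment, NOT the real file from this thread.
sample = "\n".join([
    'name: "example"',
    "layer {",
    '  name: "data"',
    '  top: "data"',
    "  type: DATA",
    "}",
])
print(show_position(sample, 5, 9))
```

For a real file, the same call would be `show_position(open(path).read(), 5, 9)` with `path` pointing at the prototxt named in the error.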

I got the same error. I use a Titan X. The DB is made from PASCAL VOC 2012 and consists of 2630 train images and 283 test images with 20 classes.

I used a batch_size of 1 and made solver.prototxt with test_iter = 283 and test_interval = 1333, 2630, or 1000. But the training process always breaks at the first test_interval with Check failed: error == cudaSuccess (2 vs. 0) out of memory. How should I set test_iter and test_interval? Thank you for any answer!

Danielll2 avatar Nov 04 '16 18:11 Danielll2