
ResNet runs out of memory in MNC

Open hgaiser opened this issue 8 years ago • 11 comments

Hi guys,

I have been working on modifying the prototxt files to use the ResNet network instead of VGG16. Initially I wanted to try ResNet50, but if that works I was hoping to expand to ResNet101/ResNet152. However, ResNet50 already appears to run out of memory. I made two variants of ResNet50, a 3-stage and a 5-stage variant.

The 3-stage ResNet50 train prototxt: http://ethereon.github.io/netscope/#/gist/c3912c84e77f3da958933d2e76be841f https://gist.github.com/hgaiser/c3912c84e77f3da958933d2e76be841f

The 5-stage ResNet50 train prototxt: http://ethereon.github.io/netscope/#/gist/93d50b8f889a1701c8b93cc91e6ff8a1 https://gist.github.com/hgaiser/93d50b8f889a1701c8b93cc91e6ff8a1

ResNet50 3-stage takes 8.7 GB on my GTX Titan X; ResNet50 5-stage crashes on start because it requires more than the 12 GB I have available. I tried fixing all convolutional layers of ResNet (by setting param { lr_mult: 0 }), but it still crashes due to memory. Is there anything I can do to reduce the memory usage? Am I doing something wrong, perhaps?
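For reference, this is what I mean by fixing a layer (a hypothetical fragment, the exact parameters are illustrative). As far as I understand Caffe, lr_mult: 0 only stops the weight updates; the forward activations are still stored and gradients still flow through the layer, so I wouldn't necessarily expect it to save memory on its own:

```protobuf
layer {
  name: "res2a_branch2a"
  type: "Convolution"
  bottom: "pool1"
  top: "res2a_branch2a"
  param { lr_mult: 0 }   # freeze the weights; activations are still kept for backprop
  convolution_param {
    num_output: 64
    kernel_size: 1
    stride: 1
    bias_term: false     # ResNet convolutions have no bias (BN provides the shift)
  }
}
```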

Also, something that struck me as odd: ResNet50 3-stage took 8.7 GB during training, but 10 GB during testing. I would assume it needs less memory during testing because it doesn't need to perform backpropagation. What is the reason for this? When training VGG16 5-stage it takes 5.6 GB, and testing indeed uses less memory, 3.2 GB.

For clarity, the ResNet50 3-stage network I trained did seem to work; it simply required a lot of memory. I did run into one thing I wasn't sure how to handle. The roi_interpolate_conv5 layer has an output shape of 14x14. roi_interpolate_conv5_box processes this output with stride 2, which reduces the size to 7x7. res5a_branch2a and res5a_branch1 are connected to roi_interpolate_conv5_box and also use stride 2, resulting in a 4x4 output. The pool5 layer further down the network pools with a kernel size of 7x7, but because of the previous layers it receives a 4x4 input, which causes an error. To fix this, I changed the stride of roi_interpolate_conv5_box to 1, but I am unsure whether this is the correct fix. This issue is perhaps related.
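The change I made, in case someone wants to double-check it (a hypothetical fragment; the layer type and kernel/pad values are illustrative, the point is the stride):

```protobuf
# roi_interpolate_conv5 outputs 14x14 per RoI
layer {
  name: "roi_interpolate_conv5_box"
  type: "Pooling"
  bottom: "roi_interpolate_conv5"
  top: "roi_interpolate_conv5_box"
  pooling_param {
    pool: MAX
    kernel_size: 3
    pad: 1
    stride: 1   # was 2; with stride 1 the output stays 14x14
  }
}
# res5a_branch1 / res5a_branch2a then apply their stride 2: 14x14 -> 7x7,
# and pool5 (kernel 7x7) reduces 7x7 -> 1x1, as in the original ResNet.
```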

I am using cuDNN 5 and CUDA 7.5 on Arch Linux, with an upstream version of Caffe plus the changes from caffe-mnc by @Oh233. I don't think this significantly affected my results, but I thought it might be worth mentioning.

@Oh233 , can you help me out with this one? It would be greatly appreciated.

Best regards, Hans

hgaiser avatar Sep 08 '16 17:09 hgaiser

Anyone?

hgaiser avatar Oct 03 '16 21:10 hgaiser

Hi @hgaiser ,

I am basically trying the same thing as you: migrating this program to ResNet. And I am facing the same problem: ResNet-101 runs out of memory, and only a 3-stage ResNet-50 classifying just 2 classes fits in the 12 GB of a Titan X. What's more, I have to rescale the longest side of the input image to 224 so that the program won't crash with an out-of-GPU-memory error.

What's different is that my ResNet-50 network consumes 12 GB for training but 5 GB for testing. My network wiring is also different from yours: I just naively connect 'res5c' to 'rpn_conv_3x3'. I have seen your network and it's interesting, but I don't know the reasoning behind it because I don't have any experience with ResNet. For instance, why are you connecting 'res4f' to the RPN input and using 'res5' after the warping layers? Would you briefly explain the idea?

weitianhan avatar Oct 05 '16 08:10 weitianhan

Hey @weitianhan , I'd be happy to and I'd also be interested in your results. In the ResNet paper they ran ResNet with Faster-RCNN and claim the following:

We compute the full-image shared conv feature maps using those layers whose strides on the image are no greater than 16 pixels (i.e., conv1, conv2_x, conv3_x, and conv4_x, totally 91 conv layers in ResNet-101; Table 1). We consider these layers as analogous to the 13 conv layers in VGG-16, and by doing so, both ResNet and VGG-16 have conv feature maps of the same total stride (16 pixels). These layers are shared by a region proposal network (RPN, generating 300 proposals) [32] and a Fast R-CNN detection network [7]. RoI pooling [7] is performed before conv5_1. On this RoI-pooled feature, all layers of conv5_x and up are adopted for each region, playing the roles of VGG-16's fc layers.

From that I gathered that every layer up to and including res4f is analogous to the 13 conv layers of VGG16, and the rest is applied to each region proposed by the RPN. In addition, the prototxt generated by https://github.com/XiaozhiChen/resnet-generator shows a similar structure (res4f connected to the RPN).
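In prototxt terms, the wiring looks roughly like this (a simplified, hypothetical fragment; the layer types and parameters are illustrative, see the gists above for the real thing):

```protobuf
# Shared trunk: conv1 .. res4f, total stride 16, analogous to VGG16's conv layers.
layer {
  name: "rpn_conv_3x3"
  type: "Convolution"
  bottom: "res4f"              # the RPN reads the shared stride-16 feature map
  top: "rpn_conv_3x3"
  convolution_param { num_output: 512 kernel_size: 3 pad: 1 stride: 1 }
}
# RoI warping/pooling is also performed on res4f, i.e. before conv5_1:
layer {
  name: "roi_interpolate_conv5"
  type: "ROIWarping"           # MNC's RoI warping layer (type name illustrative)
  bottom: "res4f"
  bottom: "rois"
  top: "roi_interpolate_conv5"
  roi_warping_param {
    pooled_w: 14
    pooled_h: 14
    spatial_scale: 0.0625      # 1/16, matching the trunk's total stride
  }
}
# res5a .. res5c then run per region, playing the role of VGG16's fc layers.
```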

What might also be interesting for you is the ResNet50-light 3-stage network I created. Basically, I stripped the part of MNC that does classification on bounding boxes, so that only classification based on masks remains. This reduced the memory usage a lot and allowed me to train a ResNet101-light 3-stage network as well. Here is the network (ResNet50): http://ethereon.github.io/netscope/#/gist/7aca8118337c7e72da818c824f426a6d

(here is the rest: https://github.com/delftrobotics-forks/MNC/tree/master/models/ResNet50/mnc_3stage)

PS: I haven't trained these networks on VOC, but on our own dataset, and I get good results with this network. In fact, I am using ResNet50-light 3-stage as the "default" network at the moment.

hgaiser avatar Oct 05 '16 08:10 hgaiser

Hi @hgaiser , thanks for your reply and contribution. I will look into it and have a try. By the way, are you feeding VOC dataset or COCO dataset?

weitianhan avatar Oct 05 '16 08:10 weitianhan

Feeding as in training on? I am training on a custom dataset, I haven't tried these networks on VOC or COCO.

hgaiser avatar Oct 05 '16 08:10 hgaiser

OK, if the results are normal, that shows the network makes sense. I will try it on the COCO dataset.

weitianhan avatar Oct 05 '16 08:10 weitianhan

If you could report results when you have them, I'd be interested to see how well it compares with the default (VGG16) :)

hgaiser avatar Oct 05 '16 08:10 hgaiser

Oh, I didn't try the default setting. But after some modification I get 22% mAP on the segmentation task using VGG16, whereas the paper reports 19.5%.

weitianhan avatar Oct 05 '16 09:10 weitianhan

@weitianhan can you share your configuration? thanks.

zimenglan-sysu-512 avatar Oct 18 '16 07:10 zimenglan-sysu-512

@zimenglan-sysu-512 What kind of configuration are you referring to?

weitianhan avatar Oct 18 '16 10:10 weitianhan

@hgaiser Hi! I am now following this work and I am also using ResNet50 to replace VGG16. I wonder: is the 5-stage version trainable now, or will it run into the out-of-memory error? My device is a GTX Titan X. Thank you very much in advance!

qinhaifangpku avatar Feb 08 '17 09:02 qinhaifangpku