ResNet runs out of memory in MNC
Hi guys,
I have been working on modifying the prototxt files to use ResNet instead of VGG16. I wanted to start with ResNet50 and, if that worked, expand to ResNet101/ResNet152. However, ResNet50 already appears to run out of memory. I made two variants of ResNet50: a 3-stage and a 5-stage variant.
The 3-stage ResNet50 train prototxt: http://ethereon.github.io/netscope/#/gist/c3912c84e77f3da958933d2e76be841f https://gist.github.com/hgaiser/c3912c84e77f3da958933d2e76be841f
The 5-stage ResNet50 train prototxt: http://ethereon.github.io/netscope/#/gist/93d50b8f889a1701c8b93cc91e6ff8a1 https://gist.github.com/hgaiser/93d50b8f889a1701c8b93cc91e6ff8a1
ResNet50 3-stage takes 8.7 GB on my GTX Titan X; ResNet50 5-stage crashes on start because it requires more than the 12 GB I have available. I tried fixing all convolutional layers of ResNet (by setting `param { lr_mult: 0 }`), but it still crashes due to memory. Is there anything I can do to reduce the memory usage? Am I doing something wrong, perhaps?
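For reference, a frozen layer looks roughly like this (the concrete layer shown is illustrative, not the literal one from my prototxt). Note that freezing only stops the weight updates; as far as I know Caffe still allocates the intermediate blobs, which may be why the memory usage barely drops:

```prototxt
# Illustrative example of freezing one convolution.
# lr_mult: 0 stops weight updates; decay_mult: 0 also disables
# weight decay for this parameter blob.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }  # frozen weights
  convolution_param {
    num_output: 64
    kernel_size: 7
    pad: 3
    stride: 2
    bias_term: false  # ResNet convolutions typically have no bias
  }
}
```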
Also, something that struck me as weird: ResNet50 3-stage took 8.7 GB during training, but 10 GB during testing. I would assume it needs less memory during testing because it doesn't need to perform backpropagation. What is the reason for this? When training VGG16 5-stage it takes 5.6 GB, and during testing it indeed uses less memory, 3.2 GB.
For clarity, the ResNet50 3-stage network I trained did seem to work, it simply required a lot of memory. I did run into something where I wasn't sure what to do, though. The `roi_interpolate_conv5` layer's output shape is 14x14. `roi_interpolate_conv5_box` processes this output with stride 2, which reduces the size to 7x7. `res5a_branch2a` and `res5a_branch1` are connected to `roi_interpolate_conv5_box` and also use stride 2, resulting in an output of 4x4. The `pool5` layer a bit further down the network pools with a kernel size of 7x7, but because of the previous layers it receives a 4x4 input (which causes an error). To fix this, I changed the stride of `roi_interpolate_conv5_box` to 1, but I am unsure whether this is the correct fix. This issue is perhaps related.
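The workaround can be sketched as follows. This assumes `roi_interpolate_conv5_box` is a max-pooling layer with a 2x2 kernel; the kernel size is an assumption, chosen because it reproduces the 7x7 and 4x4 sizes mentioned above (Caffe pooling: ceil((14 - 2)/2) + 1 = 7, then a stride-2 convolution gives floor((7 - 1)/2) + 1 = 4):

```prototxt
# Size chain with the original stride 2:
#   roi_interpolate_conv5      -> 14x14
#   roi_interpolate_conv5_box  -> 7x7   (2x2 pool, stride 2)
#   res5a stride-2 branches    -> 4x4
#   pool5 (7x7 kernel)         -> error: input smaller than kernel
# With stride 1: 14x14 -> 13x13 -> 7x7, which matches pool5's 7x7 kernel.
layer {
  name: "roi_interpolate_conv5_box"
  type: "Pooling"
  bottom: "roi_interpolate_conv5"
  top: "roi_interpolate_conv5_box"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 1  # was 2
  }
}
```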
I am using cuDNN 5 and CUDA 7.5 on Arch Linux, with an upstream version of Caffe plus the changes from caffe-mnc by @Oh233. I don't think this significantly affected my results, but I thought it might be worth mentioning.
@Oh233 , can you help me out with this one? It would be greatly appreciated.
Best regards, Hans
Anyone?
Hi @hgaiser ,
I am basically trying the same thing as you: migrating this program to ResNet. And I am facing the same problem. Res-101 runs out of memory, and only Res-50 with 3 stages, classifying only 2 classes, fits in the 12 GB of a Titan X. What's more, I have to rescale the longest side of the input image to 224 so that the program doesn't crash from running out of GPU memory.
What's different is that my Res-50 network consumes 12 GB for training but 5 GB for testing. My network connections are also different from yours: I just naively connect `res5c` to `rpn_conv_3x3`. I have seen your network, and it's interesting, but I don't understand the reasoning behind it because I don't have any experience with ResNet. For example, why do you connect `res4f` to the RPN input and use `res5` after the warping layers? Would you briefly explain the idea?
Hey @weitianhan, I'd be happy to, and I'd also be interested in your results. In the ResNet paper they ran ResNet with Faster R-CNN and claim the following:
We compute the full-image shared conv feature maps using those layers whose strides on the image are no greater than 16 pixels (i.e., conv1, conv2_x, conv3_x, and conv4_x, totally 91 conv layers in ResNet-101; Table 1). We consider these layers as analogous to the 13 conv layers in VGG-16, and by doing so, both ResNet and VGG-16 have conv feature maps of the same total stride (16 pixels). These layers are shared by a region proposal network (RPN, generating 300 proposals) [32] and a Fast R-CNN detection network [7]. RoI pooling [7] is performed before conv5_1. On this RoI-pooled feature, all layers of conv5_x and up are adopted for each region, playing the roles of VGG-16's fc layers.
From that, I gathered that every layer up to res4f is analogous to the 13 conv layers of VGG16, and the rest is applied to each region proposed by the RPN. In addition, the prototxt generated by https://github.com/XiaozhiChen/resnet-generator shows a similar setup (res4f connected to the RPN).
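Concretely, the shared trunk ends at res4f and the RPN reads from there, roughly like this (a sketch; the exact filler and param values are from memory and may differ from my gist):

```prototxt
# The RPN consumes the last shared feature map (res4f, total stride 16);
# conv5_x is applied per region after RoI warping instead.
layer {
  name: "rpn_conv_3x3"
  type: "Convolution"
  bottom: "res4f"
  top: "rpn_conv_3x3"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
```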
What might also be interesting for you is the ResNet50-light 3-stage network I created. Basically, I stripped the part of MNC that does classification on bounding boxes, so that only classification based on masks remains. This reduced the memory usage a lot and allowed me to train ResNet101-light 3-stage as well. Here is the network (ResNet50): http://ethereon.github.io/netscope/#/gist/7aca8118337c7e72da818c824f426a6d
(here is the rest: https://github.com/delftrobotics-forks/MNC/tree/master/models/ResNet50/mnc_3stage)
PS: I haven't trained these networks on VOC, but on our own dataset, and I get good results with this network. In fact, I am using ResNet50-light 3-stage as the "default" network at the moment.
Hi @hgaiser , thanks for your reply and contribution. I will look into it and have a try. By the way, are you feeding VOC dataset or COCO dataset?
Feeding as in training on? I am training on a custom dataset, I haven't tried these networks on VOC or COCO.
OK, if the results are normal, that proves the network makes sense. I will try it on the COCO dataset.
If you could report results when you have them, I'd be interested to see how well it compares with the default (VGG16) :)
Oh, I didn't try the default setting. But after some modifications I get 22% mAP on the segmentation task using VGG16, whereas they report 19.5% in the paper.
@weitianhan can you share your configuration? thanks.
@zimenglan-sysu-512 What kind of configuration are you referring to?
@hgaiser hi~ I am now following this work, and I am also using ResNet50 to replace VGG16. I wonder whether the 5-stage variant is trainable now? Will it run into the out-of-memory error? My device is a GTX Titan X. Thank you very much in advance!