How do CNN parameters depend on input image size?

Open mrgloom opened this issue 9 years ago • 5 comments

I'm trying to modify the example test_example_CNN.m to work with my own images. I have a pedestrian detection dataset with two classes, positive (pedestrians) and negative (background), and the images are 128x64. When I run the code without changes, the error increases(!), but when I resize the images to 28x28 it works.

So my question is: how do the CNN parameters depend on image size?

mrgloom avatar May 09 '15 09:05 mrgloom

Same here. Is there any documentation for configuring the CNN?

mongoose54 avatar Aug 25 '15 14:08 mongoose54

Try a smaller learning rate. Usually you try learning rates in powers of 10, i.e. 0.1, 0.01, 0.001 and so on, and pick the first one that makes your loss decrease. Choosing good hyperparameters for deep networks is still an art; you can find a few rules of thumb in these articles:

  • Y. Bengio, "Practical recommendations for gradient-based training of deep architectures", http://arxiv.org/abs/1206.5533
  • Y. LeCun et al, "Efficient BackProp", http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
  • I. Sutskever, "A Brief Overview of Deep Learning", http://yyue.blogspot.com/2015/01/a-brief-overview-of-deep-learning.html
  • T. M. Breuel, "The Effects of Hyperparameters on SGD Training of Neural Networks", http://arxiv.org/abs/1508.02788
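
The powers-of-10 search described above can be sketched in a few lines of Python. This is only a toy illustration: `train_loss_history` is a hypothetical stand-in that runs gradient descent on a simple quadratic instead of training a CNN, and in practice you would replace it with a short training run of your own network.

```python
def train_loss_history(learning_rate, steps=20):
    """Toy stand-in for training: gradient descent on loss(w) = 15 * w**2."""
    w = 1.0
    history = [15 * w * w]
    for _ in range(steps):
        grad = 30 * w                 # derivative of 15 * w**2
        w -= learning_rate * grad
        history.append(15 * w * w)
    return history


def first_decreasing_rate(candidates):
    """Return the first candidate whose final loss is below its initial loss."""
    for learning_rate in candidates:
        history = train_loss_history(learning_rate)
        if history[-1] < history[0]:
            return learning_rate
    return None                       # no candidate made the loss decrease


# Candidates in powers of 10, largest first.
candidates = [1.0, 0.1, 0.01, 0.001]
print(first_decreasing_rate(candidates))  # prints 0.01 for this toy loss
```

On this toy loss the two largest rates diverge and 0.01 is the first that converges, which mirrors the "error increases" symptom of a learning rate that is too large.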

tambetm avatar Aug 25 '15 15:08 tambetm

Thanks for the information. However, I was interested in how to set up the structure of the CNN here: https://github.com/rasmusbergpalm/DeepLearnToolbox/blob/master/tests/test_example_CNN.m#L15-L21

mongoose54 avatar Aug 25 '15 19:08 mongoose54

I would start with some well-known architecture. The CIFAR-10 examples are a good start if your images are not too big. Otherwise there is AlexNet, but AlexNet is way too big for DeepLearnToolbox to handle.

For example, the CIFAR-10 network in the Caffe examples has worked well for me: https://github.com/BVLC/caffe/blob/master/examples/cifar10/cifar10_quick_train_test.prototxt. Hopefully you can figure out the layer parameters from all this prototxt cruft.

tambetm avatar Aug 26 '15 09:08 tambetm

I found this formula in Andrej Karpathy's CNN course and it worked for me (it's really simple once you think about it for a while):

It assumes square images, equal vertical and horizontal stride, and a square kernel!

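in_channels = 3 # for the first layer, nearly always, because an RGB image has 3 channels (3 matrices -> red, green, blue)
output_width = (image_width - kernel_size + 2*padding) / stride + 1

# if you don't know what these variables mean, google it -> these are the basics of CNN

Note that the formula gives the spatial width of a layer's output, not its number of channels. The number of output channels (out_channels) is the number of filters in the layer, which you choose yourself. in_channels and out_channels are the parameters of one convolution layer, and each following layer's in_channels equals the out_channels of the previous one.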
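
Here is a minimal Python sketch of the size formula above, treating its result as the spatial size (width or height) of a layer's output. For non-square images such as 128x64, it can be applied to each dimension separately. The layer stack below (5x5 convolution, 2x2 subsampling, 5x5 convolution, 2x2 subsampling, no padding) is an assumption made for illustration, and the function names are hypothetical.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial size (width or height) of a convolution or pooling output."""
    span = input_size - kernel_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("kernel and stride do not fit the input evenly")
    return span // stride + 1


def trace_sizes(input_size, layers):
    """Apply the formula layer by layer; each layer is (kernel, padding, stride)."""
    sizes = [input_size]
    for kernel_size, padding, stride in layers:
        sizes.append(conv_output_size(sizes[-1], kernel_size, padding, stride))
    return sizes


# Assumed stack: 5x5 conv, 2x2 subsampling, 5x5 conv, 2x2 subsampling.
ASSUMED_LAYERS = [(5, 0, 1), (2, 0, 2), (5, 0, 1), (2, 0, 2)]

for side in (28, 128, 64):
    print(side, trace_sizes(side, ASSUMED_LAYERS))
# 28  -> [28, 24, 12, 8, 4]
# 128 -> [128, 124, 62, 58, 29]
# 64  -> [64, 60, 30, 26, 13]
```

Under these assumptions a 28x28 input ends in 4x4 feature maps, while a 128x64 input ends in 29x13 maps, so whatever layer follows receives over 20 times more values per map.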

micmarty avatar Oct 06 '17 19:10 micmarty