DeepLearnToolbox
How do CNN parameters depend on input image size?
I'm trying to modify the example test_example_CNN.m to work with my own images. I have a pedestrian detection dataset with two classes: positive (pedestrians) and negative (background). The images are 128×64. When I run the code without changes, the error increases(!), but when I resized the images to 28×28 it worked.
So my question is: how do the CNN parameters depend on the image size?
Same here. Is there documentation for configuring the CNN?
Try a smaller learning rate. Usually you try learning rates in powers of 10, i.e. 0.1, 0.01, 0.001 and so on, and pick the first one that makes your loss decrease. Choosing good hyperparameters for deep networks is still an art; you can find a few rules of thumb in these articles:
- Y. Bengio, "Practical recommendations for gradient-based training of deep architectures", http://arxiv.org/abs/1206.5533
- Y. LeCun et al, "Efficient BackProp", http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
- I. Sutskever, "A Brief Overview of Deep Learning", http://yyue.blogspot.com/2015/01/a-brief-overview-of-deep-learning.html
- T. M. Breuel, "The Effects of Hyperparameters on SGD Training of Neural Networks", http://arxiv.org/abs/1508.02788
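A quick way to see that sweep in action. This is only a toy sketch: `final_loss` below is a stand-in I made up (gradient descent on a 1-D quadratic), not a real training run with the toolbox.

```python
def final_loss(lr, steps=20):
    # Toy stand-in for a training run: gradient descent on f(w) = w^2,
    # whose gradient is 2*w. Returns the loss after `steps` updates.
    w = 1.0
    for _ in range(steps):
        w -= lr * 2 * w
    return w * w

initial = 1.0
for lr in [1.0, 0.1, 0.01, 0.001]:  # powers of 10, largest first
    if final_loss(lr) < initial:
        print("first learning rate that decreases the loss:", lr)
        break
# -> 0.1 (with lr = 1.0 the toy iterate just oscillates between +1 and -1)
```

With a real network you would run a few epochs per candidate rate instead, but the selection logic is the same.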
Thanks for the information. However, I was interested in how to set up the structure of the CNN here: https://github.com/rasmusbergpalm/DeepLearnToolbox/blob/master/tests/test_example_CNN.m#L15-L21
I would start with some well-known architecture. The CIFAR-10 examples are a good start if your images are not too big. Otherwise AlexNet, but AlexNet is far too big for DeepLearnToolbox to handle.
For example, the CIFAR-10 network in the Caffe examples has worked well for me: https://github.com/BVLC/caffe/blob/master/examples/cifar10/cifar10_quick_train_test.prototxt Hopefully you can figure out the layer parameters from all the prototxt cruft.
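If it helps, here is how the feature-map sizes chain through that cifar10_quick stack, based on my reading of the prototxt (three rounds of a 5×5 convolution with pad 2, which preserves the size, followed by 3×3 stride-2 pooling, which Caffe computes with ceiling rounding):

```python
import math

def conv_out(size, kernel, pad=0, stride=1):
    # Standard conv output size: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    # Caffe pooling rounds up: ceil((size - kernel) / stride) + 1
    return math.ceil((size - kernel) / stride) + 1

size = 32  # CIFAR-10 input is 32x32
for _ in range(3):
    size = conv_out(size, kernel=5, pad=2)    # size is preserved
    size = pool_out(size, kernel=3, stride=2) # 32 -> 16 -> 8 -> 4
print(size)  # -> 4, so the first fully connected layer sees 64 * 4 * 4 inputs
```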
I found this formula in Andrej Karpathy's CNN course and it worked for me (it's really simple after a bit of thinking).
It assumes square images, equal horizontal and vertical strides, and a square kernel_size!
in_channels = 3 # nearly always, because an image has 3 channels (3 matrices -> red, green, blue)
out_width = (image_width - kernel_size + 2*padding) / stride + 1 # spatial size of the output, not the channel count
# if you don't know what these variables mean, google them -> these are the basics of CNNs
in_channels and out_channels are the parameters for one convolution layer, but each following layer's in_channels equals the out_channels of the previous one.
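To make the chaining concrete, here is that size formula applied to the default net in test_example_CNN.m, assuming I read the example right (5×5 kernels, no padding, stride 1, scale-2 subsampling, 28×28 input):

```python
def conv_out(width, kernel_size, padding=0, stride=1):
    # The formula from the comment above, integer-valued
    return (width - kernel_size + 2 * padding) // stride + 1

w = 28               # MNIST-sized input
w = conv_out(w, 5)   # conv 5x5: 28 -> 24
w //= 2              # scale-2 subsampling: 24 -> 12
w = conv_out(w, 5)   # conv 5x5: 12 -> 8
w //= 2              # scale-2 subsampling: 8 -> 4
print(w)  # -> 4; every intermediate size is a whole, even number, so 28x28 fits
```

For an arbitrary input size you have to check that every convolution output is positive and every subsampling input is divisible by the scale, which is why changing the image size usually means changing kernel sizes or layer counts too.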