DeepPose
Question about the output layer
- I am trying to re-implement it using MatConvNet.
- I use AlexNet but drop all LRN and dropout layers (I heard that dropout does not work well with an MSE loss for regression, see http://cs231n.github.io/neural-networks-2/) and replace them with batch normalization.
- The output vector is the normalized joint vector (the x and y coordinates are in [0, 1] relative to the image size).
- The datasets I am using are LSP and LSPET (10000 training samples, 1000 validation samples and 1000 testing samples).
- Each input sample is cropped around the ground-truth joint locations, and data augmentation (horizontal flips and random translations) is applied.
- The loss function I use is MSE, applied right after the last fully connected layer. But it turns out the net is not learning: the loss drops to a certain level and then plateaus for a very long time with no improvement during training (with learning rate 0.0001; any higher learning rate makes the loss explode). I have tried many learning rates and L2 weight decays, and the loss either explodes or plateaus very quickly.
- But when I add a sigmoid on the last layer before applying the MSE loss, it seems to work fine (I am still waiting for training to finish).
- My question is that I don't see you apply a sigmoid function before the loss (I guess so, but I am not very familiar with Chainer), yet you can still train it properly, so I am asking whether there is something I have done wrong.
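
For reference, the target preparation described above (joints normalized to [0, 1], with horizontal-flip augmentation) can be sketched in plain Python. This is a minimal sketch, not the repo's actual code; the function names and the `swap_pairs` argument are hypothetical, and the flip must both mirror the x coordinates and swap left/right joint indices:

```python
def normalize_joints(joints, width, height):
    # joints: list of (x, y) pixel coordinates in the cropped image.
    # Returns coordinates scaled to [0, 1] relative to the image size,
    # matching the regression targets described above.
    return [(x / width, y / height) for (x, y) in joints]

def flip_joints_horizontal(norm_joints, swap_pairs):
    # Horizontal flip of normalized joints: x -> 1 - x.
    # swap_pairs lists (left_idx, right_idx) joints to exchange
    # (e.g. left/right shoulder), so labels stay semantically correct.
    flipped = [(1.0 - x, y) for (x, y) in norm_joints]
    for i, j in swap_pairs:
        flipped[i], flipped[j] = flipped[j], flipped[i]
    return flipped
```

For example, a left/right joint pair at x = 0.25 and x = 0.75 maps back onto itself after a flip plus index swap, which is the expected symmetry.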
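
One possible reason the sigmoid helps, sketched numerically: with a linear output, the MSE gradient is unbounded (`pred - target`), so an output far outside [0, 1] produces a huge gradient and can blow up training; with a sigmoid, the gradient is damped by the factor `s * (1 - s)` and the output is confined to the target range. This is a hedged illustration of the math, not a claim about the repo's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad_linear(pred, target):
    # Gradient of 0.5 * (pred - target)^2 w.r.t. pred: unbounded.
    return pred - target

def mse_grad_sigmoid(z, target):
    # Gradient of 0.5 * (sigmoid(z) - target)^2 w.r.t. the
    # pre-activation z: the chain rule adds a s*(1-s) factor,
    # which keeps the gradient small even for large |z|.
    s = sigmoid(z)
    return (s - target) * s * (1.0 - s)
```

With a target of 0.5, a raw output of 5.0 gives a linear-MSE gradient of 4.5, while the same pre-activation through a sigmoid gives a gradient well under 0.01, which is consistent with the sigmoid version training stably at learning rates where the linear version explodes.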