DeepAlignmentNetwork
Why no weight decay?
I have a question about the code. Generally, when training a neural network, L2 regularization, the so-called weight decay, is added to the total loss. But in your code, and in the zjj TensorFlow implementation, the loss seems to be only the distance between the labels and the prediction. Do I understand the code wrong, or did I miss the weight decay factor? I did not find any discussion of this in your paper. I am curious whether you add weight decay to your loss, and if not, why not? Or is this a trick for training networks on regression problems? Thank you for your reply.
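For reference, this is the kind of L2 term I mean. It is just a sketch in Lasagne (the framework your implementation uses), not code from your repo; the tiny network, the squared-error loss, and the 1e-5 coefficient are only placeholders:

```python
# Illustrative sketch only, not the DAN code: how an L2 weight-decay term is
# typically added to a regression loss in Lasagne.
import theano.tensor as T
import lasagne
from lasagne.regularization import regularize_network_params, l2

inputs = T.matrix('inputs')      # e.g. flattened face images
targets = T.matrix('targets')    # ground-truth landmark coordinates

# Placeholder network: 112x112 input, 68 landmarks * 2 coordinates out.
network = lasagne.layers.InputLayer((None, 112 * 112), inputs)
network = lasagne.layers.DenseLayer(network, num_units=136)

predictions = lasagne.layers.get_output(network)
landmark_loss = T.mean((predictions - targets) ** 2)          # distance-style loss
weight_decay = 1e-5 * regularize_network_params(network, l2)  # the L2 term I am asking about
total_loss = landmark_loss + weight_decay
```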
Hi,
You are correct, there is no weight decay in DAN. If I remember correctly, I tried using it, and it did not improve the accuracy. I also conducted some tests on landmark stability (jitter) and found that adding weight decay decreases the jitter, but so does early stopping.
Best regards,
Marek
OK, thanks very much again!