student-teacher-anomaly-detection
Student training got a very high loss
Hello, thanks for your code. When I ran student_training.py, I got a very high training loss. Is that normal? Did it also happen when you trained the carpet-65 student network? I am looking forward to your reply, thanks again!
Hi @cwzzzzz, thanks for your interest! In student_training.py, the loss is the MSE averaged over all the pixels (and over the batch) but not over the size of the vector itself. Since the descriptive vector of each pixel has a size of 512, we expect a high value for this loss. You could divide student_loss by 512 to normalize it, but this only rescales the gradients by the same constant factor, so it should not change the direction in which training evolves. The important thing is how much the loss decreases relative to its initial value, so to speak.
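To illustrate the scaling, here is a minimal NumPy sketch. It is not the actual code in student_training.py; the function name, the tensor shapes, and the random inputs are assumptions made up for the example. It sums the squared error over the 512-dimensional descriptor and averages over batch and pixels, then shows that dividing by 512 gives the usual per-element MSE.

```python
import numpy as np

def loss_summed_over_descriptor(student_out, teacher_out):
    """Squared error summed over the descriptor axis (last axis),
    then averaged over the batch and pixel axes.
    Hypothetical helper, not the function used in the repository."""
    sq_err = (student_out - teacher_out) ** 2
    return sq_err.sum(axis=-1).mean()

rng = np.random.default_rng(0)
d = 512  # descriptor size per pixel, as stated in the reply above

# Assumed layout: (batch, height, width, descriptor)
student = rng.normal(size=(2, 8, 8, d))
teacher = rng.normal(size=(2, 8, 8, d))

loss = loss_summed_over_descriptor(student, teacher)

# Conventional MSE averaged over every element, including the descriptor axis
per_element_mse = ((student - teacher) ** 2).mean()

# Dividing by the descriptor size recovers the per-element MSE
normalized = loss / d

print(f"summed-over-descriptor loss: {loss:.2f}")
print(f"per-element MSE:             {per_element_mse:.4f}")
print(f"loss / {d}:                  {normalized:.4f}")
```

The unnormalized value is 512 times larger than the per-element MSE, so a loss that looks very high can still correspond to a small error per descriptor component.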