distillation loss does not decrease during training
Dear all,
I am using knowledge distillation training for ASR with lingvo. However, the distillation_loss (the cross entropy between the teacher's and the student's output distributions) increases rather than decreases during training. I am confused by this. Have you observed this phenomenon? The same loss decreases when I train with PyTorch.
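
For concreteness, here is a minimal PyTorch sketch of the loss I mean. The function name `distillation_loss`, the tensor shapes, and the `temperature` argument are illustrative only, not lingvo's API:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross entropy between teacher and student output distributions."""
    # The teacher provides fixed soft targets; no gradient flows into it.
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    # H(teacher, student) = -sum_k p_teacher(k) * log p_student(k)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Toy check: optimizing the student logits toward the fixed teacher
# targets should drive this loss down over the iterations.
teacher_logits = torch.randn(8, 100)                        # (batch, classes)
student_logits = torch.randn(8, 100, requires_grad=True)
opt = torch.optim.SGD([student_logits], lr=1.0)
for _ in range(50):
    opt.zero_grad()
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    opt.step()
```

In my lingvo runs, the corresponding metric goes up instead.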