Teacher-free-Knowledge-Distillation
Question about the loss function of Tf-reg KD
Hi, thank you for sharing such an awesome project.
For the Tf-reg KD, in line 47 of my_loss_function.py, should we also divide the `outputs` variable by the temperature T, like this:
loss_soft_regu = nn.KLDivLoss()(F.log_softmax(outputs / T, dim=1), F.softmax(teacher_soft/T, dim=1))*params.multiplier
As in Eq (9) of your paper, the loss function is $$D_{KL}(p^d_\tau, p_\tau)$$.
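To make the question concrete, here is a minimal framework-free sketch of the two variants being compared: softening both distributions with the temperature, as Eq (9) reads, versus softening only the teacher side. The logit values, the temperature of 20, and the helper names `softmax_with_temperature` and `kl_divergence` are assumptions made for illustration; none of them come from the repository.

```python
import math


def softmax_with_temperature(logits, T):
    """Softmax of logits / T, computed in a numerically stable way."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical values, chosen only for illustration.
student_logits = [2.0, 0.5, -1.0]
designed_teacher_logits = [3.0, 0.0, 0.0]
T = 20.0

# p^d_tau: the hand-designed teacher distribution softened by T.
p_d_tau = softmax_with_temperature(designed_teacher_logits, T)

# Variant A: the student output is also divided by T (matches Eq (9) as written).
p_tau = softmax_with_temperature(student_logits, T)
kl_both_softened = kl_divergence(p_d_tau, p_tau)

# Variant B: the student output is left unscaled (T = 1).
p_unscaled = softmax_with_temperature(student_logits, 1.0)
kl_teacher_only = kl_divergence(p_d_tau, p_unscaled)

print("both softened:", kl_both_softened)
print("teacher only: ", kl_teacher_only)
```

With these example values the two variants give noticeably different loss values, which is why the choice matters for how `params.multiplier` should be tuned.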
I would really appreciate your help. I look forward to your reply, thanks!
I am also confused about this issue and look forward to the author's answer.

#19 answers your question.