knowledge-distillation-keras
correction for soft targets loss may be needed
Quoting the original paper (block 2):
Since the magnitudes of the gradients produced by the soft targets scale as 1/T^2 it is important to multiply them by T^2 when using both hard and soft targets.
It looks like this correction is not included in your knowledge_distillation_loss.
@arsenyinfo Do you know where the correction is needed?
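For reference, here is a minimal sketch of where the T^2 factor would typically go. It is written in plain Python rather than Keras, and it is not the repository's actual knowledge_distillation_loss: the function names, the `alpha` weighting, and the default values are all assumptions made for illustration. The idea is that the soft-target term (computed at temperature T) is multiplied by T^2 before it is combined with the hard-target term.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by the given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=4.0, alpha=0.5):
    """Hypothetical combined loss; `alpha` and the defaults are assumptions.

    hard term: student at T=1 vs. the one-hot label
    soft term: student at T vs. teacher at T, rescaled by T^2
    """
    hard_term = cross_entropy(hard_label, softmax(student_logits))
    soft_term = cross_entropy(softmax(teacher_logits, temperature),
                              softmax(student_logits, temperature))
    # The correction discussed in this issue: multiply the soft term by T^2
    # so its gradient magnitude stays comparable to the hard term's.
    return alpha * hard_term + (1.0 - alpha) * temperature ** 2 * soft_term

student = [2.0, 1.0, 0.1]
teacher = [3.0, 0.5, 0.2]
label = [1.0, 0.0, 0.0]
loss = distillation_loss(student, teacher, label, temperature=2.0, alpha=0.5)
```

In a Keras loss function the same multiplication would apply to whichever term is computed from the temperature-softened outputs; the hard-target term is left unscaled.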